Replace EVRP in DOM with ranger.

Message ID 20220428163015.595263-1-aldyh@redhat.com
State New
Headers
Series Replace EVRP in DOM with ranger. |

Commit Message

Aldy Hernandez April 28, 2022, 4:30 p.m. UTC
  [Jeff, this is the same patch I sent you last week with minor tweaks
to the commit message.]

[Despite the verbosity of the message, this is actually a pretty
straightforward patch.  It should've gone in last cycle, but there
was a nagging regression I couldn't get to until after stage1
had closed.]

There are 3 uses of EVRP in DOM that must be converted.
Unfortunately, they need to be converted in one go, so further
splitting of this patch would be problematic.

There's nothing here earth shattering.  It's all pretty obvious in
retrospect, but I've added a short description of each use to aid in
reviewing:

* Convert evrp use in cprop to ranger.

  This is easy, as cprop in DOM was converted to the ranger API last
  cycle, so this is just a matter of using a ranger instead of an
  evrp_range_analyzer.

* Convert evrp use in threader to ranger.

  The idea here is to use the hybrid approach we used for the initial
  VRP threader conversion last cycle.  The DOM threader will continue
  using the forward threader infrastructure while continuing to query
  DOM data structures, and only if the conditional does not relsolve,
  using the ranger.  This gives us the best of both worlds, and is a
  proven approach.

  Furthermore, as frange and prange come live in the next cycle, we
  can move away from the forward threader altogether, and just add
  another backward threader.  This will not only remove the last use
  of the forward threader, but will allow us to remove at least 1 or 2
  threader instances.

* Convert conditional folding to use the method used by the ranger and
  evrp.  Previously DOM was calling into the guts of
  simplify_using_ranges::vrp_visit_cond_stmt.  The blessed way now is
  using fold_cond() which rewrites the conditional and edges
  automatically.

  When legacy is removed, simplify_using_ranges will be further
  cleaned up, and there will only be one entry point into simplifying
  a statement.

* DOM was setting global ranges determined from unreachable edges as a
  side-effect of using the evrp engine.  We must handle these cases
  before nuking evrp, and DOM seems like a good fit.  I've just moved
  the snippet to DOM, but it could live anywhere else we do a DOM
  walk.

  For the record, this is the case *vrp handled:

	<bb C>:
	...
	if (c_5(D) != 5)
	goto <bb N>;
	else
	goto <bb M>;
	<bb N>:
	__builtin_unreachable ();
	<bb M>:

  If M dominates all uses of c_5, we can set the global range of c_5
  to [5,5].

I have tested on x86-64, pcc64le, and aarch64 Linux.

I also ran threading benchmarks as well as performance benchmarks.

DOM threads 1.56% more paths which ultimately yields a miniscule total
increase of 0.03%.

The conversion to ranger brings a 7.87% performance drop in DOM, which
is a wash in overall compilation.  This is in line with other
replacements of legacy evrp with ranger.  We handle a lot more cases.
It's not free :).

There is a a regression on Wstringop-overflow-4.C which I'm planning
on XFAILing.  It's another variant of the usual middle-end false
positives: having no ranges produces no warnings, but slightly refined
ranges, or worse-- isolating specific problematic cases in the
threader causes flare-ups.

As an aside, as Richi has suggested, I think we should discuss
restricting the threader's ability to thread highly unlikely paths.
These cause no end of pain for middle-end warnings.  However,
I don't know if this would conflict with path isolation for
things like null dereferencing.  ISTR you were interested in this.

BTW, I think the Wstringop-overflow-4.C test is problematic and I've
attached my analysis.  Basically the regression is caused by a bad
interaction with the rounding/alignment that placement new has inlined
into the IL.  This happens for int16_r[] which the test is testing.
Ranger can glean some range info, which causes DOM threading to
isolate a path which causes a warning.

OK for trunk?

gcc/ChangeLog:

	* tree-ssa-dom.cc (dom_jt_state): Pass ranger to constructor
	instead of evrp.
	(dom_jt_state::push): Remove m_evrp.
	(dom_jt_state::pop): Same.
	(dom_jt_state::record_ranges_from_stmt): Remove.
	(dom_jt_state::register_equiv): Remove updating of evrp ranges.
	(class dom_jt_simplifier): Pass ranger to constructor.
	Inherit from hybrid_jt_simplifier.
	(dom_jt_simplifier::simplify): Convert to ranger.
	(pass_dominator::execute): Same.
	(all_uses_feed_or_dominated_by_stmt): New.
	(dom_opt_dom_walker::set_global_ranges_from_unreachable_edges): New.
	(dom_opt_dom_walker::before_dom_children): Call
	set_global_ranges_from_unreachable_edges.
	Do not call record_ranges_from_stmt.
	(dom_opt_dom_walker::after_dom_children): Remove evrp use.
	(cprop_operand): Use int_range<> instead of value_range.
	(dom_opt_dom_walker::fold_cond): New.
	(dom_opt_dom_walker::optimize_stmt): Pass ranger to
	cprop_into_stmt.
	Use fold_cond() instead of vrp_visit_cond_stmt().
	* tree-ssa-threadedge.cc (jt_state::register_equivs_stmt): Do not
	pass state to simplifier.
	* vr-values.h (class vr_values): Make fold_cond public.

gcc/testsuite/ChangeLog:

	* gcc.dg/sancov/cmp0.c: Adjust for conversion to ranger.
	* gcc.dg/tree-ssa/ssa-dom-branch-1.c: Same.
	* gcc.dg/tree-ssa/ssa-dom-thread-7.c: Same.
	* gcc.dg/vect/bb-slp-pr81635-2.c: Same.
	* gcc.dg/vect/bb-slp-pr81635-4.c: Same.
---
 .../g++.dg/warn/Wstringop-overflow-4.C        |  34 +++
 gcc/testsuite/gcc.dg/sancov/cmp0.c            |   2 +-
 .../gcc.dg/tree-ssa/ssa-dom-branch-1.c        |   5 +-
 .../gcc.dg/tree-ssa/ssa-dom-thread-7.c        |   2 +-
 gcc/testsuite/gcc.dg/vect/bb-slp-pr81635-2.c  |   2 +-
 gcc/testsuite/gcc.dg/vect/bb-slp-pr81635-4.c  |   6 +-
 gcc/tree-ssa-dom.cc                           | 223 ++++++++++--------
 gcc/tree-ssa-threadedge.cc                    |   4 +-
 gcc/vr-values.h                               |   2 +-
 9 files changed, 168 insertions(+), 112 deletions(-)
  

Comments

Jeff Law April 28, 2022, 6:43 p.m. UTC | #1
On 4/28/2022 10:30 AM, Aldy Hernandez wrote:
> [Jeff, this is the same patch I sent you last week with minor tweaks
> to the commit message.]
And I just dropped it into the tester.  We should have the vast majority 
of targets tested by this time tomorrow.

>
> [Despite the verbosity of the message, this is actually a pretty
> straightforward patch.  It should've gone in last cycle, but there
> was a nagging regression I couldn't get to until after stage1
> had closed.]
You had other, non-work/gcc priorities.  No worries at missing gcc-12.


>
> There are 3 uses of EVRP in DOM that must be converted.
> Unfortunately, they need to be converted in one go, so further
> splitting of this patch would be problematic.
ACK


> There's nothing here earth shattering.  It's all pretty obvious in
> retrospect, but I've added a short description of each use to aid in
> reviewing:
>
> * Convert evrp use in cprop to ranger.
>
>    This is easy, as cprop in DOM was converted to the ranger API last
>    cycle, so this is just a matter of using a ranger instead of an
>    evrp_range_analyzer.
Yea.  We may have covered this before, but what does ranger do with a 
conditional equivalence?

ie if we have something like

if (a == b)
   {
     blah blah
   }

Within the true arm we know there's an equivalence.  *But* we don't want 
to just blindly replace a with b or vice-versa.  The equivalence is 
primarily useful for simplification rather than propagation.

In fact, we every much do not want to cprop in these cases.   It rarely 
helps in a meaningful way and there's no good way to know which is the 
better value to use.  Canonicalization on SSA_NAMEs doesn't  really help 
due to SSA_NAME recycling.


>
> * Convert evrp use in threader to ranger.
>
>    The idea here is to use the hybrid approach we used for the initial
>    VRP threader conversion last cycle.  The DOM threader will continue
>    using the forward threader infrastructure while continuing to query
>    DOM data structures, and only if the conditional does not relsolve,
>    using the ranger.  This gives us the best of both worlds, and is a
>    proven approach.
>
>    Furthermore, as frange and prange come live in the next cycle, we
>    can move away from the forward threader altogether, and just add
>    another backward threader.  This will not only remove the last use
>    of the forward threader, but will allow us to remove at least 1 or 2
>    threader instances.
Excellent.

>
> * Convert conditional folding to use the method used by the ranger and
>    evrp.  Previously DOM was calling into the guts of
>    simplify_using_ranges::vrp_visit_cond_stmt.  The blessed way now is
>    using fold_cond() which rewrites the conditional and edges
>    automatically.
>
>    When legacy is removed, simplify_using_ranges will be further
>    cleaned up, and there will only be one entry point into simplifying
>    a statement.
Sure.

>
> * DOM was setting global ranges determined from unreachable edges as a
>    side-effect of using the evrp engine.  We must handle these cases
>    before nuking evrp, and DOM seems like a good fit.  I've just moved
>    the snippet to DOM, but it could live anywhere else we do a DOM
>    walk.
  It was a semi-desirable to record/refine global ranges in DOM, but I'd 
be surprised if too much code depended on that.  Presumably there's a 
testcase in the suite which we don't handle well if we don't let DOM 
refine/record global ranges?

>
>    For the record, this is the case *vrp handled:
>
> 	<bb C>:
> 	...
> 	if (c_5(D) != 5)
> 	goto <bb N>;
> 	else
> 	goto <bb M>;
> 	<bb N>:
> 	__builtin_unreachable ();
> 	<bb M>:
>
>    If M dominates all uses of c_5, we can set the global range of c_5
>    to [5,5].
And that will only happen if M eventually loops back to the conditional, 
right?  Otherwise all the uses wouldn't be dominated. I suspect this is 
exceedingly rare in practice an it looks really familiar.  This is 
coming from a testcase, right?

This is the only part of the patch that makes me go "ewwww", so I would 
like to at least explore if we can rethink the value of that test.

> There is a a regression on Wstringop-overflow-4.C which I'm planning
> on XFAILing.  It's another variant of the usual middle-end false
> positives: having no ranges produces no warnings, but slightly refined
> ranges, or worse-- isolating specific problematic cases in the
> threader causes flare-ups.
ACK.


>
> As an aside, as Richi has suggested, I think we should discuss
> restricting the threader's ability to thread highly unlikely paths.
> These cause no end of pain for middle-end warnings.  However,
> I don't know if this would conflict with path isolation for
> things like null dereferencing.  ISTR you were interested in this.
I've never though too  much about it.  You're thinking of Richi :-)

It's a balance.  I bet if we stop threading those paths we're going to 
see other issues start popping up like uninitialized warnings.

It may be the case that it's really the constant propagation on an 
isolated path that's more problematical for those warnings.  But I would 
probably contend that what isolation is doing is showing how certain 
constant values can flow into places where they're not expected and 
could cause problems.  So I wouldn't necessary suggest throttling it.

Happy to be proven wrong here!


>
> BTW, I think the Wstringop-overflow-4.C test is problematic and I've
> attached my analysis.  Basically the regression is caused by a bad
> interaction with the rounding/alignment that placement new has inlined
> into the IL.  This happens for int16_r[] which the test is testing.
> Ranger can glean some range info, which causes DOM threading to
> isolate a path which causes a warning.
Presumably this is triggering a warning at the new() call?

jeff
  
Richard Biener April 29, 2022, 7:09 a.m. UTC | #2
On Thu, Apr 28, 2022 at 8:44 PM Jeff Law via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
>
>
> On 4/28/2022 10:30 AM, Aldy Hernandez wrote:
> > [Jeff, this is the same patch I sent you last week with minor tweaks
> > to the commit message.]
> And I just dropped it into the tester.  We should have the vast majority
> of targets tested by this time tomorrow.
>
> >
> > [Despite the verbosity of the message, this is actually a pretty
> > straightforward patch.  It should've gone in last cycle, but there
> > was a nagging regression I couldn't get to until after stage1
> > had closed.]
> You had other, non-work/gcc priorities.  No worries at missing gcc-12.
>
>
> >
> > There are 3 uses of EVRP in DOM that must be converted.
> > Unfortunately, they need to be converted in one go, so further
> > splitting of this patch would be problematic.
> ACK
>
>
> > There's nothing here earth shattering.  It's all pretty obvious in
> > retrospect, but I've added a short description of each use to aid in
> > reviewing:
> >
> > * Convert evrp use in cprop to ranger.
> >
> >    This is easy, as cprop in DOM was converted to the ranger API last
> >    cycle, so this is just a matter of using a ranger instead of an
> >    evrp_range_analyzer.
> Yea.  We may have covered this before, but what does ranger do with a
> conditional equivalence?
>
> ie if we have something like
>
> if (a == b)
>    {
>      blah blah
>    }
>
> Within the true arm we know there's an equivalence.  *But* we don't want
> to just blindly replace a with b or vice-versa.  The equivalence is
> primarily useful for simplification rather than propagation.
>
> In fact, we every much do not want to cprop in these cases.   It rarely
> helps in a meaningful way and there's no good way to know which is the
> better value to use.  Canonicalization on SSA_NAMEs doesn't  really help
> due to SSA_NAME recycling.
>
>
> >
> > * Convert evrp use in threader to ranger.
> >
> >    The idea here is to use the hybrid approach we used for the initial
> >    VRP threader conversion last cycle.  The DOM threader will continue
> >    using the forward threader infrastructure while continuing to query
> >    DOM data structures, and only if the conditional does not relsolve,
> >    using the ranger.  This gives us the best of both worlds, and is a
> >    proven approach.
> >
> >    Furthermore, as frange and prange come live in the next cycle, we
> >    can move away from the forward threader altogether, and just add
> >    another backward threader.  This will not only remove the last use
> >    of the forward threader, but will allow us to remove at least 1 or 2
> >    threader instances.
> Excellent.
>
> >
> > * Convert conditional folding to use the method used by the ranger and
> >    evrp.  Previously DOM was calling into the guts of
> >    simplify_using_ranges::vrp_visit_cond_stmt.  The blessed way now is
> >    using fold_cond() which rewrites the conditional and edges
> >    automatically.
> >
> >    When legacy is removed, simplify_using_ranges will be further
> >    cleaned up, and there will only be one entry point into simplifying
> >    a statement.
> Sure.
>
> >
> > * DOM was setting global ranges determined from unreachable edges as a
> >    side-effect of using the evrp engine.  We must handle these cases
> >    before nuking evrp, and DOM seems like a good fit.  I've just moved
> >    the snippet to DOM, but it could live anywhere else we do a DOM
> >    walk.
>   It was a semi-desirable to record/refine global ranges in DOM, but I'd
> be surprised if too much code depended on that.  Presumably there's a
> testcase in the suite which we don't handle well if we don't let DOM
> refine/record global ranges?
>
> >
> >    For the record, this is the case *vrp handled:
> >
> >       <bb C>:
> >       ...
> >       if (c_5(D) != 5)
> >       goto <bb N>;
> >       else
> >       goto <bb M>;
> >       <bb N>:
> >       __builtin_unreachable ();
> >       <bb M>:
> >
> >    If M dominates all uses of c_5, we can set the global range of c_5
> >    to [5,5].
> And that will only happen if M eventually loops back to the conditional,
> right?  Otherwise all the uses wouldn't be dominated. I suspect this is
> exceedingly rare in practice an it looks really familiar.  This is
> coming from a testcase, right?

This is how we make use of assert() when it uses __builtin_unreachable.

Note the difficulty here is that we have to do this at a very specific point
in the pass pipeline (setting the global range), since the first time we'll
fold the if (c_5(D) != 5) it will be folded to false and the IL representation
of the assert() is gone.

So the idea was that with the IL present value-range analysis can determine
the range omf c_5(D) on the false path but once we remove it (and we do
want to remove it) we want to put a global range on c_5(D).

That means the best of both worlds would be to detect this pattern at
a specific point in the pass pipeline (somewhen before loop opts for sure)
and do both - remove the IL _and_ set the global range.  The fear of
doing this too early is of course that we lose the range by copy propagation
or similar transforms.  The fear of doing this change at a single place only
is that we miss it because there's a stray use before the conditional.

Note DOM is also run in pass_ipa_oacc_kernels which may be too early.

In the end we want to perform dead code elimination here so maybe
DCE is the best fit to do this (but as said we want to avoid doing this too
early).

> This is the only part of the patch that makes me go "ewwww", so I would
> like to at least explore if we can rethink the value of that test.
>
> > There is a a regression on Wstringop-overflow-4.C which I'm planning
> > on XFAILing.  It's another variant of the usual middle-end false
> > positives: having no ranges produces no warnings, but slightly refined
> > ranges, or worse-- isolating specific problematic cases in the
> > threader causes flare-ups.
> ACK.
>
>
> >
> > As an aside, as Richi has suggested, I think we should discuss
> > restricting the threader's ability to thread highly unlikely paths.
> > These cause no end of pain for middle-end warnings.  However,
> > I don't know if this would conflict with path isolation for
> > things like null dereferencing.  ISTR you were interested in this.
> I've never though too  much about it.  You're thinking of Richi :-)
>
> It's a balance.  I bet if we stop threading those paths we're going to
> see other issues start popping up like uninitialized warnings.
>
> It may be the case that it's really the constant propagation on an
> isolated path that's more problematical for those warnings.  But I would
> probably contend that what isolation is doing is showing how certain
> constant values can flow into places where they're not expected and
> could cause problems.  So I wouldn't necessary suggest throttling it.
>
> Happy to be proven wrong here!

I think we need to revisit the whole costing of threading which is of course
best done when we only have the backwards threader left.  I also always
wanted to play with some kind of optimistic code generation & rollback
on things like threading and unrolling, basically value-number the
duplicated stmts on-the-fly and make it "stop" when we reach a defined
threshold (and then throw away the result).  I never went around to
do this for unrolling since there we can end up duplicating diverging
control flow, but that at least doesn't happen with threading(?)

And just a note - please wait with installing this change - if approved - until
after GCC 12 is actually released.

Thanks,
Richard.

>
> >
> > BTW, I think the Wstringop-overflow-4.C test is problematic and I've
> > attached my analysis.  Basically the regression is caused by a bad
> > interaction with the rounding/alignment that placement new has inlined
> > into the IL.  This happens for int16_r[] which the test is testing.
> > Ranger can glean some range info, which causes DOM threading to
> > isolate a path which causes a warning.
> Presumably this is triggering a warning at the new() call?
>
> jeff
  
Aldy Hernandez April 29, 2022, 9:52 a.m. UTC | #3
On Fri, Apr 29, 2022 at 9:09 AM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Thu, Apr 28, 2022 at 8:44 PM Jeff Law via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> >
> >
> > On 4/28/2022 10:30 AM, Aldy Hernandez wrote:
> > > [Jeff, this is the same patch I sent you last week with minor tweaks
> > > to the commit message.]
> > And I just dropped it into the tester.  We should have the vast majority
> > of targets tested by this time tomorrow.
> >
> > >
> > > [Despite the verbosity of the message, this is actually a pretty
> > > straightforward patch.  It should've gone in last cycle, but there
> > > was a nagging regression I couldn't get to until after stage1
> > > had closed.]
> > You had other, non-work/gcc priorities.  No worries at missing gcc-12.
> >
> >
> > >
> > > There are 3 uses of EVRP in DOM that must be converted.
> > > Unfortunately, they need to be converted in one go, so further
> > > splitting of this patch would be problematic.
> > ACK
> >
> >
> > > There's nothing here earth shattering.  It's all pretty obvious in
> > > retrospect, but I've added a short description of each use to aid in
> > > reviewing:
> > >
> > > * Convert evrp use in cprop to ranger.
> > >
> > >    This is easy, as cprop in DOM was converted to the ranger API last
> > >    cycle, so this is just a matter of using a ranger instead of an
> > >    evrp_range_analyzer.
> > Yea.  We may have covered this before, but what does ranger do with a
> > conditional equivalence?
> >
> > ie if we have something like
> >
> > if (a == b)
> >    {
> >      blah blah
> >    }
> >
> > Within the true arm we know there's an equivalence.  *But* we don't want
> > to just blindly replace a with b or vice-versa.  The equivalence is
> > primarily useful for simplification rather than propagation.
> >
> > In fact, we every much do not want to cprop in these cases.   It rarely
> > helps in a meaningful way and there's no good way to know which is the
> > better value to use.  Canonicalization on SSA_NAMEs doesn't  really help
> > due to SSA_NAME recycling.
> >
> >
> > >
> > > * Convert evrp use in threader to ranger.
> > >
> > >    The idea here is to use the hybrid approach we used for the initial
> > >    VRP threader conversion last cycle.  The DOM threader will continue
> > >    using the forward threader infrastructure while continuing to query
> > >    DOM data structures, and only if the conditional does not relsolve,
> > >    using the ranger.  This gives us the best of both worlds, and is a
> > >    proven approach.
> > >
> > >    Furthermore, as frange and prange come live in the next cycle, we
> > >    can move away from the forward threader altogether, and just add
> > >    another backward threader.  This will not only remove the last use
> > >    of the forward threader, but will allow us to remove at least 1 or 2
> > >    threader instances.
> > Excellent.
> >
> > >
> > > * Convert conditional folding to use the method used by the ranger and
> > >    evrp.  Previously DOM was calling into the guts of
> > >    simplify_using_ranges::vrp_visit_cond_stmt.  The blessed way now is
> > >    using fold_cond() which rewrites the conditional and edges
> > >    automatically.
> > >
> > >    When legacy is removed, simplify_using_ranges will be further
> > >    cleaned up, and there will only be one entry point into simplifying
> > >    a statement.
> > Sure.
> >
> > >
> > > * DOM was setting global ranges determined from unreachable edges as a
> > >    side-effect of using the evrp engine.  We must handle these cases
> > >    before nuking evrp, and DOM seems like a good fit.  I've just moved
> > >    the snippet to DOM, but it could live anywhere else we do a DOM
> > >    walk.
> >   It was a semi-desirable to record/refine global ranges in DOM, but I'd
> > be surprised if too much code depended on that.  Presumably there's a
> > testcase in the suite which we don't handle well if we don't let DOM
> > refine/record global ranges?
> >
> > >
> > >    For the record, this is the case *vrp handled:
> > >
> > >       <bb C>:
> > >       ...
> > >       if (c_5(D) != 5)
> > >       goto <bb N>;
> > >       else
> > >       goto <bb M>;
> > >       <bb N>:
> > >       __builtin_unreachable ();
> > >       <bb M>:
> > >
> > >    If M dominates all uses of c_5, we can set the global range of c_5
> > >    to [5,5].
> > And that will only happen if M eventually loops back to the conditional,
> > right?  Otherwise all the uses wouldn't be dominated. I suspect this is
> > exceedingly rare in practice an it looks really familiar.  This is
> > coming from a testcase, right?
>
> This is how we make use of assert() when it uses __builtin_unreachable.
>
> Note the difficulty here is that we have to do this at a very specific point
> in the pass pipeline (setting the global range), since the first time we'll
> fold the if (c_5(D) != 5) it will be folded to false and the IL representation
> of the assert() is gone.
>
> So the idea was that with the IL present value-range analysis can determine
> the range omf c_5(D) on the false path but once we remove it (and we do
> want to remove it) we want to put a global range on c_5(D).
>
> That means the best of both worlds would be to detect this pattern at
> a specific point in the pass pipeline (somewhen before loop opts for sure)
> and do both - remove the IL _and_ set the global range.  The fear of
> doing this too early is of course that we lose the range by copy propagation
> or similar transforms.  The fear of doing this change at a single place only
> is that we miss it because there's a stray use before the conditional.
>
> Note DOM is also run in pass_ipa_oacc_kernels which may be too early.
>
> In the end we want to perform dead code elimination here so maybe
> DCE is the best fit to do this (but as said we want to avoid doing this too
> early).

I agree that doing both would be the ideal scenario.  However, I would
prefer this be done in a follow-up patch to minimize the amount of
stuff being changed in one go.

To provide additional background, this was being done in evrp which
runs at -O2, but was also being done at -O1 as a side-effect of DOM
using the evrp engine and exporting global ranges.  For -O2, I don't
know how much DOM was originally (prior to ranger) contributing to
these assert global ranges,  since evrp was probably picking them up
first.  However, since evrp now runs in ranger mode, this
functionality has silently moved to DOM (without us realizing) for all
compilation levels.

Do we care about this functionality at -O1?  ISTR at least one -O1
testcase failing in RTL land if exporting global ranges isn't done in
DOM.  I don't remember if it was a global range due to a
__builtin_unreachable or otherwise.

One more thing, we have batted around the idea of running a fast evrp
even at -O1 (ranger without looking at back edges, IIRC).  This would
mean that when the DOM threader becomes disentangled from DOM (when we
have frange and prange), there's very little DOM will be doing that
for instance PRE can't get.  So perhaps, if we remove DOM, the place
to put this optimization is in the fast evrp, and fold the conditional
as has been suggested?  Anyways... I'm digressing.

Doing this optimization in DOM at least keeps with GCC 12, but I'm
happy to move it to whatever is recommended.

The test case is gcc.dg/builtin-unreachable-6a.c by the way.

>
> > This is the only part of the patch that makes me go "ewwww", so I would
> > like to at least explore if we can rethink the value of that test.
> >
> > > There is a a regression on Wstringop-overflow-4.C which I'm planning
> > > on XFAILing.  It's another variant of the usual middle-end false
> > > positives: having no ranges produces no warnings, but slightly refined
> > > ranges, or worse-- isolating specific problematic cases in the
> > > threader causes flare-ups.
> > ACK.
> >
> >
> > >
> > > As an aside, as Richi has suggested, I think we should discuss
> > > restricting the threader's ability to thread highly unlikely paths.
> > > These cause no end of pain for middle-end warnings.  However,
> > > I don't know if this would conflict with path isolation for
> > > things like null dereferencing.  ISTR you were interested in this.
> > I've never though too  much about it.  You're thinking of Richi :-)
> >
> > It's a balance.  I bet if we stop threading those paths we're going to
> > see other issues start popping up like uninitialized warnings.
> >
> > It may be the case that it's really the constant propagation on an
> > isolated path that's more problematical for those warnings.  But I would
> > probably contend that what isolation is doing is showing how certain
> > constant values can flow into places where they're not expected and
> > could cause problems.  So I wouldn't necessary suggest throttling it.
> >
> > Happy to be proven wrong here!
>
> I think we need to revisit the whole costing of threading which is of course
> best done when we only have the backwards threader left.  I also always
> wanted to play with some kind of optimistic code generation & rollback
> on things like threading and unrolling, basically value-number the
> duplicated stmts on-the-fly and make it "stop" when we reach a defined
> threshold (and then throw away the result).  I never went around to
> do this for unrolling since there we can end up duplicating diverging
> control flow, but that at least doesn't happen with threading(?)

WRT stopping after a threshold is reached,
back_threader_profitablity::profitable_path_p() bails after a number
of statements is reached.  If so, the path is not even put into the
queue.  So, no rolling back would be necessary.  Is this what you
meant, or something else?  The threaders (all variants) queue things
up and modify the IL at the end of the pass.

>
> And just a note - please wait with installing this change - if approved - until
> after GCC 12 is actually released.

Indeed.  This is far too invasive for the interim.  I just sent the
patch early so Jeff could get a chance to review it before I leave on
paternity leave.  But he's also volunteered to kick it over the goal
line if we can't get it in before May 9.

Aldy
  
Richard Biener April 29, 2022, 10:13 a.m. UTC | #4
On Fri, Apr 29, 2022 at 11:53 AM Aldy Hernandez <aldyh@redhat.com> wrote:
>
> On Fri, Apr 29, 2022 at 9:09 AM Richard Biener
> <richard.guenther@gmail.com> wrote:
> >
> > On Thu, Apr 28, 2022 at 8:44 PM Jeff Law via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> > >
> > >
> > >
> > > On 4/28/2022 10:30 AM, Aldy Hernandez wrote:
> > > > [Jeff, this is the same patch I sent you last week with minor tweaks
> > > > to the commit message.]
> > > And I just dropped it into the tester.  We should have the vast majority
> > > of targets tested by this time tomorrow.
> > >
> > > >
> > > > [Despite the verbosity of the message, this is actually a pretty
> > > > straightforward patch.  It should've gone in last cycle, but there
> > > > was a nagging regression I couldn't get to until after stage1
> > > > had closed.]
> > > You had other, non-work/gcc priorities.  No worries at missing gcc-12.
> > >
> > >
> > > >
> > > > There are 3 uses of EVRP in DOM that must be converted.
> > > > Unfortunately, they need to be converted in one go, so further
> > > > splitting of this patch would be problematic.
> > > ACK
> > >
> > >
> > > > There's nothing here earth shattering.  It's all pretty obvious in
> > > > retrospect, but I've added a short description of each use to aid in
> > > > reviewing:
> > > >
> > > > * Convert evrp use in cprop to ranger.
> > > >
> > > >    This is easy, as cprop in DOM was converted to the ranger API last
> > > >    cycle, so this is just a matter of using a ranger instead of an
> > > >    evrp_range_analyzer.
> > > Yea.  We may have covered this before, but what does ranger do with a
> > > conditional equivalence?
> > >
> > > ie if we have something like
> > >
> > > if (a == b)
> > >    {
> > >      blah blah
> > >    }
> > >
> > > Within the true arm we know there's an equivalence.  *But* we don't want
> > > to just blindly replace a with b or vice-versa.  The equivalence is
> > > primarily useful for simplification rather than propagation.
> > >
> > > In fact, we every much do not want to cprop in these cases.   It rarely
> > > helps in a meaningful way and there's no good way to know which is the
> > > better value to use.  Canonicalization on SSA_NAMEs doesn't  really help
> > > due to SSA_NAME recycling.
> > >
> > >
> > > >
> > > > * Convert evrp use in threader to ranger.
> > > >
> > > >    The idea here is to use the hybrid approach we used for the initial
> > > >    VRP threader conversion last cycle.  The DOM threader will continue
> > > >    using the forward threader infrastructure while continuing to query
> > > >    DOM data structures, and only if the conditional does not relsolve,
> > > >    using the ranger.  This gives us the best of both worlds, and is a
> > > >    proven approach.
> > > >
> > > >    Furthermore, as frange and prange come live in the next cycle, we
> > > >    can move away from the forward threader altogether, and just add
> > > >    another backward threader.  This will not only remove the last use
> > > >    of the forward threader, but will allow us to remove at least 1 or 2
> > > >    threader instances.
> > > Excellent.
> > >
> > > >
> > > > * Convert conditional folding to use the method used by the ranger and
> > > >    evrp.  Previously DOM was calling into the guts of
> > > >    simplify_using_ranges::vrp_visit_cond_stmt.  The blessed way now is
> > > >    using fold_cond() which rewrites the conditional and edges
> > > >    automatically.
> > > >
> > > >    When legacy is removed, simplify_using_ranges will be further
> > > >    cleaned up, and there will only be one entry point into simplifying
> > > >    a statement.
> > > Sure.
> > >
> > > >
> > > > * DOM was setting global ranges determined from unreachable edges as a
> > > >    side-effect of using the evrp engine.  We must handle these cases
> > > >    before nuking evrp, and DOM seems like a good fit.  I've just moved
> > > >    the snippet to DOM, but it could live anywhere else we do a DOM
> > > >    walk.
> > >   It was a semi-desirable to record/refine global ranges in DOM, but I'd
> > > be surprised if too much code depended on that.  Presumably there's a
> > > testcase in the suite which we don't handle well if we don't let DOM
> > > refine/record global ranges?
> > >
> > > >
> > > >    For the record, this is the case *vrp handled:
> > > >
> > > >       <bb C>:
> > > >       ...
> > > >       if (c_5(D) != 5)
> > > >       goto <bb N>;
> > > >       else
> > > >       goto <bb M>;
> > > >       <bb N>:
> > > >       __builtin_unreachable ();
> > > >       <bb M>:
> > > >
> > > >    If M dominates all uses of c_5, we can set the global range of c_5
> > > >    to [5,5].
> > > And that will only happen if M eventually loops back to the conditional,
> > > right?  Otherwise all the uses wouldn't be dominated. I suspect this is
> > > exceedingly rare in practice an it looks really familiar.  This is
> > > coming from a testcase, right?
> >
> > This is how we make use of assert() when it uses __builtin_unreachable.
> >
> > Note the difficulty here is that we have to do this at a very specific point
> > in the pass pipeline (setting the global range), since the first time we'll
> > fold the if (c_5(D) != 5) it will be folded to false and the IL representation
> > of the assert() is gone.
> >
> > So the idea was that with the IL present value-range analysis can determine
> > the range omf c_5(D) on the false path but once we remove it (and we do
> > want to remove it) we want to put a global range on c_5(D).
> >
> > That means the best of both worlds would be to detect this pattern at
> > a specific point in the pass pipeline (somewhen before loop opts for sure)
> > and do both - remove the IL _and_ set the global range.  The fear of
> > doing this too early is of course that we lose the range by copy propagation
> > or similar transforms.  The fear of doing this change at a single place only
> > is that we miss it because there's a stray use before the conditional.
> >
> > Note DOM is also run in pass_ipa_oacc_kernels which may be too early.
> >
> > In the end we want to perform dead code elimination here so maybe
> > DCE is the best fit to do this (but as said we want to avoid doing this too
> > early).
>
> I agree that doing both would be the ideal scenario.  However, I would
> prefer this be done in a follow-up patch to minimize the amount of
> stuff being changed in one go.
>
> To provide additional background, this was being done in evrp which
> runs at -O2, but was also being done at -O1 as a side-effect of DOM
> using the evrp engine and exporting global ranges.  For -O2, I don't
> know how much DOM was originally (prior to ranger) contributing to
> these assert global ranges,  since evrp was probably picking them up
> first.  However, since evrp now runs in ranger mode, this
> functionality has silently moved to DOM (without us realizing) for all
> compilation levels.

I see.

> Do we care about this functionality at -O1?  ISTR at least one -O1
> testcase failing in RTL land if exporting global ranges isn't done in
> DOM.  I don't remember if it was a global range due to a
> __builtin_unreachable or otherwise.

Well, at -O1 we definitely didn't set the global range (did we?) but
eventually pruned paths running into __builtin_unreachable ()
anyway (would have to check at which point).

>
> One more thing, we have batted around the idea of running a fast evrp
> even at -O1 (ranger without looking at back edges, IIRC).  This would
> mean that when the DOM threader becomes disentangled from DOM (when we
> have frange and prange), there's very little DOM will be doing that
> for instance PRE can't get.  So perhaps, if we remove DOM, the place
> to put this optimization is in the fast evrp, and fold the conditional
> as has been suggested?  Anyways... I'm digressing.

The idea is to get rid of DOM once its threading is separated out.

I think doing O (n * log n) value range analysis at -O1 would be welcome.
We didn't get around doing that for EVRP which I think had such
bound - the iterating VRP definitely didn't.  I'm not sure ranger really
qualifies given gori and all it relies on.  The advantage of the simple
whole-IL walk way was to reasonably easily guarantee such bound.
That's unfortunately lost now :/

> Doing this optimization in DOM at least keeps with GCC 12, but I'm
> happy to move it to whatever is recommended.

As said the intent of this trick is to preserve range information from
asserts even when we removed the IL carrying it.  When we care
about range info, which means when some VRP pass runs, which
is why it was done in such a pass.  That is, we probably want to
avoid altering the global range for -fno-tree-vrp.

> The test case is gcc.dg/builtin-unreachable-6a.c by the way.

But 6.c checks the same but even w/ DOM disabled...

> >
> > > This is the only part of the patch that makes me go "ewwww", so I would
> > > like to at least explore if we can rethink the value of that test.
> > >
> > > > There is a a regression on Wstringop-overflow-4.C which I'm planning
> > > > on XFAILing.  It's another variant of the usual middle-end false
> > > > positives: having no ranges produces no warnings, but slightly refined
> > > > ranges, or worse-- isolating specific problematic cases in the
> > > > threader causes flare-ups.
> > > ACK.
> > >
> > >
> > > >
> > > > As an aside, as Richi has suggested, I think we should discuss
> > > > restricting the threader's ability to thread highly unlikely paths.
> > > > These cause no end of pain for middle-end warnings.  However,
> > > > I don't know if this would conflict with path isolation for
> > > > things like null dereferencing.  ISTR you were interested in this.
> > > I've never though too  much about it.  You're thinking of Richi :-)
> > >
> > > It's a balance.  I bet if we stop threading those paths we're going to
> > > see other issues start popping up like uninitialized warnings.
> > >
> > > It may be the case that it's really the constant propagation on an
> > > isolated path that's more problematical for those warnings.  But I would
> > > probably contend that what isolation is doing is showing how certain
> > > constant values can flow into places where they're not expected and
> > > could cause problems.  So I wouldn't necessary suggest throttling it.
> > >
> > > Happy to be proven wrong here!
> >
> > I think we need to revisit the whole costing of threading which is of course
> > best done when we only have the backwards threader left.  I also always
> > wanted to play with some kind of optimistic code generation & rollback
> > on things like threading and unrolling, basically value-number the
> > duplicated stmts on-the-fly and make it "stop" when we reach a defined
> > threshold (and then throw away the result).  I never went around to
> > do this for unrolling since there we can end up duplicating diverging
> > control flow, but that at least doesn't happen with threading(?)
>
> WRT stopping after a threshold is reached,
> back_threader_profitablity::profitable_path_p() bails after a number
> of statements is reached.  If so, the path is not even put into the
> queue.  So, no rolling back would be necessary.  Is this what you
> meant, or something else?  The threaders (all variants) queue things
> up and modify the IL at the end of the pass.

I meant that as currently we have no idea about followup simplifications
on the isolated path we conservatively account for all stmts in there
and thus need an optimistically high threading --param.  We should
be able to do better by simplifying copied stmts with the some now
constant PHIs or conditionally derived values.  So we could do
"unbound" analysis but when we instantiate stop the actual copying
if simplification doesn't keep us within the desired bounds.
For example unrolling tries to estimate what's going to be optimized
away but if we actually did the simplification that would be more precise.

> >
> > And just a note - please wait with installing this change - if approved - until
> > after GCC 12 is actually released.
>
> Indeed.  This is far too invasive for the interim.  I just sent the
> patch early so Jeff could get a chance to review it before I leave on
> paternity leave.  But he's also volunteered to kick it over the goal
> line if we can't get it in before May 9.
>
> Aldy
>
  
Aldy Hernandez April 29, 2022, 11:47 a.m. UTC | #5
On Thu, Apr 28, 2022 at 8:43 PM Jeff Law <jeffreyalaw@gmail.com> wrote:
>
>
>
> On 4/28/2022 10:30 AM, Aldy Hernandez wrote:
> > [Jeff, this is the same patch I sent you last week with minor tweaks
> > to the commit message.]
> And I just dropped it into the tester.  We should have the vast majority
> of targets tested by this time tomorrow.
>
> >
> > [Despite the verbosity of the message, this is actually a pretty
> > straightforward patch.  It should've gone in last cycle, but there
> > was a nagging regression I couldn't get to until after stage1
> > had closed.]
> You had other, non-work/gcc priorities.  No worries at missing gcc-12.

Ha.  I keep forgetting I have a permanent excuse going forward for any
delays... parenthood!

>
>
> >
> > There are 3 uses of EVRP in DOM that must be converted.
> > Unfortunately, they need to be converted in one go, so further
> > splitting of this patch would be problematic.
> ACK
>
>
> > There's nothing here earth shattering.  It's all pretty obvious in
> > retrospect, but I've added a short description of each use to aid in
> > reviewing:
> >
> > * Convert evrp use in cprop to ranger.
> >
> >    This is easy, as cprop in DOM was converted to the ranger API last
> >    cycle, so this is just a matter of using a ranger instead of an
> >    evrp_range_analyzer.
> Yea.  We may have covered this before, but what does ranger do with a
> conditional equivalence?
>
> ie if we have something like
>
> if (a == b)
>    {
>      blah blah
>    }
>
> Within the true arm we know there's an equivalence.  *But* we don't want
> to just blindly replace a with b or vice-versa.  The equivalence is
> primarily useful for simplification rather than propagation.
>
> In fact, we every much do not want to cprop in these cases.   It rarely
> helps in a meaningful way and there's no good way to know which is the
> better value to use.  Canonicalization on SSA_NAMEs doesn't  really help
> due to SSA_NAME recycling.

Ranger only returns constants, not symbolics, so the call to
range_of_expr is guaranteed to be constant.  I verified that this
subtle change to range_of_expr didn't cause any problems in my
conversion to the value_query API in commit e1d01f4973 (last cyce):

    There is one subtle change.  The call to vr_value's
    op_with_constant_singleton_value_range can theoretically return
    non-constants, unlike the range_query API which only returns constants.
    In this particular case it doesn't matter because the symbolic stuff will
    have been handled by the const_and_copies/avail_exprs read in the
    SSA_NAME_VALUE copy immediately before.  I have verified this is the case
    by asserting that all calls to op_with_constant_singleton_value_range at
    this point return either NULL or an INTEGER_CST.

Just to be clear, even though ranger doesn't return a symbolic, it
does know about relations via the relation oracle the ranger uses.  So
on the TRUE arm, it will know that a==b, which it can leverage to fold
any statements.  This could be done with the statement folding API in
gimple-range-fold.h (or via range_of_stmt).  You could theoretically
query the relation oracle yourself with query_relation() which are
exported by the ranger (via the range_query class it inherits from).
But for the case above, we're calling range_of_expr which is
guaranteed to be a constant.

Since we're calling range_of_expr on operands, we shouldn't be folding
anything.  OTOH, if you were to call range_of_stmt on the conditional,
it could return  TRUE/FALSE based on how it would fold.  In
dom_opt_dom_walker::fold_cond() which I have provided in the patch, we
will fold conditionals with knowledge of relations, because it uses
range_of_stmt.

BTW, when frange and prange come live, there will be other attributes
that may be attached to a range.  For instance for prange, we'll have
a "points-to" attribute.   For example:

p_5 = &foo;

The range of p_5 will be nonzero with a points-to attribute of &foo +
0 or some such.

So later this cycle we'll have symbolics of sorts, but never in the
range itself, but as an attribute say prange::get_points_to().

Similarly for floats.  We'll probably have not-NAN, not-INF, etc attributes.

>
>
> >
> > * Convert evrp use in threader to ranger.
> >
> >    The idea here is to use the hybrid approach we used for the initial
> >    VRP threader conversion last cycle.  The DOM threader will continue
> >    using the forward threader infrastructure while continuing to query
> >    DOM data structures, and only if the conditional does not relsolve,
> >    using the ranger.  This gives us the best of both worlds, and is a
> >    proven approach.
> >
> >    Furthermore, as frange and prange come live in the next cycle, we
> >    can move away from the forward threader altogether, and just add
> >    another backward threader.  This will not only remove the last use
> >    of the forward threader, but will allow us to remove at least 1 or 2
> >    threader instances.
> Excellent.
>
> >
> > * Convert conditional folding to use the method used by the ranger and
> >    evrp.  Previously DOM was calling into the guts of
> >    simplify_using_ranges::vrp_visit_cond_stmt.  The blessed way now is
> >    using fold_cond() which rewrites the conditional and edges
> >    automatically.
> >
> >    When legacy is removed, simplify_using_ranges will be further
> >    cleaned up, and there will only be one entry point into simplifying
> >    a statement.
> Sure.
>
> >
> > * DOM was setting global ranges determined from unreachable edges as a
> >    side-effect of using the evrp engine.  We must handle these cases
> >    before nuking evrp, and DOM seems like a good fit.  I've just moved
> >    the snippet to DOM, but it could live anywhere else we do a DOM
> >    walk.
>   It was a semi-desirable to record/refine global ranges in DOM, but I'd
> be surprised if too much code depended on that.  Presumably there's a
> testcase in the suite which we don't handle well if we don't let DOM
> refine/record global ranges?
>
> >
> >    For the record, this is the case *vrp handled:
> >
> >       <bb C>:
> >       ...
> >       if (c_5(D) != 5)
> >       goto <bb N>;
> >       else
> >       goto <bb M>;
> >       <bb N>:
> >       __builtin_unreachable ();
> >       <bb M>:
> >
> >    If M dominates all uses of c_5, we can set the global range of c_5
> >    to [5,5].
> And that will only happen if M eventually loops back to the conditional,
> right?  Otherwise all the uses wouldn't be dominated. I suspect this is
> exceedingly rare in practice an it looks really familiar.  This is
> coming from a testcase, right?
>
> This is the only part of the patch that makes me go "ewwww", so I would
> like to at least explore if we can rethink the value of that test.

[the above is discussed down thread]

>
> > There is a a regression on Wstringop-overflow-4.C which I'm planning
> > on XFAILing.  It's another variant of the usual middle-end false
> > positives: having no ranges produces no warnings, but slightly refined
> > ranges, or worse-- isolating specific problematic cases in the
> > threader causes flare-ups.
> ACK.
>
>
> >
> > As an aside, as Richi has suggested, I think we should discuss
> > restricting the threader's ability to thread highly unlikely paths.
> > These cause no end of pain for middle-end warnings.  However,
> > I don't know if this would conflict with path isolation for
> > things like null dereferencing.  ISTR you were interested in this.
> I've never though too  much about it.  You're thinking of Richi :-)
>
> It's a balance.  I bet if we stop threading those paths we're going to
> see other issues start popping up like uninitialized warnings.
>
> It may be the case that it's really the constant propagation on an
> isolated path that's more problematical for those warnings.  But I would
> probably contend that what isolation is doing is showing how certain
> constant values can flow into places where they're not expected and
> could cause problems.  So I wouldn't necessary suggest throttling it.
>
> Happy to be proven wrong here!
>
>
> >
> > BTW, I think the Wstringop-overflow-4.C test is problematic and I've
> > attached my analysis.  Basically the regression is caused by a bad
> > interaction with the rounding/alignment that placement new has inlined
> > into the IL.  This happens for int16_r[] which the test is testing.
> > Ranger can glean some range info, which causes DOM threading to
> > isolate a path which causes a warning.
> Presumably this is triggering a warning at the new() call?

Yes.

Aldy
  
Aldy Hernandez April 29, 2022, 4:22 p.m. UTC | #6
On Fri, Apr 29, 2022 at 12:13 PM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Fri, Apr 29, 2022 at 11:53 AM Aldy Hernandez <aldyh@redhat.com> wrote:
> >
> > On Fri, Apr 29, 2022 at 9:09 AM Richard Biener
> > <richard.guenther@gmail.com> wrote:
> > >
> > > On Thu, Apr 28, 2022 at 8:44 PM Jeff Law via Gcc-patches
> > > <gcc-patches@gcc.gnu.org> wrote:
> > > >
> > > >
> > > >
> > > > On 4/28/2022 10:30 AM, Aldy Hernandez wrote:
> > > > > [Jeff, this is the same patch I sent you last week with minor tweaks
> > > > > to the commit message.]
> > > > And I just dropped it into the tester.  We should have the vast majority
> > > > of targets tested by this time tomorrow.
> > > >
> > > > >
> > > > > [Despite the verbosity of the message, this is actually a pretty
> > > > > straightforward patch.  It should've gone in last cycle, but there
> > > > > was a nagging regression I couldn't get to until after stage1
> > > > > had closed.]
> > > > You had other, non-work/gcc priorities.  No worries at missing gcc-12.
> > > >
> > > >
> > > > >
> > > > > There are 3 uses of EVRP in DOM that must be converted.
> > > > > Unfortunately, they need to be converted in one go, so further
> > > > > splitting of this patch would be problematic.
> > > > ACK
> > > >
> > > >
> > > > > There's nothing here earth shattering.  It's all pretty obvious in
> > > > > retrospect, but I've added a short description of each use to aid in
> > > > > reviewing:
> > > > >
> > > > > * Convert evrp use in cprop to ranger.
> > > > >
> > > > >    This is easy, as cprop in DOM was converted to the ranger API last
> > > > >    cycle, so this is just a matter of using a ranger instead of an
> > > > >    evrp_range_analyzer.
> > > > Yea.  We may have covered this before, but what does ranger do with a
> > > > conditional equivalence?
> > > >
> > > > ie if we have something like
> > > >
> > > > if (a == b)
> > > >    {
> > > >      blah blah
> > > >    }
> > > >
> > > > Within the true arm we know there's an equivalence.  *But* we don't want
> > > > to just blindly replace a with b or vice-versa.  The equivalence is
> > > > primarily useful for simplification rather than propagation.
> > > >
> > > > In fact, we every much do not want to cprop in these cases.   It rarely
> > > > helps in a meaningful way and there's no good way to know which is the
> > > > better value to use.  Canonicalization on SSA_NAMEs doesn't  really help
> > > > due to SSA_NAME recycling.
> > > >
> > > >
> > > > >
> > > > > * Convert evrp use in threader to ranger.
> > > > >
> > > > >    The idea here is to use the hybrid approach we used for the initial
> > > > >    VRP threader conversion last cycle.  The DOM threader will continue
> > > > >    using the forward threader infrastructure while continuing to query
> > > > >    DOM data structures, and only if the conditional does not relsolve,
> > > > >    using the ranger.  This gives us the best of both worlds, and is a
> > > > >    proven approach.
> > > > >
> > > > >    Furthermore, as frange and prange come live in the next cycle, we
> > > > >    can move away from the forward threader altogether, and just add
> > > > >    another backward threader.  This will not only remove the last use
> > > > >    of the forward threader, but will allow us to remove at least 1 or 2
> > > > >    threader instances.
> > > > Excellent.
> > > >
> > > > >
> > > > > * Convert conditional folding to use the method used by the ranger and
> > > > >    evrp.  Previously DOM was calling into the guts of
> > > > >    simplify_using_ranges::vrp_visit_cond_stmt.  The blessed way now is
> > > > >    using fold_cond() which rewrites the conditional and edges
> > > > >    automatically.
> > > > >
> > > > >    When legacy is removed, simplify_using_ranges will be further
> > > > >    cleaned up, and there will only be one entry point into simplifying
> > > > >    a statement.
> > > > Sure.
> > > >
> > > > >
> > > > > * DOM was setting global ranges determined from unreachable edges as a
> > > > >    side-effect of using the evrp engine.  We must handle these cases
> > > > >    before nuking evrp, and DOM seems like a good fit.  I've just moved
> > > > >    the snippet to DOM, but it could live anywhere else we do a DOM
> > > > >    walk.
> > > >   It was a semi-desirable to record/refine global ranges in DOM, but I'd
> > > > be surprised if too much code depended on that.  Presumably there's a
> > > > testcase in the suite which we don't handle well if we don't let DOM
> > > > refine/record global ranges?
> > > >
> > > > >
> > > > >    For the record, this is the case *vrp handled:
> > > > >
> > > > >       <bb C>:
> > > > >       ...
> > > > >       if (c_5(D) != 5)
> > > > >       goto <bb N>;
> > > > >       else
> > > > >       goto <bb M>;
> > > > >       <bb N>:
> > > > >       __builtin_unreachable ();
> > > > >       <bb M>:
> > > > >
> > > > >    If M dominates all uses of c_5, we can set the global range of c_5
> > > > >    to [5,5].
> > > > And that will only happen if M eventually loops back to the conditional,
> > > > right?  Otherwise all the uses wouldn't be dominated. I suspect this is
> > > > exceedingly rare in practice an it looks really familiar.  This is
> > > > coming from a testcase, right?
> > >
> > > This is how we make use of assert() when it uses __builtin_unreachable.
> > >
> > > Note the difficulty here is that we have to do this at a very specific point
> > > in the pass pipeline (setting the global range), since the first time we'll
> > > fold the if (c_5(D) != 5) it will be folded to false and the IL representation
> > > of the assert() is gone.
> > >
> > > So the idea was that with the IL present value-range analysis can determine
> > > the range omf c_5(D) on the false path but once we remove it (and we do
> > > want to remove it) we want to put a global range on c_5(D).
> > >
> > > That means the best of both worlds would be to detect this pattern at
> > > a specific point in the pass pipeline (somewhen before loop opts for sure)
> > > and do both - remove the IL _and_ set the global range.  The fear of
> > > doing this too early is of course that we lose the range by copy propagation
> > > or similar transforms.  The fear of doing this change at a single place only
> > > is that we miss it because there's a stray use before the conditional.
> > >
> > > Note DOM is also run in pass_ipa_oacc_kernels which may be too early.
> > >
> > > In the end we want to perform dead code elimination here so maybe
> > > DCE is the best fit to do this (but as said we want to avoid doing this too
> > > early).
> >
> > I agree that doing both would be the ideal scenario.  However, I would
> > prefer this be done in a follow-up patch to minimize the amount of
> > stuff being changed in one go.
> >
> > To provide additional background, this was being done in evrp which
> > runs at -O2, but was also being done at -O1 as a side-effect of DOM
> > using the evrp engine and exporting global ranges.  For -O2, I don't
> > know how much DOM was originally (prior to ranger) contributing to
> > these assert global ranges,  since evrp was probably picking them up
> > first.  However, since evrp now runs in ranger mode, this
> > functionality has silently moved to DOM (without us realizing) for all
> > compilation levels.
>
> I see.
>
> > Do we care about this functionality at -O1?  ISTR at least one -O1
> > testcase failing in RTL land if exporting global ranges isn't done in
> > DOM.  I don't remember if it was a global range due to a
> > __builtin_unreachable or otherwise.
>
> Well, at -O1 we definitely didn't set the global range (did we?) but
> eventually pruned paths running into __builtin_unreachable ()
> anyway (would have to check at which point).

I think we always have:

evrp_range_analyzer analyzer (/*update_global_ranges=*/true);

??

DOM has been running as an incognito evrp even at -O1.  I don't know
if this has been by design, or as an unintended side-effect.  I mean,
it has been visiting all conditionals in the IL as well as calling
evrp_range_analyzer::record_ranges_from_stmt() on each statement it
tries to optimize.  And of course, updating global ranges along the
way.

>
> >
> > One more thing, we have batted around the idea of running a fast evrp
> > even at -O1 (ranger without looking at back edges, IIRC).  This would
> > mean that when the DOM threader becomes disentangled from DOM (when we
> > have frange and prange), there's very little DOM will be doing that
> > for instance PRE can't get.  So perhaps, if we remove DOM, the place
> > to put this optimization is in the fast evrp, and fold the conditional
> > as has been suggested?  Anyways... I'm digressing.
>
> The idea is to get rid of DOM once its threading is separated out.

Excellent.  There's no reason why we couldn't separate threading out
in this release.

>
> I think doing O (n * log n) value range analysis at -O1 would be welcome.
> We didn't get around doing that for EVRP which I think had such
> bound - the iterating VRP definitely didn't.  I'm not sure ranger really
> qualifies given gori and all it relies on.  The advantage of the simple
> whole-IL walk way was to reasonably easily guarantee such bound.
> That's unfortunately lost now :/

Andrew was mumbling something about a fast ranger mode for this
release that should be on par with legacy evrp.  IIRC it would be
purely DOM based, won't visit back edges, and there's no caching.  But
he'll have to expand on it when he returns from vacation.  I don't
know the details.

>
> > Doing this optimization in DOM at least keeps with GCC 12, but I'm
> > happy to move it to whatever is recommended.
>
> As said the intent of this trick is to preserve range information from
> asserts even when we removed the IL carrying it.  When we care
> about range info, which means when some VRP pass runs, which
> is why it was done in such a pass.  That is, we probably want to
> avoid altering the global range for -fno-tree-vrp.

Not just -fno-tree-vrp then.  There are other range consumers.  For
example, CCP, threading, etc.  Threading will look at global ranges
even in dumb mode, and CCP uses global ranges in a round about way--
by looking at global nonzero bits which the evrp code diligently sets:

        {
          set_ssa_range_info (vrs[i].first, vrs[i].second);
          maybe_set_nonzero_bits (pred_e, vrs[i].first);
        }

For example, in this particular testcase, DOM will set the global
range, and CCP will remove at least one of the conditionals by looking
at nonzero bits.

>
> > The test case is gcc.dg/builtin-unreachable-6a.c by the way.
>
> But 6.c checks the same but even w/ DOM disabled...

That's what I thought, but they actually test the opposite:

/* { dg-final { scan-tree-dump-times "__builtin_unreachable" 1 "fab1" } } */
/* { dg-final { scan-tree-dump-not "__builtin_unreachable" "fab1" } } */

The non-DOM, non-VRP, version checks that the unreachable is still
there.  Whereas the plain -O2 version checks that we've cleaned it up.

>
> > >
> > > > This is the only part of the patch that makes me go "ewwww", so I would
> > > > like to at least explore if we can rethink the value of that test.
> > > >
> > > > > There is a a regression on Wstringop-overflow-4.C which I'm planning
> > > > > on XFAILing.  It's another variant of the usual middle-end false
> > > > > positives: having no ranges produces no warnings, but slightly refined
> > > > > ranges, or worse-- isolating specific problematic cases in the
> > > > > threader causes flare-ups.
> > > > ACK.
> > > >
> > > >
> > > > >
> > > > > As an aside, as Richi has suggested, I think we should discuss
> > > > > restricting the threader's ability to thread highly unlikely paths.
> > > > > These cause no end of pain for middle-end warnings.  However,
> > > > > I don't know if this would conflict with path isolation for
> > > > > things like null dereferencing.  ISTR you were interested in this.
> > > > I've never though too  much about it.  You're thinking of Richi :-)
> > > >
> > > > It's a balance.  I bet if we stop threading those paths we're going to
> > > > see other issues start popping up like uninitialized warnings.
> > > >
> > > > It may be the case that it's really the constant propagation on an
> > > > isolated path that's more problematical for those warnings.  But I would
> > > > probably contend that what isolation is doing is showing how certain
> > > > constant values can flow into places where they're not expected and
> > > > could cause problems.  So I wouldn't necessary suggest throttling it.
> > > >
> > > > Happy to be proven wrong here!
> > >
> > > I think we need to revisit the whole costing of threading which is of course
> > > best done when we only have the backwards threader left.  I also always
> > > wanted to play with some kind of optimistic code generation & rollback
> > > on things like threading and unrolling, basically value-number the
> > > duplicated stmts on-the-fly and make it "stop" when we reach a defined
> > > threshold (and then throw away the result).  I never went around to
> > > do this for unrolling since there we can end up duplicating diverging
> > > control flow, but that at least doesn't happen with threading(?)
> >
> > WRT stopping after a threshold is reached,
> > back_threader_profitablity::profitable_path_p() bails after a number
> > of statements is reached.  If so, the path is not even put into the
> > queue.  So, no rolling back would be necessary.  Is this what you
> > meant, or something else?  The threaders (all variants) queue things
> > up and modify the IL at the end of the pass.
>
> I meant that as currently we have no idea about followup simplifications
> on the isolated path we conservatively account for all stmts in there
> and thus need an optimistically high threading --param.  We should
> be able to do better by simplifying copied stmts with the some now
> constant PHIs or conditionally derived values.  So we could do
> "unbound" analysis but when we instantiate stop the actual copying
> if simplification doesn't keep us within the desired bounds.
> For example unrolling tries to estimate what's going to be optimized
> away but if we actually did the simplification that would be more precise.

Very cool!

Aldy

>
> > >
> > > And just a note - please wait with installing this change - if approved - until
> > > after GCC 12 is actually released.
> >
> > Indeed.  This is far too invasive for the interim.  I just sent the
> > patch early so Jeff could get a chance to review it before I leave on
> > paternity leave.  But he's also volunteered to kick it over the goal
> > line if we can't get it in before May 9.
> >
> > Aldy
> >
>
  
Richard Biener May 2, 2022, 6:30 a.m. UTC | #7
On Fri, Apr 29, 2022 at 6:22 PM Aldy Hernandez <aldyh@redhat.com> wrote:
>
> On Fri, Apr 29, 2022 at 12:13 PM Richard Biener
> <richard.guenther@gmail.com> wrote:
> >
> > On Fri, Apr 29, 2022 at 11:53 AM Aldy Hernandez <aldyh@redhat.com> wrote:
> > >
> > > On Fri, Apr 29, 2022 at 9:09 AM Richard Biener
> > > <richard.guenther@gmail.com> wrote:
> > > >
> > > > On Thu, Apr 28, 2022 at 8:44 PM Jeff Law via Gcc-patches
> > > > <gcc-patches@gcc.gnu.org> wrote:
> > > > >
> > > > >
> > > > >
> > > > > On 4/28/2022 10:30 AM, Aldy Hernandez wrote:
> > > > > > [Jeff, this is the same patch I sent you last week with minor tweaks
> > > > > > to the commit message.]
> > > > > And I just dropped it into the tester.  We should have the vast majority
> > > > > of targets tested by this time tomorrow.
> > > > >
> > > > > >
> > > > > > [Despite the verbosity of the message, this is actually a pretty
> > > > > > straightforward patch.  It should've gone in last cycle, but there
> > > > > > was a nagging regression I couldn't get to until after stage1
> > > > > > had closed.]
> > > > > You had other, non-work/gcc priorities.  No worries at missing gcc-12.
> > > > >
> > > > >
> > > > > >
> > > > > > There are 3 uses of EVRP in DOM that must be converted.
> > > > > > Unfortunately, they need to be converted in one go, so further
> > > > > > splitting of this patch would be problematic.
> > > > > ACK
> > > > >
> > > > >
> > > > > > There's nothing here earth shattering.  It's all pretty obvious in
> > > > > > retrospect, but I've added a short description of each use to aid in
> > > > > > reviewing:
> > > > > >
> > > > > > * Convert evrp use in cprop to ranger.
> > > > > >
> > > > > >    This is easy, as cprop in DOM was converted to the ranger API last
> > > > > >    cycle, so this is just a matter of using a ranger instead of an
> > > > > >    evrp_range_analyzer.
> > > > > Yea.  We may have covered this before, but what does ranger do with a
> > > > > conditional equivalence?
> > > > >
> > > > > ie if we have something like
> > > > >
> > > > > if (a == b)
> > > > >    {
> > > > >      blah blah
> > > > >    }
> > > > >
> > > > > Within the true arm we know there's an equivalence.  *But* we don't want
> > > > > to just blindly replace a with b or vice-versa.  The equivalence is
> > > > > primarily useful for simplification rather than propagation.
> > > > >
> > > > > In fact, we every much do not want to cprop in these cases.   It rarely
> > > > > helps in a meaningful way and there's no good way to know which is the
> > > > > better value to use.  Canonicalization on SSA_NAMEs doesn't  really help
> > > > > due to SSA_NAME recycling.
> > > > >
> > > > >
> > > > > >
> > > > > > * Convert evrp use in threader to ranger.
> > > > > >
> > > > > >    The idea here is to use the hybrid approach we used for the initial
> > > > > >    VRP threader conversion last cycle.  The DOM threader will continue
> > > > > >    using the forward threader infrastructure while continuing to query
> > > > > >    DOM data structures, and only if the conditional does not relsolve,
> > > > > >    using the ranger.  This gives us the best of both worlds, and is a
> > > > > >    proven approach.
> > > > > >
> > > > > >    Furthermore, as frange and prange come live in the next cycle, we
> > > > > >    can move away from the forward threader altogether, and just add
> > > > > >    another backward threader.  This will not only remove the last use
> > > > > >    of the forward threader, but will allow us to remove at least 1 or 2
> > > > > >    threader instances.
> > > > > Excellent.
> > > > >
> > > > > >
> > > > > > * Convert conditional folding to use the method used by the ranger and
> > > > > >    evrp.  Previously DOM was calling into the guts of
> > > > > >    simplify_using_ranges::vrp_visit_cond_stmt.  The blessed way now is
> > > > > >    using fold_cond() which rewrites the conditional and edges
> > > > > >    automatically.
> > > > > >
> > > > > >    When legacy is removed, simplify_using_ranges will be further
> > > > > >    cleaned up, and there will only be one entry point into simplifying
> > > > > >    a statement.
> > > > > Sure.
> > > > >
> > > > > >
> > > > > > * DOM was setting global ranges determined from unreachable edges as a
> > > > > >    side-effect of using the evrp engine.  We must handle these cases
> > > > > >    before nuking evrp, and DOM seems like a good fit.  I've just moved
> > > > > >    the snippet to DOM, but it could live anywhere else we do a DOM
> > > > > >    walk.
> > > > >   It was a semi-desirable to record/refine global ranges in DOM, but I'd
> > > > > be surprised if too much code depended on that.  Presumably there's a
> > > > > testcase in the suite which we don't handle well if we don't let DOM
> > > > > refine/record global ranges?
> > > > >
> > > > > >
> > > > > >    For the record, this is the case *vrp handled:
> > > > > >
> > > > > >       <bb C>:
> > > > > >       ...
> > > > > >       if (c_5(D) != 5)
> > > > > >       goto <bb N>;
> > > > > >       else
> > > > > >       goto <bb M>;
> > > > > >       <bb N>:
> > > > > >       __builtin_unreachable ();
> > > > > >       <bb M>:
> > > > > >
> > > > > >    If M dominates all uses of c_5, we can set the global range of c_5
> > > > > >    to [5,5].
> > > > > And that will only happen if M eventually loops back to the conditional,
> > > > > right?  Otherwise all the uses wouldn't be dominated. I suspect this is
> > > > > exceedingly rare in practice an it looks really familiar.  This is
> > > > > coming from a testcase, right?
> > > >
> > > > This is how we make use of assert() when it uses __builtin_unreachable.
> > > >
> > > > Note the difficulty here is that we have to do this at a very specific point
> > > > in the pass pipeline (setting the global range), since the first time we'll
> > > > fold the if (c_5(D) != 5) it will be folded to false and the IL representation
> > > > of the assert() is gone.
> > > >
> > > > So the idea was that with the IL present value-range analysis can determine
> > > > the range omf c_5(D) on the false path but once we remove it (and we do
> > > > want to remove it) we want to put a global range on c_5(D).
> > > >
> > > > That means the best of both worlds would be to detect this pattern at
> > > > a specific point in the pass pipeline (somewhen before loop opts for sure)
> > > > and do both - remove the IL _and_ set the global range.  The fear of
> > > > doing this too early is of course that we lose the range by copy propagation
> > > > or similar transforms.  The fear of doing this change at a single place only
> > > > is that we miss it because there's a stray use before the conditional.
> > > >
> > > > Note DOM is also run in pass_ipa_oacc_kernels which may be too early.
> > > >
> > > > In the end we want to perform dead code elimination here so maybe
> > > > DCE is the best fit to do this (but as said we want to avoid doing this too
> > > > early).
> > >
> > > I agree that doing both would be the ideal scenario.  However, I would
> > > prefer this be done in a follow-up patch to minimize the amount of
> > > stuff being changed in one go.
> > >
> > > To provide additional background, this was being done in evrp which
> > > runs at -O2, but was also being done at -O1 as a side-effect of DOM
> > > using the evrp engine and exporting global ranges.  For -O2, I don't
> > > know how much DOM was originally (prior to ranger) contributing to
> > > these assert global ranges,  since evrp was probably picking them up
> > > first.  However, since evrp now runs in ranger mode, this
> > > functionality has silently moved to DOM (without us realizing) for all
> > > compilation levels.
> >
> > I see.
> >
> > > Do we care about this functionality at -O1?  ISTR at least one -O1
> > > testcase failing in RTL land if exporting global ranges isn't done in
> > > DOM.  I don't remember if it was a global range due to a
> > > __builtin_unreachable or otherwise.
> >
> > Well, at -O1 we definitely didn't set the global range (did we?) but
> > eventually pruned paths running into __builtin_unreachable ()
> > anyway (would have to check at which point).
>
> I think we always have:
>
> evrp_range_analyzer analyzer (/*update_global_ranges=*/true);
>
> ??
>
> DOM has been running as an incognito evrp even at -O1.  I don't know
> if this has been by design, or as an unintended side-effect.  I mean,
> it has been visiting all conditionals in the IL as well as calling
> evrp_range_analyzer::record_ranges_from_stmt() on each statement it
> tries to optimize.  And of course, updating global ranges along the
> way.

As said, it wasn't intended to be that way originally.  But yes, as we
separated out evrp_range_analyzer and made use of it in places not
guarded by -ftree-vrp this might have happened.

We might want to reconsider (based on now long-standing facts),
and maybe alter the -ftree-vrp documentation indicating that even
when not enabled effecive value-range propagation and optimization
might happen.

> >
> > >
> > > One more thing, we have batted around the idea of running a fast evrp
> > > even at -O1 (ranger without looking at back edges, IIRC).  This would
> > > mean that when the DOM threader becomes disentangled from DOM (when we
> > > have frange and prange), there's very little DOM will be doing that
> > > for instance PRE can't get.  So perhaps, if we remove DOM, the place
> > > to put this optimization is in the fast evrp, and fold the conditional
> > > as has been suggested?  Anyways... I'm digressing.
> >
> > The idea is to get rid of DOM once its threading is separated out.
>
> Excellent.  There's no reason why we couldn't separate threading out
> in this release.
>
> >
> > I think doing O (n * log n) value range analysis at -O1 would be welcome.
> > We didn't get around doing that for EVRP which I think had such
> > bound - the iterating VRP definitely didn't.  I'm not sure ranger really
> > qualifies given gori and all it relies on.  The advantage of the simple
> > whole-IL walk way was to reasonably easily guarantee such bound.
> > That's unfortunately lost now :/
>
> Andrew was mumbling something about a fast ranger mode for this
> release that should be on par with legacy evrp.  IIRC it would be
> purely DOM based, won't visit back edges, and there's no caching.  But
> he'll have to expand on it when he returns from vacation.  I don't
> know the details.

I would guess the stmt analysis building blocks (whatever API part of
ranger that is) can be used to produce something like that.  But in the
end it would be the old EVRP pass with the VRP stmt analysis it
re-used replaced with the appropriate ranger parts.

But yes, I'd welcome that.  I'd also like to revisit integration of some
of this with value-numbering which doesn't do a DOM walk but instead
a RPO walk.

> >
> > > Doing this optimization in DOM at least keeps with GCC 12, but I'm
> > > happy to move it to whatever is recommended.
> >
> > As said the intent of this trick is to preserve range information from
> > asserts even when we removed the IL carrying it.  When we care
> > about range info, which means when some VRP pass runs, which
> > is why it was done in such a pass.  That is, we probably want to
> > avoid altering the global range for -fno-tree-vrp.
>
> Not just -fno-tree-vrp then.  There are other range consumers.  For
> example, CCP, threading, etc.  Threading will look at global ranges
> even in dumb mode, and CCP uses global ranges in a round about way--
> by looking at global nonzero bits which the evrp code diligently sets:
>
>         {
>           set_ssa_range_info (vrs[i].first, vrs[i].second);
>           maybe_set_nonzero_bits (pred_e, vrs[i].first);
>         }
>
> For example, in this particular testcase, DOM will set the global
> range, and CCP will remove at least one of the conditionals by looking
> at nonzero bits.

Yes, as said we might want to reconsider and adjust documentation.
In theory we could gate the global range/nonzero setters on
flag_tree_vrp so they become no-ops when not enabled
(there is also flag_tree_bit_ccp which is documented to track
alignment, it fails to mention non-zero bits on integers).

But let's say the plan is to more conciously enable some range
propagation at -O1.

> > > The test case is gcc.dg/builtin-unreachable-6a.c by the way.
> >
> > But 6.c checks the same but even w/ DOM disabled...
>
> That's what I thought, but they actually test the opposite:
>
> /* { dg-final { scan-tree-dump-times "__builtin_unreachable" 1 "fab1" } } */
> /* { dg-final { scan-tree-dump-not "__builtin_unreachable" "fab1" } } */
>
> The non-DOM, non-VRP, version checks that the unreachable is still
> there.  Whereas the plain -O2 version checks that we've cleaned it up.

Oh ... I guess we should ensure that we either have the global range set
or the unreachable is still there.

> > > >
> > > > > This is the only part of the patch that makes me go "ewwww", so I would
> > > > > like to at least explore if we can rethink the value of that test.
> > > > >
> > > > > > There is a a regression on Wstringop-overflow-4.C which I'm planning
> > > > > > on XFAILing.  It's another variant of the usual middle-end false
> > > > > > positives: having no ranges produces no warnings, but slightly refined
> > > > > > ranges, or worse-- isolating specific problematic cases in the
> > > > > > threader causes flare-ups.
> > > > > ACK.
> > > > >
> > > > >
> > > > > >
> > > > > > As an aside, as Richi has suggested, I think we should discuss
> > > > > > restricting the threader's ability to thread highly unlikely paths.
> > > > > > These cause no end of pain for middle-end warnings.  However,
> > > > > > I don't know if this would conflict with path isolation for
> > > > > > things like null dereferencing.  ISTR you were interested in this.
> > > > > I've never though too  much about it.  You're thinking of Richi :-)
> > > > >
> > > > > It's a balance.  I bet if we stop threading those paths we're going to
> > > > > see other issues start popping up like uninitialized warnings.
> > > > >
> > > > > It may be the case that it's really the constant propagation on an
> > > > > isolated path that's more problematical for those warnings.  But I would
> > > > > probably contend that what isolation is doing is showing how certain
> > > > > constant values can flow into places where they're not expected and
> > > > > could cause problems.  So I wouldn't necessary suggest throttling it.
> > > > >
> > > > > Happy to be proven wrong here!
> > > >
> > > > I think we need to revisit the whole costing of threading which is of course
> > > > best done when we only have the backwards threader left.  I also always
> > > > wanted to play with some kind of optimistic code generation & rollback
> > > > on things like threading and unrolling, basically value-number the
> > > > duplicated stmts on-the-fly and make it "stop" when we reach a defined
> > > > threshold (and then throw away the result).  I never went around to
> > > > do this for unrolling since there we can end up duplicating diverging
> > > > control flow, but that at least doesn't happen with threading(?)
> > >
> > > WRT stopping after a threshold is reached,
> > > back_threader_profitablity::profitable_path_p() bails after a number
> > > of statements is reached.  If so, the path is not even put into the
> > > queue.  So, no rolling back would be necessary.  Is this what you
> > > meant, or something else?  The threaders (all variants) queue things
> > > up and modify the IL at the end of the pass.
> >
> > I meant that as currently we have no idea about followup simplifications
> > on the isolated path we conservatively account for all stmts in there
> > and thus need an optimistically high threading --param.  We should
> > be able to do better by simplifying copied stmts with the some now
> > constant PHIs or conditionally derived values.  So we could do
> > "unbound" analysis but when we instantiate stop the actual copying
> > if simplification doesn't keep us within the desired bounds.
> > For example unrolling tries to estimate what's going to be optimized
> > away but if we actually did the simplification that would be more precise.
>
> Very cool!

Yeah, if I only get to implement it ;)

Richard.

> Aldy
>
> >
> > > >
> > > > And just a note - please wait with installing this change - if approved - until
> > > > after GCC 12 is actually released.
> > >
> > > Indeed.  This is far too invasive for the interim.  I just sent the
> > > patch early so Jeff could get a chance to review it before I leave on
> > > paternity leave.  But he's also volunteered to kick it over the goal
> > > line if we can't get it in before May 9.
> > >
> > > Aldy
> > >
> >
>
  
Aldy Hernandez May 2, 2022, 7:17 a.m. UTC | #8
On Mon, May 2, 2022 at 8:30 AM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Fri, Apr 29, 2022 at 6:22 PM Aldy Hernandez <aldyh@redhat.com> wrote:
> >
> > On Fri, Apr 29, 2022 at 12:13 PM Richard Biener
> > <richard.guenther@gmail.com> wrote:
> > >
> > > On Fri, Apr 29, 2022 at 11:53 AM Aldy Hernandez <aldyh@redhat.com> wrote:
> > > >
> > > > On Fri, Apr 29, 2022 at 9:09 AM Richard Biener
> > > > <richard.guenther@gmail.com> wrote:
> > > > >
> > > > > On Thu, Apr 28, 2022 at 8:44 PM Jeff Law via Gcc-patches
> > > > > <gcc-patches@gcc.gnu.org> wrote:
> > > > > >
> > > > > >
> > > > > >
> > > > > > On 4/28/2022 10:30 AM, Aldy Hernandez wrote:
> > > > > > > [Jeff, this is the same patch I sent you last week with minor tweaks
> > > > > > > to the commit message.]
> > > > > > And I just dropped it into the tester.  We should have the vast majority
> > > > > > of targets tested by this time tomorrow.
> > > > > >
> > > > > > >
> > > > > > > [Despite the verbosity of the message, this is actually a pretty
> > > > > > > straightforward patch.  It should've gone in last cycle, but there
> > > > > > > was a nagging regression I couldn't get to until after stage1
> > > > > > > had closed.]
> > > > > > You had other, non-work/gcc priorities.  No worries at missing gcc-12.
> > > > > >
> > > > > >
> > > > > > >
> > > > > > > There are 3 uses of EVRP in DOM that must be converted.
> > > > > > > Unfortunately, they need to be converted in one go, so further
> > > > > > > splitting of this patch would be problematic.
> > > > > > ACK
> > > > > >
> > > > > >
> > > > > > > There's nothing here earth shattering.  It's all pretty obvious in
> > > > > > > retrospect, but I've added a short description of each use to aid in
> > > > > > > reviewing:
> > > > > > >
> > > > > > > * Convert evrp use in cprop to ranger.
> > > > > > >
> > > > > > >    This is easy, as cprop in DOM was converted to the ranger API last
> > > > > > >    cycle, so this is just a matter of using a ranger instead of an
> > > > > > >    evrp_range_analyzer.
> > > > > > Yea.  We may have covered this before, but what does ranger do with a
> > > > > > conditional equivalence?
> > > > > >
> > > > > > ie if we have something like
> > > > > >
> > > > > > if (a == b)
> > > > > >    {
> > > > > >      blah blah
> > > > > >    }
> > > > > >
> > > > > > Within the true arm we know there's an equivalence.  *But* we don't want
> > > > > > to just blindly replace a with b or vice-versa.  The equivalence is
> > > > > > primarily useful for simplification rather than propagation.
> > > > > >
> > > > > > In fact, we every much do not want to cprop in these cases.   It rarely
> > > > > > helps in a meaningful way and there's no good way to know which is the
> > > > > > better value to use.  Canonicalization on SSA_NAMEs doesn't  really help
> > > > > > due to SSA_NAME recycling.
> > > > > >
> > > > > >
> > > > > > >
> > > > > > > * Convert evrp use in threader to ranger.
> > > > > > >
> > > > > > >    The idea here is to use the hybrid approach we used for the initial
> > > > > > >    VRP threader conversion last cycle.  The DOM threader will continue
> > > > > > >    using the forward threader infrastructure while continuing to query
> > > > > > >    DOM data structures, and only if the conditional does not relsolve,
> > > > > > >    using the ranger.  This gives us the best of both worlds, and is a
> > > > > > >    proven approach.
> > > > > > >
> > > > > > >    Furthermore, as frange and prange come live in the next cycle, we
> > > > > > >    can move away from the forward threader altogether, and just add
> > > > > > >    another backward threader.  This will not only remove the last use
> > > > > > >    of the forward threader, but will allow us to remove at least 1 or 2
> > > > > > >    threader instances.
> > > > > > Excellent.
> > > > > >
> > > > > > >
> > > > > > > * Convert conditional folding to use the method used by the ranger and
> > > > > > >    evrp.  Previously DOM was calling into the guts of
> > > > > > >    simplify_using_ranges::vrp_visit_cond_stmt.  The blessed way now is
> > > > > > >    using fold_cond() which rewrites the conditional and edges
> > > > > > >    automatically.
> > > > > > >
> > > > > > >    When legacy is removed, simplify_using_ranges will be further
> > > > > > >    cleaned up, and there will only be one entry point into simplifying
> > > > > > >    a statement.
> > > > > > Sure.
> > > > > >
> > > > > > >
> > > > > > > * DOM was setting global ranges determined from unreachable edges as a
> > > > > > >    side-effect of using the evrp engine.  We must handle these cases
> > > > > > >    before nuking evrp, and DOM seems like a good fit.  I've just moved
> > > > > > >    the snippet to DOM, but it could live anywhere else we do a DOM
> > > > > > >    walk.
> > > > > >   It was a semi-desirable to record/refine global ranges in DOM, but I'd
> > > > > > be surprised if too much code depended on that.  Presumably there's a
> > > > > > testcase in the suite which we don't handle well if we don't let DOM
> > > > > > refine/record global ranges?
> > > > > >
> > > > > > >
> > > > > > >    For the record, this is the case *vrp handled:
> > > > > > >
> > > > > > >       <bb C>:
> > > > > > >       ...
> > > > > > >       if (c_5(D) != 5)
> > > > > > >       goto <bb N>;
> > > > > > >       else
> > > > > > >       goto <bb M>;
> > > > > > >       <bb N>:
> > > > > > >       __builtin_unreachable ();
> > > > > > >       <bb M>:
> > > > > > >
> > > > > > >    If M dominates all uses of c_5, we can set the global range of c_5
> > > > > > >    to [5,5].
> > > > > > And that will only happen if M eventually loops back to the conditional,
> > > > > > right?  Otherwise all the uses wouldn't be dominated. I suspect this is
> > > > > > exceedingly rare in practice an it looks really familiar.  This is
> > > > > > coming from a testcase, right?
> > > > >
> > > > > This is how we make use of assert() when it uses __builtin_unreachable.
> > > > >
> > > > > Note the difficulty here is that we have to do this at a very specific point
> > > > > in the pass pipeline (setting the global range), since the first time we'll
> > > > > fold the if (c_5(D) != 5) it will be folded to false and the IL representation
> > > > > of the assert() is gone.
> > > > >
> > > > > So the idea was that with the IL present value-range analysis can determine
> > > > > the range omf c_5(D) on the false path but once we remove it (and we do
> > > > > want to remove it) we want to put a global range on c_5(D).
> > > > >
> > > > > That means the best of both worlds would be to detect this pattern at
> > > > > a specific point in the pass pipeline (somewhen before loop opts for sure)
> > > > > and do both - remove the IL _and_ set the global range.  The fear of
> > > > > doing this too early is of course that we lose the range by copy propagation
> > > > > or similar transforms.  The fear of doing this change at a single place only
> > > > > is that we miss it because there's a stray use before the conditional.
> > > > >
> > > > > Note DOM is also run in pass_ipa_oacc_kernels which may be too early.
> > > > >
> > > > > In the end we want to perform dead code elimination here so maybe
> > > > > DCE is the best fit to do this (but as said we want to avoid doing this too
> > > > > early).
> > > >
> > > > I agree that doing both would be the ideal scenario.  However, I would
> > > > prefer this be done in a follow-up patch to minimize the amount of
> > > > stuff being changed in one go.
> > > >
> > > > To provide additional background, this was being done in evrp which
> > > > runs at -O2, but was also being done at -O1 as a side-effect of DOM
> > > > using the evrp engine and exporting global ranges.  For -O2, I don't
> > > > know how much DOM was originally (prior to ranger) contributing to
> > > > these assert global ranges,  since evrp was probably picking them up
> > > > first.  However, since evrp now runs in ranger mode, this
> > > > functionality has silently moved to DOM (without us realizing) for all
> > > > compilation levels.
> > >
> > > I see.
> > >
> > > > Do we care about this functionality at -O1?  ISTR at least one -O1
> > > > testcase failing in RTL land if exporting global ranges isn't done in
> > > > DOM.  I don't remember if it was a global range due to a
> > > > __builtin_unreachable or otherwise.
> > >
> > > Well, at -O1 we definitely didn't set the global range (did we?) but
> > > eventually pruned paths running into __builtin_unreachable ()
> > > anyway (would have to check at which point).
> >
> > I think we always have:
> >
> > evrp_range_analyzer analyzer (/*update_global_ranges=*/true);
> >
> > ??
> >
> > DOM has been running as an incognito evrp even at -O1.  I don't know
> > if this has been by design, or as an unintended side-effect.  I mean,
> > it has been visiting all conditionals in the IL as well as calling
> > evrp_range_analyzer::record_ranges_from_stmt() on each statement it
> > tries to optimize.  And of course, updating global ranges along the
> > way.
>
> As said, it wasn't intended to be that way originally.  But yes, as we
> separated out evrp_range_analyzer and made use of it in places not
> guarded by -ftree-vrp this might have happened.
>
> We might want to reconsider (based on now long-standing facts),
> and maybe alter the -ftree-vrp documentation indicating that even
> when not enabled effecive value-range propagation and optimization
> might happen.
>
> > >
> > > >
> > > > One more thing, we have batted around the idea of running a fast evrp
> > > > even at -O1 (ranger without looking at back edges, IIRC).  This would
> > > > mean that when the DOM threader becomes disentangled from DOM (when we
> > > > have frange and prange), there's very little DOM will be doing that
> > > > for instance PRE can't get.  So perhaps, if we remove DOM, the place
> > > > to put this optimization is in the fast evrp, and fold the conditional
> > > > as has been suggested?  Anyways... I'm digressing.
> > >
> > > The idea is to get rid of DOM once its threading is separated out.
> >
> > Excellent.  There's no reason why we couldn't separate threading out
> > in this release.
> >
> > >
> > > I think doing O (n * log n) value range analysis at -O1 would be welcome.
> > > We didn't get around doing that for EVRP which I think had such
> > > bound - the iterating VRP definitely didn't.  I'm not sure ranger really
> > > qualifies given gori and all it relies on.  The advantage of the simple
> > > whole-IL walk way was to reasonably easily guarantee such bound.
> > > That's unfortunately lost now :/
> >
> > Andrew was mumbling something about a fast ranger mode for this
> > release that should be on par with legacy evrp.  IIRC it would be
> > purely DOM based, won't visit back edges, and there's no caching.  But
> > he'll have to expand on it when he returns from vacation.  I don't
> > know the details.
>
> I would guess the stmt analysis building blocks (whatever API part of
> ranger that is) can be used to produce something like that.  But in the
> end it would be the old EVRP pass with the VRP stmt analysis it
> re-used replaced with the appropriate ranger parts.
>
> But yes, I'd welcome that.  I'd also like to revisit integration of some
> of this with value-numbering which doesn't do a DOM walk but instead
> a RPO walk.
>
> > >
> > > > Doing this optimization in DOM at least keeps with GCC 12, but I'm
> > > > happy to move it to whatever is recommended.
> > >
> > > As said the intent of this trick is to preserve range information from
> > > asserts even when we removed the IL carrying it.  When we care
> > > about range info, which means when some VRP pass runs, which
> > > is why it was done in such a pass.  That is, we probably want to
> > > avoid altering the global range for -fno-tree-vrp.
> >
> > Not just -fno-tree-vrp then.  There are other range consumers.  For
> > example, CCP, threading, etc.  Threading will look at global ranges
> > even in dumb mode, and CCP uses global ranges in a round about way--
> > by looking at global nonzero bits which the evrp code diligently sets:
> >
> >         {
> >           set_ssa_range_info (vrs[i].first, vrs[i].second);
> >           maybe_set_nonzero_bits (pred_e, vrs[i].first);
> >         }
> >
> > For example, in this particular testcase, DOM will set the global
> > range, and CCP will remove at least one of the conditionals by looking
> > at nonzero bits.
>
> Yes, as said we might want to reconsider and adjust documentation.
> In theory we could gate the global range/nonzero setters on
> flag_tree_vrp so they become no-ops when not enabled
> (there is also flag_tree_bit_ccp which is documented to track
> alignment, it fails to mention non-zero bits on integers).

Well, we could just not update global ranges for !flag_tree_vrp.  In
the old world:

evrp_range_analyzer (/*update_global_ranges=*/ flag_tree_vrp)

and in the new world:

if (flag_tree_vrp)
  ranger->export_global_ranges().

This should fit in nicely with my plan to integrate nonzero bits into
the irange, because we won't have to worry about them independently

>
> But let's say the plan is to more conciously enable some range
> propagation at -O1.

Sounds like a plan.

Aldy
  
Andrew MacLeod May 9, 2022, 12:42 p.m. UTC | #9
On 5/2/22 02:30, Richard Biener wrote:
> On Fri, Apr 29, 2022 at 6:22 PM Aldy Hernandez <aldyh@redhat.com> wrote:
>>
>> Andrew was mumbling something about a fast ranger mode for this
>> release that should be on par with legacy evrp.  IIRC it would be
>> purely DOM based, won't visit back edges, and there's no caching.  But
>> he'll have to expand on it when he returns from vacation.  I don't
>> know the details.
> I would guess the stmt analysis building blocks (whatever API part of
> ranger that is) can be used to produce something like that.  But in the
> end it would be the old EVRP pass with the VRP stmt analysis it
> re-used replaced with the appropriate ranger parts.
>
> But yes, I'd welcome that.  I'd also like to revisit integration of some
> of this with value-numbering which doesn't do a DOM walk but instead
> a RPO walk.


Yes, it will bear numerous similarities to the old EVRP mechanism 
approach.  It'd would use a similar "current value" vector, just using 
multi-ranges and wired into the range-ops/gori mechanism for calculating 
outgoing ranges on outgoing/incoming edges.  Its next on my list after I 
get a few outstanding things in.

An RPO walk should be trivial to work with as well.  Is there generic 
infrastructure for RPO like there is for DOM walks, or is it more wired 
into VN?  Regardless, we should be able to set it up to work from anywhere.

Andrew
  
Richard Biener May 9, 2022, 12:55 p.m. UTC | #10
On Mon, May 9, 2022 at 2:42 PM Andrew MacLeod <amacleod@redhat.com> wrote:
>
> On 5/2/22 02:30, Richard Biener wrote:
> > On Fri, Apr 29, 2022 at 6:22 PM Aldy Hernandez <aldyh@redhat.com> wrote:
> >>
> >> Andrew was mumbling something about a fast ranger mode for this
> >> release that should be on par with legacy evrp.  IIRC it would be
> >> purely DOM based, won't visit back edges, and there's no caching.  But
> >> he'll have to expand on it when he returns from vacation.  I don't
> >> know the details.
> > I would guess the stmt analysis building blocks (whatever API part of
> > ranger that is) can be used to produce something like that.  But in the
> > end it would be the old EVRP pass with the VRP stmt analysis it
> > re-used replaced with the appropriate ranger parts.
> >
> > But yes, I'd welcome that.  I'd also like to revisit integration of some
> > of this with value-numbering which doesn't do a DOM walk but instead
> > a RPO walk.
>
>
> Yes, it will bear numerous similarities to the old EVRP mechanism
> approach.  It'd would use a similar "current value" vector, just using
> multi-ranges and wired into the range-ops/gori mechanism for calculating
> outgoing ranges on outgoing/incoming edges.  Its next on my list after I
> get a few outstanding things in.
>
> An RPO walk should be trivial to work with as well.  Is there generic
> infrastructure for RPO like there is for DOM walks, or is it more wired
> into VN?  Regardless, we should be able to set it up to work from anywhere.

It's currently wired into VN.  But once I see the domwalk implementation I can
trivially adapt that I think.  There's two interesting cases, the iterating and
the non-iterating VN which switch around tree-ssa-sccvn.cc:8001.  Currently
all context sensitive information is kept but tagged rather than unwound
like a domwalk would do in its after_dom_children hook.  That's a) a bit ugly
(and costly) and b) for relations it's too simple (what VN currently does).  So
my hope was to throw that out and replace it with something else - there's
vn_avail which is what models the DOM relationship.  I'll have to think about
what we need for relations (for ranges that vn_avail should do).

Richard.

> Andrew
>
>
  

Patch

diff --git a/gcc/testsuite/g++.dg/warn/Wstringop-overflow-4.C b/gcc/testsuite/g++.dg/warn/Wstringop-overflow-4.C
index c80977d3043..eb4801918fc 100644
--- a/gcc/testsuite/g++.dg/warn/Wstringop-overflow-4.C
+++ b/gcc/testsuite/g++.dg/warn/Wstringop-overflow-4.C
@@ -161,6 +161,40 @@  void test_strcpy_new_int16_t (size_t n, const size_t vals[])
 
   ptrdiff_t r_dmin_dmax = SR (DIFF_MIN, DIFF_MAX);
   T (S (1), new int16_t[r_dmin_dmax]);
+  /* ?? The above SR(DIFF_MIN, DIFF_MAX) implies r_dmin_dmax can be
+     the entire domain, including negative numbers because ptrdiff_t
+     is a signed entity.
+
+     This causes a warning in the following line after the
+     DOM/threader changes for C++98.
+
+      <bb 2> [local count: 1073741824]:
+      _26 ={v} signed_value_source;                      ;; could be -1
+      r_dmin_dmax.1_8 = (sizetype) _26;
+      if (r_dmin_dmax.1_8 <= 4611686018427387900)        ;; placement new rounding
+        goto <bb 3>; [50.00%]
+      else
+        goto <bb 9>; [50.00%]
+
+      ...
+      ...
+
+      <bb 9> [local count: 536870912]:
+      # iftmp.0_39 = PHI <18446744073709551615(2)>
+      _41 = operator new [] (iftmp.0_39);
+      __builtin_memcpy (_41, "z", 2);
+      sink (_41);
+      _44 = _26 + 1;					;; _44 = 0
+      _45 = (sizetype) _44;				;; _45 = 0
+      if (_45 <= 4611686018427387900)
+        goto <bb 8>; [0.00%]
+      else
+        goto <bb 7>; [100.00%]
+
+      <bb 8> [local count: 0]:
+      iftmp.2_33 = _45 * 2;				;; iftmp.2_33 = 0
+      _34 = operator new [] (iftmp.2_33);		;; new [] (0)
+  */
   T (S (2), new int16_t[r_dmin_dmax + 1]);
   T (S (9), new int16_t[r_dmin_dmax * 2 + 1]);
 }
diff --git a/gcc/testsuite/gcc.dg/sancov/cmp0.c b/gcc/testsuite/gcc.dg/sancov/cmp0.c
index 8bbf06e9466..9fd7f5cccc8 100644
--- a/gcc/testsuite/gcc.dg/sancov/cmp0.c
+++ b/gcc/testsuite/gcc.dg/sancov/cmp0.c
@@ -1,6 +1,6 @@ 
 /* Basic test on number of inserted callbacks.  */
 /* { dg-do compile } */
-/* { dg-options "-fsanitize-coverage=trace-cmp -fdump-tree-optimized" } */
+/* { dg-options "-fsanitize-coverage=trace-cmp -fdump-tree-optimized -fno-thread-jumps" } */
 /* { dg-skip-if "different type layout" { avr-*-* } } */
 
 #if __SIZEOF_INT__ < 4
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-branch-1.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-branch-1.c
index fae5bdef818..ede3274f59d 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-branch-1.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-branch-1.c
@@ -19,10 +19,9 @@  try_combine (rtx i1, rtx newpat)
   else if (i1 && foo ());
 }
 
-/* There should be four tests against i1.  One from the hash table
-   dumps, one from the EVRP analyzer one from EVRP evaluation and one
+/* There should be 3 tests against i1.  Two from DOM machinery and one
    in the code itself.  */
-/* { dg-final { scan-tree-dump-times "if .i1_" 4 "dom2"} } */
+/* { dg-final { scan-tree-dump-times "if .i1_" 3 "dom2"} } */
 
 /* There should be no actual jump threads realized by DOM.  The
    legitimize jump threads are handled in VRP and those discovered
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
index b64e71dae22..aa06db5e223 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
@@ -11,7 +11,7 @@ 
    to change decisions in switch expansion which in turn can expose new
    jump threading opportunities.  Skip the later tests on aarch64.  */
 /* { dg-final { scan-tree-dump-not "Jumps threaded"  "dom3" { target { ! aarch64*-*-* } } } } */
-/* { dg-final { scan-tree-dump "Jumps threaded: 7"  "thread2" { target { ! aarch64*-*-* } } } } */
+/* { dg-final { scan-tree-dump "Jumps threaded: 8"  "thread2" { target { ! aarch64*-*-* } } } } */
 /* { dg-final { scan-tree-dump "Jumps threaded: 18"  "thread2" { target { aarch64*-*-* } } } } */
 
 enum STATE {
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr81635-2.c b/gcc/testsuite/gcc.dg/vect/bb-slp-pr81635-2.c
index 6b213d4da28..56c4bbf86d8 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-pr81635-2.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr81635-2.c
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */
-/* { dg-additional-options "-fno-tree-loop-vectorize" } */
+/* { dg-additional-options "-fno-tree-loop-vectorize -fno-tree-dominator-opts" } */
 /* { dg-require-effective-target lp64 } */
 
 double p[1000];
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr81635-4.c b/gcc/testsuite/gcc.dg/vect/bb-slp-pr81635-4.c
index 599f718ecdc..67ee809826c 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-pr81635-4.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr81635-4.c
@@ -1,7 +1,11 @@ 
 /* { dg-do compile } */
-/* { dg-additional-options "-fno-tree-loop-vectorize" } */
+/* { dg-additional-options "-fno-tree-loop-vectorize -fno-tree-dominator-opts" } */
 /* { dg-require-effective-target lp64 } */
 
+/* A ranger based DOM causes many more SSA names to be exported, which
+   causes slp1 to vectorize more things.  Disabling DOM to avoid
+   disturbing this test.  */
+
 void
 f1 (double *p, double *q, unsigned int n)
 {
diff --git a/gcc/tree-ssa-dom.cc b/gcc/tree-ssa-dom.cc
index 21745bf31d3..74f4eec91be 100644
--- a/gcc/tree-ssa-dom.cc
+++ b/gcc/tree-ssa-dom.cc
@@ -48,7 +48,8 @@  along with GCC; see the file COPYING3.  If not see
 #include "alloc-pool.h"
 #include "tree-vrp.h"
 #include "vr-values.h"
-#include "gimple-ssa-evrp-analyze.h"
+#include "gimple-range.h"
+#include "gimple-range-path.h"
 #include "alias.h"
 
 /* This file implements optimizations on the dominator tree.  */
@@ -589,132 +590,64 @@  class dom_jt_state : public jt_state
 {
 public:
   dom_jt_state (const_and_copies *copies, avail_exprs_stack *avails,
-		evrp_range_analyzer *evrp)
-    : m_copies (copies), m_avails (avails), m_evrp (evrp)
+		gimple_ranger *ranger)
+    : m_copies (copies), m_avails (avails), m_ranger (ranger)
   {
   }
   void push (edge e) override
   {
     m_copies->push_marker ();
     m_avails->push_marker ();
-    m_evrp->push_marker ();
     jt_state::push (e);
   }
   void pop () override
   {
     m_copies->pop_to_marker ();
     m_avails->pop_to_marker ();
-    m_evrp->pop_to_marker ();
     jt_state::pop ();
   }
   void register_equivs_edge (edge e) override
   {
     record_temporary_equivalences (e, m_copies, m_avails);
   }
-  void record_ranges_from_stmt (gimple *stmt, bool temporary) override
-  {
-    m_evrp->record_ranges_from_stmt (stmt, temporary);
-  }
   void register_equiv (tree dest, tree src, bool update) override;
 private:
   const_and_copies *m_copies;
   avail_exprs_stack *m_avails;
-  evrp_range_analyzer *m_evrp;
+  gimple_ranger *m_ranger;
 };
 
 void
-dom_jt_state::register_equiv (tree dest, tree src, bool update)
+dom_jt_state::register_equiv (tree dest, tree src, bool)
 {
   m_copies->record_const_or_copy (dest, src);
-
-  /* If requested, update the value range associated with DST, using
-     the range from SRC.  */
-  if (update)
-    {
-      /* Get new VR we can pass to push_value_range.  */
-      value_range_equiv *new_vr = m_evrp->allocate_value_range_equiv ();
-      new (new_vr) value_range_equiv ();
-
-      /* There are three cases to consider:
-
-	 First if SRC is an SSA_NAME, then we can copy the value range
-	 from SRC into NEW_VR.
-
-	 Second if SRC is an INTEGER_CST, then we can just set NEW_VR
-	 to a singleton range.  Note that even if SRC is a constant we
-	 need to set a suitable output range so that VR_UNDEFINED
-	 ranges do not leak through.
-
-	 Otherwise set NEW_VR to varying.  This may be overly
-	 conservative.  */
-      if (TREE_CODE (src) == SSA_NAME)
-	new_vr->deep_copy (m_evrp->get_value_range (src));
-      else if (TREE_CODE (src) == INTEGER_CST)
-	new_vr->set (src);
-      else
-	new_vr->set_varying (TREE_TYPE (src));
-
-      /* This is a temporary range for DST, so push it.  */
-      m_evrp->push_value_range (dest, new_vr);
-    }
 }
 
-class dom_jt_simplifier : public jt_simplifier
+class dom_jt_simplifier : public hybrid_jt_simplifier
 {
 public:
-  dom_jt_simplifier (vr_values *v, avail_exprs_stack *avails)
-    : m_vr_values (v), m_avails (avails) { }
+  dom_jt_simplifier (avail_exprs_stack *avails, gimple_ranger *ranger,
+		     path_range_query *query)
+    : hybrid_jt_simplifier (ranger, query), m_avails (avails) { }
 
 private:
   tree simplify (gimple *, gimple *, basic_block, jt_state *) override;
-  vr_values *m_vr_values;
   avail_exprs_stack *m_avails;
 };
 
 tree
 dom_jt_simplifier::simplify (gimple *stmt, gimple *within_stmt,
-			     basic_block, jt_state *)
+			     basic_block bb, jt_state *state)
 {
   /* First see if the conditional is in the hash table.  */
   tree cached_lhs =  m_avails->lookup_avail_expr (stmt, false, true);
   if (cached_lhs)
     return cached_lhs;
 
-  if (gcond *cond_stmt = dyn_cast <gcond *> (stmt))
-    {
-      simplify_using_ranges simplifier (m_vr_values);
-      return simplifier.vrp_evaluate_conditional (gimple_cond_code (cond_stmt),
-						  gimple_cond_lhs (cond_stmt),
-						  gimple_cond_rhs (cond_stmt),
-						  within_stmt);
-    }
-  if (gswitch *switch_stmt = dyn_cast <gswitch *> (stmt))
-    {
-      tree op = gimple_switch_index (switch_stmt);
-      if (TREE_CODE (op) != SSA_NAME)
-	return NULL_TREE;
+  /* Otherwise call the ranger if possible.  */
+  if (state)
+    return hybrid_jt_simplifier::simplify (stmt, within_stmt, bb, state);
 
-      const value_range_equiv *vr = m_vr_values->get_value_range (op);
-      return find_case_label_range (switch_stmt, vr);
-    }
-  if (gassign *assign_stmt = dyn_cast <gassign *> (stmt))
-    {
-      tree lhs = gimple_assign_lhs (assign_stmt);
-      if (TREE_CODE (lhs) == SSA_NAME
-	  && (INTEGRAL_TYPE_P (TREE_TYPE (lhs))
-	      || POINTER_TYPE_P (TREE_TYPE (lhs)))
-	  && stmt_interesting_for_vrp (stmt))
-	{
-	  edge dummy_e;
-	  tree dummy_tree;
-	  value_range_equiv new_vr;
-	  m_vr_values->extract_range_from_stmt (stmt, &dummy_e, &dummy_tree,
-						&new_vr);
-	  tree singleton;
-	  if (new_vr.singleton_p (&singleton))
-	    return singleton;
-	}
-    }
   return NULL;
 }
 
@@ -724,12 +657,12 @@  public:
   dom_opt_dom_walker (cdi_direction direction,
 		      jump_threader *threader,
 		      jt_state *state,
-		      evrp_range_analyzer *analyzer,
+		      gimple_ranger *ranger,
 		      const_and_copies *const_and_copies,
 		      avail_exprs_stack *avail_exprs_stack)
     : dom_walker (direction, REACHABLE_BLOCKS)
     {
-      m_evrp_range_analyzer = analyzer;
+      m_ranger = ranger;
       m_state = state;
       m_dummy_cond = gimple_build_cond (NE_EXPR, integer_zero_node,
 					integer_zero_node, NULL, NULL);
@@ -756,11 +689,13 @@  private:
      value.  */
   edge optimize_stmt (basic_block, gimple_stmt_iterator *, bool *);
 
+  void set_global_ranges_from_unreachable_edges (basic_block);
 
   void test_for_singularity (gimple *, avail_exprs_stack *);
+  edge fold_cond (gcond *cond);
 
   jump_threader *m_threader;
-  evrp_range_analyzer *m_evrp_range_analyzer;
+  gimple_ranger *m_ranger;
   jt_state *m_state;
 };
 
@@ -857,18 +792,22 @@  pass_dominator::execute (function *fun)
     record_edge_info (bb);
 
   /* Recursively walk the dominator tree optimizing statements.  */
-  evrp_range_analyzer analyzer (true);
-  dom_jt_simplifier simplifier (&analyzer, avail_exprs_stack);
-  dom_jt_state state (const_and_copies, avail_exprs_stack, &analyzer);
+  gimple_ranger *ranger = enable_ranger (fun);
+  path_range_query path_query (/*resolve=*/true, ranger);
+  dom_jt_simplifier simplifier (avail_exprs_stack, ranger, &path_query);
+  dom_jt_state state (const_and_copies, avail_exprs_stack, ranger);
   jump_threader threader (&simplifier, &state);
   dom_opt_dom_walker walker (CDI_DOMINATORS,
 			     &threader,
 			     &state,
-			     &analyzer,
+			     ranger,
 			     const_and_copies,
 			     avail_exprs_stack);
   walker.walk (fun->cfg->x_entry_block_ptr);
 
+  ranger->export_global_ranges ();
+  disable_ranger (fun);
+
   /* Look for blocks where we cleared EDGE_EXECUTABLE on an outgoing
      edge.  When found, remove jump threads which contain any outgoing
      edge from the affected block.  */
@@ -1253,6 +1192,77 @@  record_equivalences_from_phis (basic_block bb)
     }
 }
 
+/* Return true if all uses of NAME are dominated by STMT or feed STMT
+   via a chain of single immediate uses.  */
+
+static bool
+all_uses_feed_or_dominated_by_stmt (tree name, gimple *stmt)
+{
+  use_operand_p use_p, use2_p;
+  imm_use_iterator iter;
+  basic_block stmt_bb = gimple_bb (stmt);
+
+  FOR_EACH_IMM_USE_FAST (use_p, iter, name)
+    {
+      gimple *use_stmt = USE_STMT (use_p), *use_stmt2;
+      if (use_stmt == stmt
+	  || is_gimple_debug (use_stmt)
+	  || (gimple_bb (use_stmt) != stmt_bb
+	      && dominated_by_p (CDI_DOMINATORS,
+				 gimple_bb (use_stmt), stmt_bb)))
+	continue;
+      while (use_stmt != stmt
+	     && is_gimple_assign (use_stmt)
+	     && TREE_CODE (gimple_assign_lhs (use_stmt)) == SSA_NAME
+	     && single_imm_use (gimple_assign_lhs (use_stmt),
+				&use2_p, &use_stmt2))
+	use_stmt = use_stmt2;
+      if (use_stmt != stmt)
+	return false;
+    }
+  return true;
+}
+
+/* Set global ranges that can be determined from the C->M edge:
+
+   <bb C>:
+   ...
+   if (something)
+     goto <bb N>;
+   else
+     goto <bb M>;
+   <bb N>:
+   __builtin_unreachable ();
+   <bb M>:
+*/
+
+void
+dom_opt_dom_walker::set_global_ranges_from_unreachable_edges (basic_block bb)
+{
+  edge pred_e = single_pred_edge_ignoring_loop_edges (bb, false);
+
+  if (!pred_e)
+    return;
+
+  gimple *stmt = last_stmt (pred_e->src);
+  tree name;
+  int_range_max r;
+  if (stmt
+      && gimple_code (stmt) == GIMPLE_COND
+      && (name = gimple_cond_lhs (stmt))
+      && TREE_CODE (name) == SSA_NAME
+      && irange::supports_type_p (TREE_TYPE (name))
+      && assert_unreachable_fallthru_edge_p (pred_e)
+      && all_uses_feed_or_dominated_by_stmt (name, stmt)
+      && m_ranger->range_on_edge (r, pred_e, name)
+      && !r.varying_p ()
+      && !r.undefined_p ())
+    {
+      update_global_range (r, name);
+      maybe_set_nonzero_bits (pred_e, name);
+    }
+}
+
 /* Record any equivalences created by the incoming edge to BB into
    CONST_AND_COPIES and AVAIL_EXPRS_STACK.  If BB has more than one
    incoming edge, then no equivalence is created.  */
@@ -1506,8 +1516,6 @@  dom_opt_dom_walker::before_dom_children (basic_block bb)
   if (dump_file && (dump_flags & TDF_DETAILS))
     fprintf (dump_file, "\n\nOptimizing block #%d\n\n", bb->index);
 
-  m_evrp_range_analyzer->enter (bb);
-
   /* Push a marker on the stacks of local information so that we know how
      far to unwind when we finalize this block.  */
   m_avail_exprs_stack->push_marker ();
@@ -1515,6 +1523,7 @@  dom_opt_dom_walker::before_dom_children (basic_block bb)
 
   record_equivalences_from_incoming_edge (bb, m_const_and_copies,
 					  m_avail_exprs_stack);
+  set_global_ranges_from_unreachable_edges (bb);
 
   /* PHI nodes can create equivalences too.  */
   record_equivalences_from_phis (bb);
@@ -1543,7 +1552,6 @@  dom_opt_dom_walker::before_dom_children (basic_block bb)
 	  continue;
 	}
 
-      m_state->record_ranges_from_stmt (gsi_stmt (gsi), false);
       bool removed_p = false;
       taken_edge = this->optimize_stmt (bb, &gsi, &removed_p);
       if (!removed_p)
@@ -1591,7 +1599,6 @@  dom_opt_dom_walker::after_dom_children (basic_block bb)
   m_threader->thread_outgoing_edges (bb);
   m_avail_exprs_stack->pop_to_marker ();
   m_const_and_copies->pop_to_marker ();
-  m_evrp_range_analyzer->leave (bb);
 }
 
 /* Search for redundant computations in STMT.  If any are found, then
@@ -1833,7 +1840,7 @@  cprop_operand (gimple *stmt, use_operand_p op_p, range_query *query)
   val = SSA_NAME_VALUE (op);
   if (!val)
     {
-      value_range r;
+      int_range<2> r;
       tree single;
       if (query->range_of_expr (r, op, stmt) && r.singleton_p (&single))
 	val = single;
@@ -2074,6 +2081,24 @@  reduce_vector_comparison_to_scalar_comparison (gimple *stmt)
     }
 }
 
+/* If possible, rewrite the conditional as TRUE or FALSE, and return
+   the taken edge.  Otherwise, return NULL.  */
+
+edge
+dom_opt_dom_walker::fold_cond (gcond *cond)
+{
+  simplify_using_ranges simplify (m_ranger);
+  if (simplify.fold_cond (cond))
+    {
+      basic_block bb = gimple_bb (cond);
+      if (gimple_cond_true_p (cond))
+	return find_taken_edge (bb, boolean_true_node);
+      if (gimple_cond_false_p (cond))
+	return find_taken_edge (bb, boolean_false_node);
+    }
+  return NULL;
+}
+
 /* Optimize the statement in block BB pointed to by iterator SI.
 
    We try to perform some simplistic global redundancy elimination and
@@ -2122,7 +2147,7 @@  dom_opt_dom_walker::optimize_stmt (basic_block bb, gimple_stmt_iterator *si,
   opt_stats.num_stmts++;
 
   /* Const/copy propagate into USES, VUSES and the RHS of VDEFs.  */
-  cprop_into_stmt (stmt, m_evrp_range_analyzer);
+  cprop_into_stmt (stmt, m_ranger);
 
   /* If the statement has been modified with constant replacements,
      fold its RHS before checking for redundant computations.  */
@@ -2219,17 +2244,9 @@  dom_opt_dom_walker::optimize_stmt (basic_block bb, gimple_stmt_iterator *si,
 		 IL because we've attached range information to new
 		 SSA_NAMES.  */
 	      update_stmt_if_modified (stmt);
-	      edge taken_edge = NULL;
-	      simplify_using_ranges simpl (m_evrp_range_analyzer);
-	      simpl.vrp_visit_cond_stmt (as_a <gcond *> (stmt), &taken_edge);
+	      edge taken_edge = fold_cond (as_a <gcond *> (stmt));
 	      if (taken_edge)
 		{
-		  if (taken_edge->flags & EDGE_TRUE_VALUE)
-		    gimple_cond_make_true (as_a <gcond *> (stmt));
-		  else if (taken_edge->flags & EDGE_FALSE_VALUE)
-		    gimple_cond_make_false (as_a <gcond *> (stmt));
-		  else
-		    gcc_unreachable ();
 		  gimple_set_modified (stmt, true);
 		  update_stmt (stmt);
 		  cfg_altered = true;
diff --git a/gcc/tree-ssa-threadedge.cc b/gcc/tree-ssa-threadedge.cc
index 4eb65ca7cac..50251af8fc0 100644
--- a/gcc/tree-ssa-threadedge.cc
+++ b/gcc/tree-ssa-threadedge.cc
@@ -1377,7 +1377,9 @@  jt_state::register_equivs_stmt (gimple *stmt, basic_block bb,
 		SET_USE (use_p, tmp);
 	    }
 
-	  cached_lhs = simplifier->simplify (stmt, stmt, bb, this);
+	  /* Do not pass state to avoid calling the ranger with the
+	     temporarily altered IL.  */
+	  cached_lhs = simplifier->simplify (stmt, stmt, bb, /*state=*/NULL);
 
 	  /* Restore the statement's original uses/defs.  */
 	  i = 0;
diff --git a/gcc/vr-values.h b/gcc/vr-values.h
index f29441747e2..db0aa838bfe 100644
--- a/gcc/vr-values.h
+++ b/gcc/vr-values.h
@@ -41,6 +41,7 @@  public:
   // ?? These should be cleaned, merged, and made private.
   tree vrp_evaluate_conditional (tree_code, tree, tree, gimple *);
   void vrp_visit_cond_stmt (gcond *, edge *);
+  bool fold_cond (gcond *);
   tree vrp_evaluate_conditional_warnv_with_ops (gimple *stmt, enum tree_code,
 						tree, tree, bool,
 						bool *, bool *);
@@ -53,7 +54,6 @@  private:
   bool simplify_bit_ops_using_ranges (gimple_stmt_iterator *, gimple *);
   bool simplify_min_or_max_using_ranges (gimple_stmt_iterator *, gimple *);
   bool simplify_cond_using_ranges_1 (gcond *);
-  bool fold_cond (gcond *);
   bool simplify_switch_using_ranges (gswitch *);
   bool simplify_float_conversion_using_ranges (gimple_stmt_iterator *,
 					       gimple *);