Remove VRP threader passes in exchange for better threading pre-VRP.
Commit Message
This patch upgrades the pre-VRP threading passes to fully resolving
backward threaders, and removes the post-VRP threading passes altogether.
With it, we reduce the number of threaders in our pipeline from 9 to 7.
This will leave DOM as the only forward threader client. When the ranger
can handle floats, we should be able to upgrade the pre-DOM threaders to
fully resolving threaders and kill the embedded DOM threader.
The final numbers are:
prev: # threads in backward + vrp-threaders = 92624
now: # threads in backward threaders = 94275
Gain: +1.78%
prev: # total threads: 189495
now: # total threads: 193714
Gain: +2.22%
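
For anyone re-deriving the gains above: they are plain relative changes
against the previous count. A minimal sketch, where percent_gain is a
made-up helper and not something in the patch:

```c
#include <assert.h>

/* Hypothetical helper, not part of the patch: relative change from
   BEFORE to AFTER, expressed as a percentage of BEFORE.  */
double
percent_gain (unsigned long before, unsigned long after)
{
  assert (before > 0);
  return ((double) after - (double) before) * 100.0 / (double) before;
}
```

percent_gain (92624, 94275) comes out at roughly 1.78 and
percent_gain (189495, 193714) at roughly 2.226, so the second figure
above appears to be truncated rather than rounded.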
The numbers are not as great as my initial proposal, but I've
recently pushed all the work that got us to this point ;-).
And... the total compilation improves by 1.32%!
There's a regression on uninit-pred-7_a.c that I've yet to look at. I
want to make sure it's not a missing thread. If it is, I'll create a PR
and own it.
Also, the tree-ssa/phi_on_compare-*.c tests have all regressed. This
seems to be some special case the forward threader handles that the
backward threader does not (edge_forwards_cmp_to_conditional_jump*).
I haven't dug deep to see if this is solvable within our
infrastructure, but a cursory look shows that even though the VRP
threader threads this, the *.optimized dump ends with more conditional
jumps than without the optimization. I'd like to punt on this for
now, because DOM actually catches this through its lone use of the
forward threader (I've adjusted the tests). However, we will need to
address this sooner or later, if indeed it's still improving the final
assembly.
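
A rough sketch of the kind of code shape involved, assuming the special
case is a comparison result that flows through a PHI into a later
conditional jump. The function names are made up and this is not the
actual phi_on_compare-*.c source; the second function is a hand-written
picture of what threading each incoming edge would produce:

```c
/* Illustrative only; not taken from the testsuite.  */

/* Before: T is a PHI of two comparison results, and the final test
   branches on that PHI.  */
int
unthreaded (int x, int a, int b)
{
  int t;
  if (x)
    t = a < b;
  else
    t = a > b;
  if (t)
    return 1;
  return 0;
}

/* After (by hand): each incoming edge gets its own copy of the final
   test, so the comparison feeds the conditional jump directly.  */
int
threaded (int x, int a, int b)
{
  if (x)
    {
      if (a < b)
        return 1;
      return 0;
    }
  if (a > b)
    return 1;
  return 0;
}
```

Note that the hand-threaded form has one more conditional in the source
than the original, which is at least consistent with the observation
above about the *.optimized dump.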
Even though we have been incrementally stressing all the pieces of this
intricate puzzle, I do expect fallout. My plan from here until stage1
ends is to stop new development in the threader(s), and focus on bug
fixing and improving the developer's debugging experience.
OK pending another round of tests on x86-64 and ppc64le Linux?
gcc/ChangeLog:
* passes.def: Replace the pass_thread_jumps before VRP* with
pass_thread_jumps_full. Remove all pass_vrp_threader instances.
libgomp/ChangeLog:
* testsuite/libgomp.graphite/force-parallel-4.c: Adjust for threading changes.
* testsuite/libgomp.graphite/force-parallel-8.c: Same.
gcc/testsuite/ChangeLog:
* gcc.dg/loop-unswitch-2.c: Adjust for threading changes.
* gcc.dg/old-style-asm-1.c: Same.
* gcc.dg/tree-ssa/phi_on_compare-1.c: Same.
* gcc.dg/tree-ssa/phi_on_compare-2.c: Same.
* gcc.dg/tree-ssa/phi_on_compare-3.c: Same.
* gcc.dg/tree-ssa/phi_on_compare-4.c: Same.
* gcc.dg/tree-ssa/pr20701.c: Same.
* gcc.dg/tree-ssa/pr21001.c: Same.
* gcc.dg/tree-ssa/pr21294.c: Same.
* gcc.dg/tree-ssa/pr21417.c: Same.
* gcc.dg/tree-ssa/pr21559.c: Same.
* gcc.dg/tree-ssa/pr21563.c: Same.
* gcc.dg/tree-ssa/pr49039.c: Same.
* gcc.dg/tree-ssa/pr59597.c: Same.
* gcc.dg/tree-ssa/pr61839_1.c: Same.
* gcc.dg/tree-ssa/pr61839_3.c: Same.
* gcc.dg/tree-ssa/pr66752-3.c: Same.
* gcc.dg/tree-ssa/pr68198.c: Same.
* gcc.dg/tree-ssa/pr77445-2.c: Same.
* gcc.dg/tree-ssa/pr77445.c: Same.
* gcc.dg/tree-ssa/ranger-threader-1.c: Same.
* gcc.dg/tree-ssa/ranger-threader-2.c: Same.
* gcc.dg/tree-ssa/ranger-threader-4.c: Same.
* gcc.dg/tree-ssa/ssa-dom-thread-1.c: Same.
* gcc.dg/tree-ssa/ssa-dom-thread-11.c: Same.
* gcc.dg/tree-ssa/ssa-dom-thread-12.c: Same.
* gcc.dg/tree-ssa/ssa-dom-thread-14.c: Same.
* gcc.dg/tree-ssa/ssa-dom-thread-16.c: Same.
* gcc.dg/tree-ssa/ssa-dom-thread-2b.c: Same.
* gcc.dg/tree-ssa/ssa-dom-thread-7.c: Same.
* gcc.dg/tree-ssa/ssa-thread-14.c: Same.
* gcc.dg/tree-ssa/ssa-thread-backedge.c: Same.
* gcc.dg/tree-ssa/ssa-vrp-thread-1.c: Same.
* gcc.dg/tree-ssa/vrp02.c: Same.
* gcc.dg/tree-ssa/vrp03.c: Same.
* gcc.dg/tree-ssa/vrp05.c: Same.
* gcc.dg/tree-ssa/vrp06.c: Same.
* gcc.dg/tree-ssa/vrp07.c: Same.
* gcc.dg/tree-ssa/vrp08.c: Same.
* gcc.dg/tree-ssa/vrp09.c: Same.
* gcc.dg/tree-ssa/vrp106.c: Same.
* gcc.dg/tree-ssa/vrp33.c: Same.
---
gcc/passes.def | 6 ++----
gcc/testsuite/gcc.dg/loop-unswitch-2.c | 2 +-
gcc/testsuite/gcc.dg/old-style-asm-1.c | 5 +----
gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-1.c | 9 +++++++--
gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-2.c | 4 ++--
gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-3.c | 4 ++--
gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-4.c | 4 ++--
gcc/testsuite/gcc.dg/tree-ssa/pr20701.c | 2 +-
gcc/testsuite/gcc.dg/tree-ssa/pr21001.c | 2 +-
gcc/testsuite/gcc.dg/tree-ssa/pr21294.c | 3 +--
gcc/testsuite/gcc.dg/tree-ssa/pr21417.c | 4 ++--
gcc/testsuite/gcc.dg/tree-ssa/pr21559.c | 7 +------
gcc/testsuite/gcc.dg/tree-ssa/pr21563.c | 2 +-
gcc/testsuite/gcc.dg/tree-ssa/pr49039.c | 2 +-
gcc/testsuite/gcc.dg/tree-ssa/pr59597.c | 11 ++++++-----
gcc/testsuite/gcc.dg/tree-ssa/pr61839_1.c | 6 +++---
gcc/testsuite/gcc.dg/tree-ssa/pr61839_3.c | 4 ++--
gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c | 6 +++---
gcc/testsuite/gcc.dg/tree-ssa/pr68198.c | 4 ++--
gcc/testsuite/gcc.dg/tree-ssa/pr77445-2.c | 10 +++++-----
gcc/testsuite/gcc.dg/tree-ssa/pr77445.c | 6 +++---
gcc/testsuite/gcc.dg/tree-ssa/ranger-threader-1.c | 4 ++--
gcc/testsuite/gcc.dg/tree-ssa/ranger-threader-2.c | 4 ++--
gcc/testsuite/gcc.dg/tree-ssa/ranger-threader-4.c | 4 ++--
gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-1.c | 2 +-
gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-11.c | 2 +-
gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-12.c | 2 +-
gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-14.c | 2 +-
gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-16.c | 2 +-
gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2b.c | 4 ++--
gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c | 11 +++++++----
gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-14.c | 5 ++---
gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-backedge.c | 4 ++--
gcc/testsuite/gcc.dg/tree-ssa/ssa-vrp-thread-1.c | 4 ++--
gcc/testsuite/gcc.dg/tree-ssa/vrp02.c | 2 +-
gcc/testsuite/gcc.dg/tree-ssa/vrp03.c | 2 +-
gcc/testsuite/gcc.dg/tree-ssa/vrp05.c | 2 +-
gcc/testsuite/gcc.dg/tree-ssa/vrp06.c | 2 +-
gcc/testsuite/gcc.dg/tree-ssa/vrp07.c | 2 +-
gcc/testsuite/gcc.dg/tree-ssa/vrp08.c | 2 +-
gcc/testsuite/gcc.dg/tree-ssa/vrp09.c | 2 +-
gcc/testsuite/gcc.dg/tree-ssa/vrp33.c | 2 +-
gcc/tree-ssa-threadbackward.c | 2 +-
libgomp/testsuite/libgomp.graphite/force-parallel-4.c | 2 +-
libgomp/testsuite/libgomp.graphite/force-parallel-8.c | 2 +-
45 files changed, 86 insertions(+), 89 deletions(-)
Comments
>
>
> And... the total compilation improves by 1.32%!
>
This last number is compilation speed, not number of threads.
Aldy
On 10/28/2021 9:24 AM, Aldy Hernandez wrote:
> [...snip...]
OK. And yes, there will probably be fallout. Fully expected and we'll
deal with it.
jeff
On Thu, Oct 28, 2021 at 8:34 PM Jeff Law via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
>
>
> On 10/28/2021 9:24 AM, Aldy Hernandez wrote:
> > [...snip...]
> OK. And yes, there will probably be fallout. Fully expected and we'll
> deal with it.
Btw, in case the "fully resolving" mode is slower than not fully resolving
please consider gating it on -fexpensive-optimizations (aka -O2+), thus
run the passes in not fully resolving modes at -O1.
Btw, there were quite a few big compile-time hogs with the vrp_threader
passes, not sure if this solves those.
Richard.
> jeff
>
On Fri, Oct 29, 2021 at 9:30 AM Richard Biener
<richard.guenther@gmail.com> wrote:
> Btw, in case the "fully resolving" mode is slower than not fully resolving
> please consider gating it on -fexpensive-optimizations (aka -O2+), thus
> run the passes in not fully resolving modes at -O1.
Sorry for the awkward naming. I couldn't find a better name :-/.
Suggestions welcome.
The fast mode assumes any unknown ranges on entry to a path to be
VARYING, whereas the fully resolving mode will ask the ranger, so the
fully resolving mode will indeed be slower. Though, I haven't
measured how much. However, we are gaining some time in total
compilation speed (1.32%) by replacing two threaders with one.
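
To make the distinction concrete, here is a toy model of the two modes.
Everything in it (toy_range, toy_mode, range_on_path_entry,
sample_oracle) is hypothetical and only mimics the idea; it is not the
ranger or path solver API:

```c
#include <limits.h>
#include <stdbool.h>

/* Toy model only: hypothetical types and names, not GCC's API.  */
struct toy_range { long lo, hi; };

static const struct toy_range TOY_VARYING = { LONG_MIN, LONG_MAX };

enum toy_mode { TOY_FAST, TOY_FULLY_RESOLVING };

/* Stand-in for a (comparatively expensive) global range query.  */
typedef struct toy_range (*toy_oracle_fn) (int ssa_name);

/* Range of SSA_NAME on entry to a candidate path.  Fast mode assumes
   VARYING for anything unknown; fully resolving mode asks the oracle.  */
struct toy_range
range_on_path_entry (enum toy_mode mode, int ssa_name, toy_oracle_fn oracle)
{
  if (mode == TOY_FAST)
    return TOY_VARYING;
  return oracle (ssa_name);
}

/* Could a test like "name > 0" be folded to true with range R?  */
bool
known_positive (struct toy_range r)
{
  return r.lo > 0;
}

/* Sample oracle for the usage below: pretends every name is in [1, 10].  */
struct toy_range
sample_oracle (int ssa_name)
{
  (void) ssa_name;
  struct toy_range r = { 1, 10 };
  return r;
}
```

With sample_oracle, known_positive is false for the TOY_FAST entry range
and true for the TOY_FULLY_RESOLVING one, which is the trade: the extra
query costs time but lets more conditions along the path fold.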
>
> Btw, there were quite a few big compile-time hogs with the vrp_threader
> passes, not sure if this solves those.
Sorry for not commenting on your spec ltrans report. I was waiting
until this went in to get a better feel of whether it was the path
solver, the forward threader, or something else. When I commit this
patch we'll get the forward threader out of the set of variables to
examine. The forward threader, for instance, has very few knobs
limiting its behavior, and coupled with a smarter solver, who knows
what's going on.
It is possible we may need to add a few knobs (or re-add some of the
ones I removed??), since the backward threader can find a whole slew
of paths that the forward threader could never find.
Aldy
On Fri, Oct 29, 2021 at 10:06 AM Aldy Hernandez <aldyh@redhat.com> wrote:
>
> On Fri, Oct 29, 2021 at 9:30 AM Richard Biener
> <richard.guenther@gmail.com> wrote:
>
> > Btw, in case the "fully resolving" mode is slower than not fully resolving
> > please consider gating it on -fexpensive-optimizations (aka -O2+), thus
> > run the passes in not fully resolving modes at -O1.
>
> Sorry for the awkward naming. I couldn't find a better name :-/.
> Suggestions welcome.
>
> The fast mode assumes any unknown ranges on entry to a path to be
> VARYING, whereas the fully resolving mode will ask the ranger, so the
> fully resolving mode will indeed be slower. Though, I haven't
> measured how much. However, we are gaining some time in total
> compilation speed (1.32%) by replacing two threaders with one.
OK. Just again, -O1 is to favor compile-speed and should crunch through
those incredibly stupi^Wlarge machine-generated sources without problems.
But from your comment it doesn't sound like something completely unreasonable
or slow.
> >
> > Btw, there were quite a few big compile-time hogs with the vrp_threader
> > passes, not sure if this solves those.
>
> Sorry for not commenting on your spec ltrans report. I was waiting
> until this went in to get a better feel of whether it was the path
> solver, the forward threader, or something else. When I commit this
> patch we'll get the forward threader out of the set of variables to
> examine. The forward threader, for instance, has very few knobs
> limiting its behavior, and coupled with a smarter solver, who knows
> what's going on.
>
> It is possible we may need to add a few knobs (or re-add some of the
> ones I removed??), since the backward threader can find a whole slew
> of paths that the forward threader could never find.
Yeah, sure. I'll wait until this change is in and will re-measure and update
the PR.
Richard.
> Aldy
>
On Fri, Oct 29, 2021 at 10:10 AM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Fri, Oct 29, 2021 at 10:06 AM Aldy Hernandez <aldyh@redhat.com> wrote:
> >
> > On Fri, Oct 29, 2021 at 9:30 AM Richard Biener
> > <richard.guenther@gmail.com> wrote:
> >
> > > Btw, in case the "fully resolving" mode is slower than not fully resolving
> > > please consider gating it on -fexpensive-optimizations (aka -O2+), thus
> > > run the passes in not fully resolving modes at -O1.
> >
> > Sorry for the awkward naming. I couldn't find a better name :-/.
> > Suggestions welcome.
> >
> > The fast mode assumes any unknown ranges on entry to a path to be
> > VARYING, whereas the fully resolving mode will ask the ranger, so the
> > fully resolving mode will indeed be slower. Though, I haven't
> > measured how much. However, we are gaining some time in total
> > compilation speed (1.32%) by replacing two threaders with one.
>
> OK. Just again, -O1 is to favor compile-speed and should crunch through
> those incredibly stupi^Wlarge machine-generated sources without problems.
> But from your comment it doesn't sound like something completely unreasonable
> or slow.
It shouldn't be a problem. Andrew has worked hard at handling those
large CFGs, and I'm just leveraging his work. The backward threader
also has a limit of 10 blocks look-back. But if it becomes a problem,
I'm more than happy to gate the fully resolving threader with
-fexpensive-optimizations, but we will lose threading ability at -O1.
I assume that's OK?
FWIW, Andrew has mentioned providing a fast mode for the ranger for
precisely those huge CFGs. Perhaps when that's ready, we could use
that mode for -O1.
>
> > >
> > > Btw, there were quite a few big compile-time hogs with the vrp_threader
> > > passes, not sure if this solves those.
> >
> > Sorry for not commenting on your spec ltrans report. I was waiting
> > until this went in to get a better feel of whether it was the path
> > solver, the forward threader, or something else. When I commit this
> > patch we'll get the forward threader out of the set of variables to
> > examine. The forward threader, for instance, has very few knobs
> > limiting its behavior, and coupled with a smarter solver, who knows
> > what's going on.
> >
> > It is possible we may need to add a few knobs (or re-add some of the
> > ones I removed??), since the backward threader can find a whole slew
> > of paths that the forward threader could never find.
>
> Yeah, sure. I'll wait until this change is in and will re-measure and update
> the PR.
I'm working through a regression on ppc64, but I should be able to
push later today.
Thanks.
Aldy
On Fri, Oct 29, 2021 at 10:10 AM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Fri, Oct 29, 2021 at 10:06 AM Aldy Hernandez <aldyh@redhat.com> wrote:
> >
> > On Fri, Oct 29, 2021 at 9:30 AM Richard Biener
> > <richard.guenther@gmail.com> wrote:
> >
> > > Btw, in case the "fully resolving" mode is slower than not fully resolving
> > > please consider gating it on -fexpensive-optimizations (aka -O2+), thus
> > > run the passes in not fully resolving modes at -O1.
> >
> > Sorry for the awkward naming. I couldn't find a better name :-/.
> > Suggestions welcome.
> >
> > The fast mode assumes any unknown ranges on entry to a path to be
> > VARYING, whereas the fully resolving mode will ask the ranger, so the
> > fully resolving mode will indeed be slower. Though, I haven't
> > measured how much. However, we are gaining some time in total
> > compilation speed (1.32%) by replacing two threaders with one.
>
> OK. Just again, -O1 is to favor compile-speed and should crunch through
> those incredibly stupi^Wlarge machine-generated sources without problems.
> But from your comment it doesn't sound like something completely unreasonable
> or slow.
Oh, I just noticed...we already key off of -fexpensive-optimizations.
Duh. The only backward threader that runs at -O1 is ethread, which
does not fully resolve. So I think we're good.
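
As a toy model of that gating (hypothetical names, not the pass manager;
it assumes plain integer -O levels and ignores -Os, -Og and explicit -f
flags):

```c
#include <stdbool.h>

/* Toy model only: hypothetical names, not GCC's pass manager.  */
enum toy_threader { TOY_ETHREAD, TOY_THREAD, TOY_THREADFULL };

/* -fexpensive-optimizations, modelled as "on at -O2 and above".  */
bool
toy_expensive_optimizations (int opt_level)
{
  return opt_level >= 2;
}

/* Does backward threader T run at OPT_LEVEL?  ethread (which does not
   fully resolve) is the only one that runs at -O1; the others are keyed
   off the expensive-optimizations flag.  */
bool
toy_threader_runs (enum toy_threader t, int opt_level)
{
  if (opt_level < 1)
    return false;
  if (t == TOY_ETHREAD)
    return true;
  return toy_expensive_optimizations (opt_level);
}
```

In this model toy_threader_runs (TOY_THREADFULL, 1) is false, so no fully
resolving threader is reached at -O1.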
But your comment still applies when we kill the DOM threader and
replace it with a pre-DOM fully resolving threader, since DOM does run
at -O1. Ughhh.. I really hate that DOM is an evrp pass in disguise
but at -O1.
Aldy
On Thu, 2021-10-28 at 17:24 +0200, Aldy Hernandez via Gcc-patches
wrote:
[...snip...]
> gcc/ChangeLog:
>
> * passes.def: Replace the pass_thread_jumps before VRP* with
> pass_thread_jumps_full. Remove all pass_vrp_threader
> instances.
Given that you're deleting all pass_vrp_threader instances, will you be
deleting make_pass_vrp_threader and class pass_vrp_threader once the
dust settles? (and thus execute_vrp_threader, etc?)
Dave
Yes, as well as anything ASSERT-related in the forward threader. That'll be
a follow-up patch.
Aldy
On Fri, Oct 29, 2021, 22:58 David Malcolm <dmalcolm@redhat.com> wrote:
> On Thu, 2021-10-28 at 17:24 +0200, Aldy Hernandez via Gcc-patches
> wrote:
>
> [...snip...]
>
> > gcc/ChangeLog:
> >
> > * passes.def: Replace the pass_thread_jumps before VRP* with
> > pass_thread_jumps_full. Remove all pass_vrp_threader
> > instances.
>
> Given that you're deleting all pass_vrp_threader instances, will you be
> deleting make_pass_vrp_threader and class pass_vrp_threader once the
> dust settles? (and thus execute_vrp_threader, etc?)
>
> Dave
>
>
@@ -210,9 +210,8 @@ along with GCC; see the file COPYING3. If not see
NEXT_PASS (pass_return_slot);
NEXT_PASS (pass_fre, true /* may_iterate */);
NEXT_PASS (pass_merge_phi);
- NEXT_PASS (pass_thread_jumps);
+ NEXT_PASS (pass_thread_jumps_full);
NEXT_PASS (pass_vrp, true /* warn_array_bounds_p */);
- NEXT_PASS (pass_vrp_threader);
NEXT_PASS (pass_dse);
NEXT_PASS (pass_dce);
/* pass_stdarg is always run and at this point we execute
@@ -336,9 +335,8 @@ along with GCC; see the file COPYING3. If not see
NEXT_PASS (pass_thread_jumps);
NEXT_PASS (pass_dominator, false /* may_peel_loop_headers_p */);
NEXT_PASS (pass_strlen);
- NEXT_PASS (pass_thread_jumps);
+ NEXT_PASS (pass_thread_jumps_full);
NEXT_PASS (pass_vrp, false /* warn_array_bounds_p */);
- NEXT_PASS (pass_vrp_threader);
/* Run CCP to compute alignment and nonzero bits. */
NEXT_PASS (pass_ccp, true /* nonzero_p */);
NEXT_PASS (pass_warn_restrict);
@@ -1,5 +1,5 @@
/* { dg-do compile } */
-/* { dg-options "-O2 -funswitch-loops -fdump-tree-unswitch-details -fdisable-tree-thread2 -fdisable-tree-thread3" } */
+/* { dg-options "-O2 -funswitch-loops -fdump-tree-unswitch-details -fno-thread-jumps" } */
void foo (float **a, float **b, float *c, int n, int m, int l)
{
@@ -1,9 +1,6 @@
/* PR inline-asm/8832 */
/* { dg-do compile } */
-/* { dg-options "-O2 -dP -fdisable-tree-ethread -fdisable-tree-thread1 -fdisable-tree-thread2 -fdisable-tree-thread3 -fdisable-tree-thread4" } */
-
-/* Note: Threader will duplicate BBs and replace one conditional branch by an
- unconditional one. */
+/* { dg-options "-O2 -dP -fno-thread-jumps" } */
/* Verify that GCC doesn't optimize
old style asm instructions. */
@@ -1,5 +1,5 @@
/* { dg-do compile } */
-/* { dg-options "-Ofast -fdump-tree-vrp-thread1" } */
+/* { dg-options "-Ofast -fdump-tree-dom2" } */
void g (int);
void g1 (int);
@@ -27,4 +27,9 @@ f (long a, long b, long c, long d, long x)
g (a);
}
-/* { dg-final { scan-tree-dump-times "Removing basic block" 1 "vrp-thread1" } } */
+/* This is actually a regression. The backward threader cannot thread
+ the above scenario, but it is being caught by the DOM threader
+ which still uses the forward threader. We should implement this
+ optimization in the backward threader before killing the forward
+ threader. Similarly for the other phi_on_compare-*.c tests. */
+/* { dg-final { scan-tree-dump-times "Removing basic block" 1 "dom2" } } */
@@ -1,5 +1,5 @@
/* { dg-do compile } */
-/* { dg-options "-Ofast -fdump-tree-vrp-thread1" } */
+/* { dg-options "-Ofast -fdump-tree-dom2" } */
void g (void);
void g1 (void);
@@ -20,4 +20,4 @@ f (long a, long b, long c, long d, int x)
}
}
-/* { dg-final { scan-tree-dump-times "Removing basic block" 1 "vrp-thread1" } } */
+/* { dg-final { scan-tree-dump-times "Removing basic block" 1 "dom2" } } */
@@ -1,5 +1,5 @@
/* { dg-do compile } */
-/* { dg-options "-Ofast -fdump-tree-vrp-thread1" } */
+/* { dg-options "-Ofast -fdump-tree-dom2" } */
void g (void);
void g1 (void);
@@ -22,4 +22,4 @@ f (long a, long b, long c, long d, int x)
}
}
-/* { dg-final { scan-tree-dump-times "Removing basic block" 1 "vrp-thread1" } } */
+/* { dg-final { scan-tree-dump-times "Removing basic block" 1 "dom2" } } */
@@ -1,5 +1,5 @@
/* { dg-do compile } */
-/* { dg-options "-Ofast -fdump-tree-vrp-thread1" } */
+/* { dg-options "-Ofast -fdump-tree-dom2" } */
void g (int);
void g1 (int);
@@ -37,4 +37,4 @@ f (long a, long b, long c, long d, int x)
g (c + d);
}
-/* { dg-final { scan-tree-dump-times "Removing basic block" 1 "vrp-thread1" } } */
+/* { dg-final { scan-tree-dump-times "Removing basic block" 1 "dom2" } } */
@@ -1,5 +1,5 @@
/* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-vrp1 -fno-early-inlining -fdelete-null-pointer-checks -fdisable-tree-thread1" } */
+/* { dg-options "-O2 -fdump-tree-vrp1 -fno-early-inlining -fdelete-null-pointer-checks -fno-thread-jumps" } */
typedef struct {
int code;
@@ -5,7 +5,7 @@
range information out of the conditional. */
/* { dg-do compile } */
-/* { dg-options "-O2 -fno-tree-dominator-opts -fno-tree-fre -fdisable-tree-evrp -fdump-tree-vrp1-details" } */
+/* { dg-options "-O2 -fno-tree-fre -fdisable-tree-evrp -fno-thread-jumps -fdump-tree-vrp1-details" } */
/* { dg-additional-options "-fdisable-tree-ethread -fdisable-tree-thread1" } */
int
@@ -4,8 +4,7 @@
allows us to eliminate the second "if" statement. */
/* { dg-do compile } */
-/* { dg-options "-O2 -fno-tree-dominator-opts -fdisable-tree-evrp -fdump-tree-vrp1-details" } */
-/* { dg-additional-options "-fdisable-tree-ethread -fdisable-tree-thread1" } */
+/* { dg-options "-O2 -fno-tree-dominator-opts -fdisable-tree-evrp -fdisable-tree-ethread -fdisable-tree-threadfull1 -fdump-tree-vrp1-details" } */
struct f {
int i;
@@ -1,5 +1,5 @@
/* { dg-do compile } */
-/* { dg-options "-O2 -fdisable-tree-thread3 -fdump-tree-thread4-details" } */
+/* { dg-options "-O2 -fdump-tree-thread2-details" } */
struct tree_common
{
@@ -49,5 +49,5 @@ L23:
/* We should thread the backedge to the top of the loop; ie we only
execute the if (expr->common.code != 142) test once per loop
iteration. */
-/* { dg-final { scan-tree-dump-times "jump thread" 1 "thread4" } } */
+/* { dg-final { scan-tree-dump-times "jump thread" 1 "thread2" } } */
@@ -1,5 +1,5 @@
/* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-evrp-details -fdump-tree-vrp-thread1-details" } */
+/* { dg-options "-O2 -fdump-tree-evrp-details" } */
static int blocksize = 4096;
@@ -34,8 +34,3 @@ void foo (void)
/* First, we should simplify the bits < 0 test within the loop. */
/* { dg-final { scan-tree-dump-times "Simplified relational" 1 "evrp" } } */
-
-/* We used to check for 3 threaded jumps here, but they all would
- rotate the loop. */
-
-
@@ -2,7 +2,7 @@
Make sure VRP folds the second "if" statement. */
/* { dg-do compile } */
-/* { dg-options "-O2 -fno-tree-dominator-opts -fdisable-tree-evrp -fdump-tree-vrp1-details -fdisable-tree-ethread -fdisable-tree-thread1" } */
+/* { dg-options "-O2 -fno-thread-jumps -fdisable-tree-evrp -fdump-tree-vrp1-details" } */
int
foo (int a)
@@ -1,6 +1,6 @@
/* PR tree-optimization/49039 */
/* { dg-do compile } */
-/* { dg-options "-O2 -fdisable-tree-evrp -fdump-tree-vrp1 -fdisable-tree-ethread -fdisable-tree-thread1" } */
+/* { dg-options "-O2 -fdisable-tree-evrp -fdump-tree-vrp1 -fno-thread-jumps" } */
extern void bar (void);
@@ -1,5 +1,5 @@
/* { dg-do compile } */
-/* { dg-options "-Ofast -fdump-tree-vrp-thread1-details" } */
+/* { dg-options "-Ofast -fdisable-tree-cunrolli -fdump-tree-threadfull1-details" } */
typedef unsigned short u16;
typedef unsigned char u8;
@@ -56,7 +56,8 @@ main (int argc, char argv[])
return crc;
}
-/* None of the threads we can get in vrp-thread1 are valid. They all
- cross or rotate loops. */
-/* { dg-final { scan-tree-dump-not "Registering jump thread" "vrp-thread1" } } */
-/* { dg-final { scan-tree-dump-not "joiner" "vrp-thread1" } } */
+/* We used to have no threads in vrp-thread1 because all the attempted
+ ones would cross loops. Now we get 30+ threads before VRP because
+ of loop unrolling. A better option is to disable unrolling and
+ test for the original 4 threads that this test was testing. */
+/* { dg-final { scan-tree-dump-times "Registering jump thread" 4 "threadfull1" } } */
@@ -1,6 +1,6 @@
/* PR tree-optimization/61839. */
/* { dg-do run } */
-/* { dg-options "-O2 -fdump-tree-vrp-thread1 -fdisable-tree-evrp -fdump-tree-optimized -fdisable-tree-ethread -fdisable-tree-thread1" } */
+/* { dg-options "-O2 -fdisable-tree-evrp -fdisable-tree-ethread -fdisable-tree-threadfull1 -fdump-tree-vrp1 -fdump-tree-optimized" } */
/* { dg-require-effective-target int32plus } */
__attribute__ ((noinline))
@@ -38,11 +38,11 @@ int main ()
}
/* Scan for c = 972195717) >> [0, 1] in function foo. */
-/* { dg-final { scan-tree-dump-times "486097858 : 972195717" 1 "vrp-thread1" } } */
+/* { dg-final { scan-tree-dump-times "486097858 : 972195717" 1 "vrp1" } } */
/* Previously we were checking for two ?: with constant PHI arguments,
but now we collapse them into one. */
/* Scan for c = 972195717) >> [2, 3] in function bar. */
-/* { dg-final { scan-tree-dump-times "243048929 : 121524464" 1 "vrp-thread1" } } */
+/* { dg-final { scan-tree-dump-times "243048929 : 121524464" 1 "vrp1" } } */
/* { dg-final { scan-tree-dump-times "486097858" 0 "optimized" } } */
@@ -1,6 +1,6 @@
/* PR tree-optimization/61839. */
/* { dg-do run } */
-/* { dg-options "-O2 -fdump-tree-vrp-thread1 -fdump-tree-optimized -fdisable-tree-ethread -fdisable-tree-thread1" } */
+/* { dg-options "-O2 -fdump-tree-vrp -fdump-tree-optimized -fdisable-tree-ethread -fdisable-tree-threadfull1" } */
__attribute__ ((noinline))
int foo (int a, unsigned b)
@@ -22,5 +22,5 @@ int main ()
}
/* Scan for c [12, 13] << 8 in function foo. */
-/* { dg-final { scan-tree-dump-times "3072 : 3328" 1 "vrp-thread1" } } */
+/* { dg-final { scan-tree-dump-times "3072 : 3328" 1 "vrp1" } } */
/* { dg-final { scan-tree-dump-times "3072" 0 "optimized" } } */
@@ -1,5 +1,5 @@
/* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-thread1-details -fdump-tree-thread4" } */
+/* { dg-options "-O2 -fdump-tree-threadfull1-details -fdump-tree-thread2" } */
extern int status, pt;
extern int count;
@@ -35,7 +35,7 @@ foo (int N, int c, int b, int *a)
/* There are 2 jump threading opportunities (which don't cross loops),
all of which will be realized, which will eliminate testing of
FLAG, completely. */
-/* { dg-final { scan-tree-dump-times "Registering jump" 2 "thread1"} } */
+/* { dg-final { scan-tree-dump-times "Registering jump" 2 "threadfull1"} } */
/* We used to remove references to FLAG by DCE2, but this was
depending on early threaders threading through loop boundaries
@@ -43,4 +43,4 @@ foo (int N, int c, int b, int *a)
run after loop optimizations , can successfully eliminate the
references to FLAG. Verify that ther are no references by the late
threading passes. */
-/* { dg-final { scan-tree-dump-not "if .flag" "thread4"} } */
+/* { dg-final { scan-tree-dump-not "if .flag" "thread2"} } */
@@ -1,5 +1,5 @@
/* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-thread1-details -fdisable-tree-ethread" } */
+/* { dg-options "-O2 -fdump-tree-threadfull1-details -fdisable-tree-ethread" } */
extern void abort (void);
@@ -38,4 +38,4 @@ c_finish_omp_clauses (tree clauses)
}
/* There are 3 jump threading opportunities. */
-/* { dg-final { scan-tree-dump-times "Registering jump" 3 "thread1"} } */
+/* { dg-final { scan-tree-dump-times "Registering jump" 3 "threadfull1"} } */
@@ -1,5 +1,5 @@
/* { dg-do compile } */
-/* { dg-options "-O2 -fdisable-tree-evrp -fdump-tree-thread-details-blocks-stats" } */
+/* { dg-options "-O2 -fdisable-tree-evrp -fdump-tree-thread-details-blocks-stats -fdump-tree-threadfull1-blocks-stats -fdump-tree-threadfull2-blocks-stats" } */
typedef enum STATES {
START=0,
INVALID,
@@ -123,8 +123,8 @@ enum STATES FMS( u8 **in , u32 *transitions) {
aarch64 has the highest CASE_VALUES_THRESHOLD in GCC. It's high enough
to change decisions in switch expansion which in turn can expose new
jump threading opportunities. Skip the later tests on aarch64. */
-/* { dg-final { scan-tree-dump "Jumps threaded: \[7-9\]" "thread2" } } */
+/* { dg-final { scan-tree-dump "Jumps threaded: \[7-9\]" "thread1" } } */
/* { dg-final { scan-tree-dump-not "optimizing for size" "thread1" } } */
-/* { dg-final { scan-tree-dump-not "optimizing for size" "thread2" } } */
-/* { dg-final { scan-tree-dump-not "optimizing for size" "thread3" { target { ! aarch64*-*-* } } } } */
-/* { dg-final { scan-tree-dump-not "optimizing for size" "thread4" { target { ! aarch64*-*-* } } } } */
+/* { dg-final { scan-tree-dump-not "optimizing for size" "threadfull1" } } */
+/* { dg-final { scan-tree-dump-not "optimizing for size" "thread2" { target { ! aarch64*-*-* } } } } */
+/* { dg-final { scan-tree-dump-not "optimizing for size" "threadfull2" { target { ! aarch64*-*-* } } } } */
@@ -1,5 +1,5 @@
/* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-thread3-details-blocks -fno-early-inlining -fno-tree-vrp -fno-tree-dominator-opts" } */
+/* { dg-options "-O2 -fno-early-inlining -fno-tree-vrp -fno-tree-dominator-opts -fdump-tree-thread2-details-blocks" } */
static int a;
static int b;
@@ -25,5 +25,5 @@ main (int argc)
if (b)
test2 ();
}
-/* { dg-final { scan-tree-dump-times "Registering jump thread" 2 "thread3" } } */
-/* { dg-final { scan-tree-dump-not "Invalid sum" "thread3" } } */
+/* { dg-final { scan-tree-dump-times "Registering jump thread" 2 "thread2" } } */
+/* { dg-final { scan-tree-dump-not "Invalid sum" "thread2" } } */
@@ -1,5 +1,5 @@
/* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-thread1-details --param logical-op-non-short-circuit=1" } */
+/* { dg-options "-O2 -fdump-tree-threadfull1-details --param logical-op-non-short-circuit=1" } */
// Copied from ssa-dom-thread-11.c
@@ -17,4 +17,4 @@ mark_target_live_regs (int b, int block, int bb_tick)
/* When the first two conditionals in the first IF are true, but
the third conditional is false, then there's a jump threading
opportunity to bypass the second IF statement. */
-/* { dg-final { scan-tree-dump-times "Registering.*jump thread" 1 "thread1"} } */
+/* { dg-final { scan-tree-dump-times "Registering.*jump thread" 1 "threadfull1"} } */
@@ -1,5 +1,5 @@
/* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-thread2-details -w" } */
+/* { dg-options "-O2 -fdump-tree-thread1-details -w" } */
// Copied from ssa-dom-thread-12.c.
@@ -36,4 +36,4 @@ scan_function (gimple stmt)
that stmt->num_ops - 3 != 0. When that test is false, we can derive
a value for stmt->num_ops. That in turn allows us to thread the jump
for the conditional at the start of the call to gimple_op. */
-/* { dg-final { scan-tree-dump-times "Registering.*jump thread" 1 "thread2"} } */
+/* { dg-final { scan-tree-dump-times "Registering.*jump thread" 1 "thread1"} } */
@@ -1,6 +1,6 @@
/* { dg-do compile } */
-/* { dg-additional-options "-O2 -fdump-tree-vrp-details -fdump-tree-thread1-details --param logical-op-non-short-circuit=1" } */
-/* { dg-final { scan-tree-dump-times "Registering jump" 8 "thread1" } } */
+/* { dg-additional-options "-O2 -fdump-tree-threadfull1-details --param logical-op-non-short-circuit=1" } */
+/* { dg-final { scan-tree-dump-times "Registering jump" 8 "threadfull1" } } */
/* Copied from ssa-thread-14. */
@@ -1,5 +1,5 @@
/* { dg-do compile } */
-/* { dg-options "-O2 -fno-tree-vrp -fdump-tree-dom2-details" } */
+/* { dg-options "-O2 -fno-tree-vrp -fdisable-tree-threadfull1 -fdump-tree-dom2-details" } */
void t(void);
void q(void);
void q1(void);
@@ -1,5 +1,5 @@
/* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-dom2-details --param logical-op-non-short-circuit=1 -fdisable-tree-thread1 -fdisable-tree-thread2 -fdisable-tree-vrp-thread1 " } */
+/* { dg-options "-O2 -fdump-tree-dom2-details --param logical-op-non-short-circuit=1 -fdisable-tree-thread1 -fdisable-tree-thread2 -fdisable-tree-threadfull1" } */
static int *bb_ticks;
extern void frob (void);
@@ -1,5 +1,5 @@
/* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-dom2-details -w -fdisable-tree-thread2" } */
+/* { dg-options "-O2 -fdump-tree-dom2-details -w -fdisable-tree-thread1" } */
typedef long unsigned int size_t;
union tree_node;
typedef union tree_node *tree;
@@ -1,6 +1,6 @@
/* { dg-do compile } */
/* { dg-options "-O2 -fdump-tree-dom2-details -w --param logical-op-non-short-circuit=1" } */
-/* { dg-additional-options "-fdisable-tree-thread1 -fdisable-tree-ethread -fdisable-tree-thread2" } */
+/* { dg-additional-options "-fdisable-tree-thread1 -fdisable-tree-ethread -fdisable-tree-threadfull1" } */
enum optab_methods
{
@@ -1,5 +1,5 @@
/* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-dom2-details -w --param logical-op-non-short-circuit=1 -fdisable-tree-vrp-thread1" } */
+/* { dg-options "-O2 -fdump-tree-dom2-details -w --param logical-op-non-short-circuit=1 -fdisable-tree-threadfull1" } */
unsigned char
validate_subreg (unsigned int offset, unsigned int isize, unsigned int osize, int zz, int qq)
{
@@ -1,5 +1,5 @@
/* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-thread3-stats -fdump-tree-dom2-stats -fdisable-tree-ethread" } */
+/* { dg-options "-O2 -fdump-tree-thread2-stats -fdump-tree-dom2-stats -fdisable-tree-ethread" } */
void foo();
void bla();
@@ -26,4 +26,4 @@ void thread_latch_through_header (void)
case. And we want to thread through the header as well. These
are both caught by threading in DOM. */
/* { dg-final { scan-tree-dump-not "Jumps threaded" "dom2"} } */
-/* { dg-final { scan-tree-dump-times "Jumps threaded: 1" 1 "thread3"} } */
+/* { dg-final { scan-tree-dump-times "Jumps threaded: 1" 1 "thread2"} } */
@@ -1,15 +1,18 @@
/* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-dom2-stats -fdump-tree-thread3-stats -fdump-tree-dom3-stats -fdump-tree-vrp-thread2-stats -fno-guess-branch-probability" } */
+/* { dg-options "-O2 -fdump-tree-dom2-stats -fdump-tree-thread2-stats -fdump-tree-dom3-stats -fno-guess-branch-probability" } */
/* { dg-final { scan-tree-dump-not "Jumps threaded" "dom2" } } */
+/* We were previously checking for no threads in vrp-thread2, but now
+ that we have merged the post-VRP and pre-VRP threaders, we get a
+ dozen threads before VRP2. */
+
/* aarch64 has the highest CASE_VALUES_THRESHOLD in GCC. It's high enough
to change decisions in switch expansion which in turn can expose new
jump threading opportunities. Skip the later tests on aarch64. */
/* { dg-final { scan-tree-dump-not "Jumps threaded" "dom3" { target { ! aarch64*-*-* } } } } */
-/* { dg-final { scan-tree-dump-not "Jumps threaded" "vrp-thread2" { target { ! aarch64*-*-* } } } } */
-/* { dg-final { scan-tree-dump "Jumps threaded: 11" "thread3" { target { ! aarch64*-*-* } } } } */
-/* { dg-final { scan-tree-dump "Jumps threaded: 18" "thread3" { target { aarch64*-*-* } } } } */
+/* { dg-final { scan-tree-dump "Jumps threaded: 11" "thread2" { target { ! aarch64*-*-* } } } } */
+/* { dg-final { scan-tree-dump "Jumps threaded: 18" "thread2" { target { aarch64*-*-* } } } } */
enum STATE {
S0=0,
@@ -1,7 +1,6 @@
/* { dg-do compile } */
-/* { dg-additional-options "-O2 -fdump-tree-vrp-thread1-details --param logical-op-non-short-circuit=1" } */
-/* { dg-additional-options "-fdisable-tree-thread1" } */
-/* { dg-final { scan-tree-dump-times "Threaded jump" 8 "vrp-thread1" } } */
+/* { dg-additional-options "-O2 --param logical-op-non-short-circuit=1 -fdump-tree-threadfull1-details" } */
+/* { dg-final { scan-tree-dump-times "Registering jump thread" 8 "threadfull1" } } */
void foo (void);
void bar (void);
@@ -1,5 +1,5 @@
// { dg-do compile }
-// { dg-options "-O2 -fdisable-tree-ethread -fdisable-tree-thread1 -fdisable-tree-thread2 -fno-tree-dominator-opts -fdump-tree-thread3-details" }
+// { dg-options "-O2 -fdisable-tree-ethread -fdisable-tree-thread1 -fdisable-tree-thread2 -fno-tree-dominator-opts -fdump-tree-threadfull2-details" }
// Test that we can thread jumps across the backedge of a loop through
// the switch statement to a particular case.
@@ -29,4 +29,4 @@ int foo (unsigned int x, int s)
return s;
}
-// { dg-final { scan-tree-dump "Registering jump thread:.*normal \\(back\\)" "thread3" } }
+// { dg-final { scan-tree-dump "Registering jump thread:.*normal \\(back\\)" "threadfull2" } }
@@ -1,5 +1,5 @@
/* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-thread1-details -fdelete-null-pointer-checks" } */
+/* { dg-options "-O2 -fdump-tree-threadfull1-details -fdelete-null-pointer-checks" } */
/* { dg-skip-if "" keeps_null_pointer_checks } */
void oof (void);
@@ -29,5 +29,5 @@ build_omp_regions_1 (basic_block bb, struct omp_region *parent,
/* ARM Cortex-M defined LOGICAL_OP_NON_SHORT_CIRCUIT to false,
so skip below test. */
-/* { dg-final { scan-tree-dump-times "Registering jump thread" 1 "thread1" { target { ! arm_cortex_m } } } } */
+/* { dg-final { scan-tree-dump-times "Registering jump thread" 1 "threadfull1" { target { ! arm_cortex_m } } } } */
@@ -1,5 +1,5 @@
/* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-vrp1 -fdelete-null-pointer-checks -fdisable-tree-evrp -fdisable-tree-ethread -fdisable-tree-thread1" } */
+/* { dg-options "-O2 -fdump-tree-vrp1 -fdelete-null-pointer-checks -fdisable-tree-evrp -fno-thread-jumps" } */
struct A
{
@@ -1,5 +1,5 @@
/* { dg-do compile } */
-/* { dg-options "-O2 -fdisable-tree-evrp -fdump-tree-vrp1 -fdisable-tree-ethread -fdisable-tree-thread1" } */
+/* { dg-options "-O2 -fdisable-tree-evrp -fdump-tree-vrp1 -fno-thread-jumps" } */
struct A
{
@@ -1,5 +1,5 @@
/* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-vrp1 -fno-early-inlining -fdisable-tree-ethread -fdisable-tree-thread1" } */
+/* { dg-options "-O2 -fdump-tree-vrp1 -fno-early-inlining -fno-thread-jumps" } */
inline int ten()
@@ -1,5 +1,5 @@
/* { dg-do compile } */
-/* { dg-options "-O2 -fdisable-tree-evrp -fdump-tree-vrp1 -fdisable-tree-ethread -fdisable-tree-thread1" } */
+/* { dg-options "-O2 -fdisable-tree-evrp -fdump-tree-vrp1 -fno-thread-jumps" } */
int baz (void);
@@ -1,5 +1,5 @@
/* { dg-do compile } */
-/* { dg-options "-O2 -fno-tree-fre -fdisable-tree-evrp -fdump-tree-vrp1-details -fdelete-null-pointer-checks -fdisable-tree-ethread -fdisable-tree-thread1" } */
+/* { dg-options "-O2 -fno-tree-fre -fdisable-tree-evrp -fdump-tree-vrp1-details -fdelete-null-pointer-checks -fno-thread-jumps" } */
int
foo (int i, int *p)
@@ -1,5 +1,5 @@
/* { dg-do compile } */
-/* { dg-options "-O2 -fno-tree-fre -fdisable-tree-evrp -fdump-tree-vrp1-details -fdisable-tree-thread1 -fdelete-null-pointer-checks" } */
+/* { dg-options "-O2 -fno-tree-fre -fdisable-tree-evrp -fdump-tree-vrp1-details -fno-thread-jumps -fdelete-null-pointer-checks" } */
/* Compile with -fno-tree-fre -O2 to prevent CSEing *p. */
int
@@ -1,5 +1,5 @@
/* { dg-do compile } */
-/* { dg-options "-O2 -fno-tree-fre -fdisable-tree-evrp -fdump-tree-vrp1 -std=gnu89 -fdisable-tree-ethread -fdisable-tree-thread1" } */
+/* { dg-options "-O2 -fno-tree-fre -fdisable-tree-evrp -fdump-tree-vrp1 -std=gnu89 -fno-thread-jumps" } */
foo (int *p)
{
@@ -1,5 +1,5 @@
/* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-vrp1 -fno-tree-fre -fdisable-tree-evrp -fdisable-tree-ethread -fdisable-tree-thread1" } */
+/* { dg-options "-O2 -fdump-tree-vrp1 -fno-tree-fre -fdisable-tree-evrp -fno-thread-jumps" } */
/* This is from PR14052. */
@@ -955,7 +955,7 @@ const pass_data pass_data_thread_jumps =
const pass_data pass_data_thread_jumps_full =
{
GIMPLE_PASS,
- "thread-full",
+ "threadfull",
OPTGROUP_NONE,
TV_TREE_SSA_THREAD_JUMPS,
( PROP_cfg | PROP_ssa ),
@@ -1,5 +1,5 @@
/* Autopar with IF conditions. */
-/* { dg-additional-options "-fdisable-tree-thread1 -fdisable-tree-vrp-thread1" } */
+/* { dg-additional-options "-fno-thread-jumps" } */
void abort();
@@ -1,4 +1,4 @@
-/* { dg-additional-options "-fdisable-tree-thread1 -fdisable-tree-vrp-thread1 --param max-stores-to-sink=0" } */
+/* { dg-additional-options "-fno-thread-jumps --param max-stores-to-sink=0" } */
#define N 1500