[2/4] cfgloopmanip: Add infrastructure for scaling of multi-exit loops [PR117790]

Message ID Z3u/1cAM/NVAr8pO@arm.com
State New
Headers
Series vect, cfgloopmanip: Fix profile consistency of early break loops [PR117790] |

Commit Message

Alex Coplan Jan. 6, 2025, 11:34 a.m. UTC
  As it stands, scale_loop_profile doesn't correctly handle loops with
multiple exits.  In particular, in the case where the expected niters
exceeds iteration_bound, scale_loop_profile attempts to reduce the
number of iterations with a call to scale_loop_frequencies, which
multiplies the count of each BB by a given probability.  This
transformation preserves the relationships between the counts of the BBs
within the loop (and thus the edge probabilities stay the same) but this
cannot possibly work for loops with multiple exits, since in order for
the expected niters to reduce (and counts along exit edges to remain the
same), the exit edge probabilities must increase, thus decreasing the
probabilities of the internal edges, meaning that the ratios of the
counts of the BBs inside the loop must change.  So we need a different
approach (not a straightforward multiplicative scaling) to adjust the
expected niters of a loop with multiple exits.

This patch introduces a new helper, flow_scale_loop_freqs, which can be
used to correctly scale the profile of a loop with multiple exits.  It
is parameterized by a probability (with which to scale the header and
therefore the expected niters) and a lambda which gives the desired
counts for the exit edges.  In this patch, to make things simpler,
flow_scale_loop_freqs only handles loop shapes without internal control
flow, and we introduce a predicate can_flow_scale_loop_freqs_p to test
whether a given loop meets these criteria.  This restriction is
reasonable since this patch is motivated by fixing the profile
consistency for early break vectorization, and we don't currently
vectorize loops with internal control flow.  We also fall back to a
multiplicative scaling (the status quo) for loops that
flow_scale_loop_freqs can't handle, so the patch should be a net
improvement.

We wrap the call to flow_scale_loop_freqs in a helper
scale_loop_freqs_with_exit_counts which handles the above-mentioned
fallback.  This wrapper is still generic in that it accepts a lambda to
allow overriding the desired exit edge counts.  We specialize this with
another wrapper, scale_loop_freqs_hold_exit_counts (keeping the
counts along exit edges fixed), which is then used to implement the
niters-scaling case of scale_loop_profile, thus fixing this path through
the function for loops with multiple exits.

Finally, we expose two new wrapper functions in cfgloopmanip.h for use
in subsequent vectorizer patches.  scale_loop_profile_hold_exit_counts
is a variant of scale_loop_profile which assumes we want to keep the
counts along exit edges of the loop fixed through both parts of the
transformation (including the initial probability scale).
scale_loop_freqs_with_new_exit_count is intended to be used in a
subsequent patch when adding a skip edge around the epilog, where the
reduction of count entering the loop is mirrored by a reduced count
along a given exit edge.

Bootstrapped/regtested as a series on aarch64-linux-gnu,
x86_64-linux-gnu, and arm-linux-gnueabihf.  OK for trunk?

Thanks,
Alex

gcc/ChangeLog:

	PR tree-optimization/117790
	* cfgloopmanip.cc (can_flow_scale_loop_freqs_p): New.
	(flow_scale_loop_freqs): New.
	(scale_loop_freqs_with_exit_counts): New.
	(scale_loop_freqs_hold_exit_counts): New.
	(scale_loop_profile): Refactor to use the newly-added
	scale_loop_profile_1, and use scale_loop_freqs_hold_exit_counts to
	correctly handle reducing the expected niters for loops with multiple
	exits.
	(scale_loop_freqs_with_new_exit_count): New.
	(scale_loop_profile_1): New.
	(scale_loop_profile_hold_exit_counts): New.
	* cfgloopmanip.h (scale_loop_profile_hold_exit_counts): New.
	(scale_loop_freqs_with_new_exit_count): New.
---
 gcc/cfgloopmanip.cc | 309 ++++++++++++++++++++++++++++++++++++++++----
 gcc/cfgloopmanip.h  |   7 +
 2 files changed, 294 insertions(+), 22 deletions(-)
  

Comments

Bernhard Reutner-Fischer Jan. 8, 2025, 9:42 a.m. UTC | #1
On 6 January 2025 12:34:45 CET, Alex Coplan <alex.coplan@arm.com> wrote:

nit:
s/caling/calling/

thanks
  
Tamar Christina Jan. 15, 2025, 2:07 p.m. UTC | #2
Ping

> -----Original Message-----
> From: Alex Coplan <Alex.Coplan@arm.com>
> Sent: Monday, January 6, 2025 11:35 AM
> To: gcc-patches@gcc.gnu.org
> Cc: Richard Biener <rguenther@suse.de>; Jan Hubicka <hubicka@ucw.cz>; Tamar
> Christina <Tamar.Christina@arm.com>
> Subject: [PATCH 2/4] cfgloopmanip: Add infrastructure for scaling of multi-exit
> loops [PR117790]
> 
> As it stands, scale_loop_profile doesn't correctly handle loops with
> multiple exits.  In particular, in the case where the expected niters
> exceeds iteration_bound, scale_loop_profile attempts to reduce the
> number of iterations with a call to scale_loop_frequencies, which
> multiplies the count of each BB by a given probability.  This
> transformation preserves the relationships between the counts of the BBs
> within the loop (and thus the edge probabilities stay the same) but this
> cannot possibly work for loops with multiple exits, since in order for
> the expected niters to reduce (and counts along exit edges to remain the
> same), the exit edge probabilities must increase, thus decreasing the
> probabilities of the internal edges, meaning that the ratios of the
> counts of the BBs inside the loop must change.  So we need a different
> approach (not a straightforward multiplicative scaling) to adjust the
> expected niters of a loop with multiple exits.
> 
> This patch introduces a new helper, flow_scale_loop_freqs, which can be
> used to correctly scale the profile of a loop with multiple exits.  It
> is parameterized by a probability (with which to scale the header and
> therefore the expected niters) and a lambda which gives the desired
> counts for the exit edges.  In this patch, to make things simpler,
> flow_scale_loop_freqs only handles loop shapes without internal control
> flow, and we introduce a predicate can_flow_scale_loop_freqs_p to test
> whether a given loop meets these criteria.  This restriction is
> reasonable since this patch is motivated by fixing the profile
> consistency for early break vectorization, and we don't currently
> vectorize loops with internal control flow.  We also fall back to a
> multiplicative scaling (the status quo) for loops that
> flow_scale_loop_freqs can't handle, so the patch should be a net
> improvement.
> 
> We wrap the call to flow_scale_loop_freqs in a helper
> scale_loop_freqs_with_exit_counts which handles the above-mentioned
> fallback.  This wrapper is still generic in that it accepts a lambda to
> allow overriding the desired exit edge counts.  We specialize this with
> another wrapper, scale_loop_freqs_hold_exit_counts (keeping the
> counts along exit edges fixed), which is then used to implement the
> niters-scaling case of scale_loop_profile, thus fixing this path through
> the function for loops with multiple exits.
> 
> Finally, we expose two new wrapper functions in cfgloopmanip.h for use
> in subsequent vectorizer patches.  scale_loop_profile_hold_exit_counts
> is a variant of scale_loop_profile which assumes we want to keep the
> counts along exit edges of the loop fixed through both parts of the
> transformation (including the initial probability scale).
> scale_loop_freqs_with_new_exit_count is intended to be used in a
> subsequent patch when adding a skip edge around the epilog, where the
> reduction of count entering the loop is mirrored by a reduced count
> along a given exit edge.
> 
> Bootstrapped/regtested as a series on aarch64-linux-gnu,
> x86_64-linux-gnu, and arm-linux-gnueabihf.  OK for trunk?
> 
> Thanks,
> Alex
> 
> gcc/ChangeLog:
> 
> 	PR tree-optimization/117790
> 	* cfgloopmanip.cc (can_flow_scale_loop_freqs_p): New.
> 	(flow_scale_loop_freqs): New.
> 	(scale_loop_freqs_with_exit_counts): New.
> 	(scale_loop_freqs_hold_exit_counts): New.
> 	(scale_loop_profile): Refactor to use the newly-added
> 	scale_loop_profile_1, and use scale_loop_freqs_hold_exit_counts to
> 	correctly handle reducing the expected niters for loops with multiple
> 	exits.
> 	(scale_loop_freqs_with_new_exit_count): New.
> 	(scale_loop_profile_1): New.
> 	(scale_loop_profile_hold_exit_counts): New.
> 	* cfgloopmanip.h (scale_loop_profile_hold_exit_counts): New.
> 	(scale_loop_freqs_with_new_exit_count): New.
> ---
>  gcc/cfgloopmanip.cc | 309 ++++++++++++++++++++++++++++++++++++++++--
> --
>  gcc/cfgloopmanip.h  |   7 +
>  2 files changed, 294 insertions(+), 22 deletions(-)
  
Tamar Christina Jan. 24, 2025, 9:17 a.m. UTC | #3
ping

> -----Original Message-----
> From: Tamar Christina
> Sent: Wednesday, January 15, 2025 2:08 PM
> To: Alex Coplan <Alex.Coplan@arm.com>; gcc-patches@gcc.gnu.org
> Cc: Richard Biener <rguenther@suse.de>; Jan Hubicka <hubicka@ucw.cz>
> Subject: RE: [PATCH 2/4] cfgloopmanip: Add infrastructure for scaling of multi-exit
> loops [PR117790]
> 
> Ping
> 
> > -----Original Message-----
> > From: Alex Coplan <Alex.Coplan@arm.com>
> > Sent: Monday, January 6, 2025 11:35 AM
> > To: gcc-patches@gcc.gnu.org
> > Cc: Richard Biener <rguenther@suse.de>; Jan Hubicka <hubicka@ucw.cz>; Tamar
> > Christina <Tamar.Christina@arm.com>
> > Subject: [PATCH 2/4] cfgloopmanip: Add infrastructure for scaling of multi-exit
> > loops [PR117790]
> >
> > As it stands, scale_loop_profile doesn't correctly handle loops with
> > multiple exits.  In particular, in the case where the expected niters
> > exceeds iteration_bound, scale_loop_profile attempts to reduce the
> > number of iterations with a call to scale_loop_frequencies, which
> > multiplies the count of each BB by a given probability.  This
> > transformation preserves the relationships between the counts of the BBs
> > within the loop (and thus the edge probabilities stay the same) but this
> > cannot possibly work for loops with multiple exits, since in order for
> > the expected niters to reduce (and counts along exit edges to remain the
> > same), the exit edge probabilities must increase, thus decreasing the
> > probabilities of the internal edges, meaning that the ratios of the
> > counts of the BBs inside the loop must change.  So we need a different
> > approach (not a straightforward multiplicative scaling) to adjust the
> > expected niters of a loop with multiple exits.
> >
> > This patch introduces a new helper, flow_scale_loop_freqs, which can be
> > used to correctly scale the profile of a loop with multiple exits.  It
> > is parameterized by a probability (with which to scale the header and
> > therefore the expected niters) and a lambda which gives the desired
> > counts for the exit edges.  In this patch, to make things simpler,
> > flow_scale_loop_freqs only handles loop shapes without internal control
> > flow, and we introduce a predicate can_flow_scale_loop_freqs_p to test
> > whether a given loop meets these criteria.  This restriction is
> > reasonable since this patch is motivated by fixing the profile
> > consistency for early break vectorization, and we don't currently
> > vectorize loops with internal control flow.  We also fall back to a
> > multiplicative scaling (the status quo) for loops that
> > flow_scale_loop_freqs can't handle, so the patch should be a net
> > improvement.
> >
> > We wrap the call to flow_scale_loop_freqs in a helper
> > scale_loop_freqs_with_exit_counts which handles the above-mentioned
> > fallback.  This wrapper is still generic in that it accepts a lambda to
> > allow overriding the desired exit edge counts.  We specialize this with
> > another wrapper, scale_loop_freqs_hold_exit_counts (keeping the
> > counts along exit edges fixed), which is then used to implement the
> > niters-scaling case of scale_loop_profile, thus fixing this path through
> > the function for loops with multiple exits.
> >
> > Finally, we expose two new wrapper functions in cfgloopmanip.h for use
> > in subsequent vectorizer patches.  scale_loop_profile_hold_exit_counts
> > is a variant of scale_loop_profile which assumes we want to keep the
> > counts along exit edges of the loop fixed through both parts of the
> > transformation (including the initial probability scale).
> > scale_loop_freqs_with_new_exit_count is intended to be used in a
> > subsequent patch when adding a skip edge around the epilog, where the
> > reduction of count entering the loop is mirrored by a reduced count
> > along a given exit edge.
> >
> > Bootstrapped/regtested as a series on aarch64-linux-gnu,
> > x86_64-linux-gnu, and arm-linux-gnueabihf.  OK for trunk?
> >
> > Thanks,
> > Alex
> >
> > gcc/ChangeLog:
> >
> > 	PR tree-optimization/117790
> > 	* cfgloopmanip.cc (can_flow_scale_loop_freqs_p): New.
> > 	(flow_scale_loop_freqs): New.
> > 	(scale_loop_freqs_with_exit_counts): New.
> > 	(scale_loop_freqs_hold_exit_counts): New.
> > 	(scale_loop_profile): Refactor to use the newly-added
> > 	scale_loop_profile_1, and use scale_loop_freqs_hold_exit_counts to
> > 	correctly handle reducing the expected niters for loops with multiple
> > 	exits.
> > 	(scale_loop_freqs_with_new_exit_count): New.
> > 	(scale_loop_profile_1): New.
> > 	(scale_loop_profile_hold_exit_counts): New.
> > 	* cfgloopmanip.h (scale_loop_profile_hold_exit_counts): New.
> > 	(scale_loop_freqs_with_new_exit_count): New.
> > ---
> >  gcc/cfgloopmanip.cc | 309
> ++++++++++++++++++++++++++++++++++++++++--
> > --
> >  gcc/cfgloopmanip.h  |   7 +
> >  2 files changed, 294 insertions(+), 22 deletions(-)
  
Tamar Christina Feb. 3, 2025, 2:46 p.m. UTC | #4
Ping

> -----Original Message-----
> From: Tamar Christina
> Sent: Friday, January 24, 2025 9:18 AM
> To: Alex Coplan <Alex.Coplan@arm.com>; gcc-patches@gcc.gnu.org
> Cc: Richard Biener <rguenther@suse.de>; Jan Hubicka <hubicka@ucw.cz>
> Subject: RE: [PATCH 2/4] cfgloopmanip: Add infrastructure for scaling of multi-exit
> loops [PR117790]
> 
> ping
> 
> > -----Original Message-----
> > From: Tamar Christina
> > Sent: Wednesday, January 15, 2025 2:08 PM
> > To: Alex Coplan <Alex.Coplan@arm.com>; gcc-patches@gcc.gnu.org
> > Cc: Richard Biener <rguenther@suse.de>; Jan Hubicka <hubicka@ucw.cz>
> > Subject: RE: [PATCH 2/4] cfgloopmanip: Add infrastructure for scaling of multi-
> exit
> > loops [PR117790]
> >
> > Ping
> >
> > > -----Original Message-----
> > > From: Alex Coplan <Alex.Coplan@arm.com>
> > > Sent: Monday, January 6, 2025 11:35 AM
> > > To: gcc-patches@gcc.gnu.org
> > > Cc: Richard Biener <rguenther@suse.de>; Jan Hubicka <hubicka@ucw.cz>;
> Tamar
> > > Christina <Tamar.Christina@arm.com>
> > > Subject: [PATCH 2/4] cfgloopmanip: Add infrastructure for scaling of multi-exit
> > > loops [PR117790]
> > >
> > > As it stands, scale_loop_profile doesn't correctly handle loops with
> > > multiple exits.  In particular, in the case where the expected niters
> > > exceeds iteration_bound, scale_loop_profile attempts to reduce the
> > > number of iterations with a call to scale_loop_frequencies, which
> > > multiplies the count of each BB by a given probability.  This
> > > transformation preserves the relationships between the counts of the BBs
> > > within the loop (and thus the edge probabilities stay the same) but this
> > > cannot possibly work for loops with multiple exits, since in order for
> > > the expected niters to reduce (and counts along exit edges to remain the
> > > same), the exit edge probabilities must increase, thus decreasing the
> > > probabilities of the internal edges, meaning that the ratios of the
> > > counts of the BBs inside the loop must change.  So we need a different
> > > approach (not a straightforward multiplicative scaling) to adjust the
> > > expected niters of a loop with multiple exits.
> > >
> > > This patch introduces a new helper, flow_scale_loop_freqs, which can be
> > > used to correctly scale the profile of a loop with multiple exits.  It
> > > is parameterized by a probability (with which to scale the header and
> > > therefore the expected niters) and a lambda which gives the desired
> > > counts for the exit edges.  In this patch, to make things simpler,
> > > flow_scale_loop_freqs only handles loop shapes without internal control
> > > flow, and we introduce a predicate can_flow_scale_loop_freqs_p to test
> > > whether a given loop meets these criteria.  This restriction is
> > > reasonable since this patch is motivated by fixing the profile
> > > consistency for early break vectorization, and we don't currently
> > > vectorize loops with internal control flow.  We also fall back to a
> > > multiplicative scaling (the status quo) for loops that
> > > flow_scale_loop_freqs can't handle, so the patch should be a net
> > > improvement.
> > >
> > > We wrap the call to flow_scale_loop_freqs in a helper
> > > scale_loop_freqs_with_exit_counts which handles the above-mentioned
> > > fallback.  This wrapper is still generic in that it accepts a lambda to
> > > allow overriding the desired exit edge counts.  We specialize this with
> > > another wrapper, scale_loop_freqs_hold_exit_counts (keeping the
> > > counts along exit edges fixed), which is then used to implement the
> > > niters-scaling case of scale_loop_profile, thus fixing this path through
> > > the function for loops with multiple exits.
> > >
> > > Finally, we expose two new wrapper functions in cfgloopmanip.h for use
> > > in subsequent vectorizer patches.  scale_loop_profile_hold_exit_counts
> > > is a variant of scale_loop_profile which assumes we want to keep the
> > > counts along exit edges of the loop fixed through both parts of the
> > > transformation (including the initial probability scale).
> > > scale_loop_freqs_with_new_exit_count is intended to be used in a
> > > subsequent patch when adding a skip edge around the epilog, where the
> > > reduction of count entering the loop is mirrored by a reduced count
> > > along a given exit edge.
> > >
> > > Bootstrapped/regtested as a series on aarch64-linux-gnu,
> > > x86_64-linux-gnu, and arm-linux-gnueabihf.  OK for trunk?
> > >
> > > Thanks,
> > > Alex
> > >
> > > gcc/ChangeLog:
> > >
> > > 	PR tree-optimization/117790
> > > 	* cfgloopmanip.cc (can_flow_scale_loop_freqs_p): New.
> > > 	(flow_scale_loop_freqs): New.
> > > 	(scale_loop_freqs_with_exit_counts): New.
> > > 	(scale_loop_freqs_hold_exit_counts): New.
> > > 	(scale_loop_profile): Refactor to use the newly-added
> > > 	scale_loop_profile_1, and use scale_loop_freqs_hold_exit_counts to
> > > 	correctly handle reducing the expected niters for loops with multiple
> > > 	exits.
> > > 	(scale_loop_freqs_with_new_exit_count): New.
> > > 	(scale_loop_profile_1): New.
> > > 	(scale_loop_profile_hold_exit_counts): New.
> > > 	* cfgloopmanip.h (scale_loop_profile_hold_exit_counts): New.
> > > 	(scale_loop_freqs_with_new_exit_count): New.
> > > ---
> > >  gcc/cfgloopmanip.cc | 309
> > ++++++++++++++++++++++++++++++++++++++++--
> > > --
> > >  gcc/cfgloopmanip.h  |   7 +
> > >  2 files changed, 294 insertions(+), 22 deletions(-)
  
Alex Coplan Feb. 12, 2025, 11:20 a.m. UTC | #5
Ping

On 03/02/2025 14:46, Tamar Christina wrote:
> Ping
> 
> > -----Original Message-----
> > From: Tamar Christina
> > Sent: Friday, January 24, 2025 9:18 AM
> > To: Alex Coplan <Alex.Coplan@arm.com>; gcc-patches@gcc.gnu.org
> > Cc: Richard Biener <rguenther@suse.de>; Jan Hubicka <hubicka@ucw.cz>
> > Subject: RE: [PATCH 2/4] cfgloopmanip: Add infrastructure for scaling of multi-exit
> > loops [PR117790]
> > 
> > ping
> > 
> > > -----Original Message-----
> > > From: Tamar Christina
> > > Sent: Wednesday, January 15, 2025 2:08 PM
> > > To: Alex Coplan <Alex.Coplan@arm.com>; gcc-patches@gcc.gnu.org
> > > Cc: Richard Biener <rguenther@suse.de>; Jan Hubicka <hubicka@ucw.cz>
> > > Subject: RE: [PATCH 2/4] cfgloopmanip: Add infrastructure for scaling of multi-
> > exit
> > > loops [PR117790]
> > >
> > > Ping
> > >
> > > > -----Original Message-----
> > > > From: Alex Coplan <Alex.Coplan@arm.com>
> > > > Sent: Monday, January 6, 2025 11:35 AM
> > > > To: gcc-patches@gcc.gnu.org
> > > > Cc: Richard Biener <rguenther@suse.de>; Jan Hubicka <hubicka@ucw.cz>;
> > Tamar
> > > > Christina <Tamar.Christina@arm.com>
> > > > Subject: [PATCH 2/4] cfgloopmanip: Add infrastructure for scaling of multi-exit
> > > > loops [PR117790]
> > > >
> > > > As it stands, scale_loop_profile doesn't correctly handle loops with
> > > > multiple exits.  In particular, in the case where the expected niters
> > > > exceeds iteration_bound, scale_loop_profile attempts to reduce the
> > > > number of iterations with a call to scale_loop_frequencies, which
> > > > multiplies the count of each BB by a given probability.  This
> > > > transformation preserves the relationships between the counts of the BBs
> > > > within the loop (and thus the edge probabilities stay the same) but this
> > > > cannot possibly work for loops with multiple exits, since in order for
> > > > the expected niters to reduce (and counts along exit edges to remain the
> > > > same), the exit edge probabilities must increase, thus decreasing the
> > > > probabilities of the internal edges, meaning that the ratios of the
> > > > counts of the BBs inside the loop must change.  So we need a different
> > > > approach (not a straightforward multiplicative scaling) to adjust the
> > > > expected niters of a loop with multiple exits.
> > > >
> > > > This patch introduces a new helper, flow_scale_loop_freqs, which can be
> > > > used to correctly scale the profile of a loop with multiple exits.  It
> > > > is parameterized by a probability (with which to scale the header and
> > > > therefore the expected niters) and a lambda which gives the desired
> > > > counts for the exit edges.  In this patch, to make things simpler,
> > > > flow_scale_loop_freqs only handles loop shapes without internal control
> > > > flow, and we introduce a predicate can_flow_scale_loop_freqs_p to test
> > > > whether a given loop meets these criteria.  This restriction is
> > > > reasonable since this patch is motivated by fixing the profile
> > > > consistency for early break vectorization, and we don't currently
> > > > vectorize loops with internal control flow.  We also fall back to a
> > > > multiplicative scaling (the status quo) for loops that
> > > > flow_scale_loop_freqs can't handle, so the patch should be a net
> > > > improvement.
> > > >
> > > > We wrap the call to flow_scale_loop_freqs in a helper
> > > > scale_loop_freqs_with_exit_counts which handles the above-mentioned
> > > > fallback.  This wrapper is still generic in that it accepts a lambda to
> > > > allow overriding the desired exit edge counts.  We specialize this with
> > > > another wrapper, scale_loop_freqs_hold_exit_counts (keeping the
> > > > counts along exit edges fixed), which is then used to implement the
> > > > niters-scaling case of scale_loop_profile, thus fixing this path through
> > > > the function for loops with multiple exits.
> > > >
> > > > Finally, we expose two new wrapper functions in cfgloopmanip.h for use
> > > > in subsequent vectorizer patches.  scale_loop_profile_hold_exit_counts
> > > > is a variant of scale_loop_profile which assumes we want to keep the
> > > > counts along exit edges of the loop fixed through both parts of the
> > > > transformation (including the initial probability scale).
> > > > scale_loop_freqs_with_new_exit_count is intended to be used in a
> > > > subsequent patch when adding a skip edge around the epilog, where the
> > > > reduction of count entering the loop is mirrored by a reduced count
> > > > along a given exit edge.
> > > >
> > > > Bootstrapped/regtested as a series on aarch64-linux-gnu,
> > > > x86_64-linux-gnu, and arm-linux-gnueabihf.  OK for trunk?
> > > >
> > > > Thanks,
> > > > Alex
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > > 	PR tree-optimization/117790
> > > > 	* cfgloopmanip.cc (can_flow_scale_loop_freqs_p): New.
> > > > 	(flow_scale_loop_freqs): New.
> > > > 	(scale_loop_freqs_with_exit_counts): New.
> > > > 	(scale_loop_freqs_hold_exit_counts): New.
> > > > 	(scale_loop_profile): Refactor to use the newly-added
> > > > 	scale_loop_profile_1, and use scale_loop_freqs_hold_exit_counts to
> > > > 	correctly handle reducing the expected niters for loops with multiple
> > > > 	exits.
> > > > 	(scale_loop_freqs_with_new_exit_count): New.
> > > > 	(scale_loop_profile_1): New.
> > > > 	(scale_loop_profile_hold_exit_counts): New.
> > > > 	* cfgloopmanip.h (scale_loop_profile_hold_exit_counts): New.
> > > > 	(scale_loop_freqs_with_new_exit_count): New.
> > > > ---
> > > >  gcc/cfgloopmanip.cc | 309
> > > ++++++++++++++++++++++++++++++++++++++++--
> > > > --
> > > >  gcc/cfgloopmanip.h  |   7 +
> > > >  2 files changed, 294 insertions(+), 22 deletions(-)
>
  
Alex Coplan Feb. 19, 2025, 12:15 p.m. UTC | #6
Ping^5 for patches 2-4:
https://gcc.gnu.org/pipermail/gcc-patches/2025-January/672677.html
https://gcc.gnu.org/pipermail/gcc-patches/2025-January/672678.html
https://gcc.gnu.org/pipermail/gcc-patches/2025-January/672679.html

On 12/02/2025 11:20, Alex Coplan wrote:
> Ping
> 
> On 03/02/2025 14:46, Tamar Christina wrote:
> > Ping
> > 
> > > -----Original Message-----
> > > From: Tamar Christina
> > > Sent: Friday, January 24, 2025 9:18 AM
> > > To: Alex Coplan <Alex.Coplan@arm.com>; gcc-patches@gcc.gnu.org
> > > Cc: Richard Biener <rguenther@suse.de>; Jan Hubicka <hubicka@ucw.cz>
> > > Subject: RE: [PATCH 2/4] cfgloopmanip: Add infrastructure for scaling of multi-exit
> > > loops [PR117790]
> > > 
> > > ping
> > > 
> > > > -----Original Message-----
> > > > From: Tamar Christina
> > > > Sent: Wednesday, January 15, 2025 2:08 PM
> > > > To: Alex Coplan <Alex.Coplan@arm.com>; gcc-patches@gcc.gnu.org
> > > > Cc: Richard Biener <rguenther@suse.de>; Jan Hubicka <hubicka@ucw.cz>
> > > > Subject: RE: [PATCH 2/4] cfgloopmanip: Add infrastructure for scaling of multi-
> > > exit
> > > > loops [PR117790]
> > > >
> > > > Ping
> > > >
> > > > > -----Original Message-----
> > > > > From: Alex Coplan <Alex.Coplan@arm.com>
> > > > > Sent: Monday, January 6, 2025 11:35 AM
> > > > > To: gcc-patches@gcc.gnu.org
> > > > > Cc: Richard Biener <rguenther@suse.de>; Jan Hubicka <hubicka@ucw.cz>;
> > > Tamar
> > > > > Christina <Tamar.Christina@arm.com>
> > > > > Subject: [PATCH 2/4] cfgloopmanip: Add infrastructure for scaling of multi-exit
> > > > > loops [PR117790]
> > > > >
> > > > > As it stands, scale_loop_profile doesn't correctly handle loops with
> > > > > multiple exits.  In particular, in the case where the expected niters
> > > > > exceeds iteration_bound, scale_loop_profile attempts to reduce the
> > > > > number of iterations with a call to scale_loop_frequencies, which
> > > > > multiplies the count of each BB by a given probability.  This
> > > > > transformation preserves the relationships between the counts of the BBs
> > > > > within the loop (and thus the edge probabilities stay the same) but this
> > > > > cannot possibly work for loops with multiple exits, since in order for
> > > > > the expected niters to reduce (and counts along exit edges to remain the
> > > > > same), the exit edge probabilities must increase, thus decreasing the
> > > > > probabilities of the internal edges, meaning that the ratios of the
> > > > > counts of the BBs inside the loop must change.  So we need a different
> > > > > approach (not a straightforward multiplicative scaling) to adjust the
> > > > > expected niters of a loop with multiple exits.
> > > > >
> > > > > This patch introduces a new helper, flow_scale_loop_freqs, which can be
> > > > > used to correctly scale the profile of a loop with multiple exits.  It
> > > > > is parameterized by a probability (with which to scale the header and
> > > > > therefore the expected niters) and a lambda which gives the desired
> > > > > counts for the exit edges.  In this patch, to make things simpler,
> > > > > flow_scale_loop_freqs only handles loop shapes without internal control
> > > > > flow, and we introduce a predicate can_flow_scale_loop_freqs_p to test
> > > > > whether a given loop meets these criteria.  This restriction is
> > > > > reasonable since this patch is motivated by fixing the profile
> > > > > consistency for early break vectorization, and we don't currently
> > > > > vectorize loops with internal control flow.  We also fall back to a
> > > > > multiplicative scaling (the status quo) for loops that
> > > > > flow_scale_loop_freqs can't handle, so the patch should be a net
> > > > > improvement.
> > > > >
> > > > > We wrap the call to flow_scale_loop_freqs in a helper
> > > > > scale_loop_freqs_with_exit_counts which handles the above-mentioned
> > > > > fallback.  This wrapper is still generic in that it accepts a lambda to
> > > > > allow overriding the desired exit edge counts.  We specialize this with
> > > > > another wrapper, scale_loop_freqs_hold_exit_counts (keeping the
> > > > > counts along exit edges fixed), which is then used to implement the
> > > > > niters-scaling case of scale_loop_profile, thus fixing this path through
> > > > > the function for loops with multiple exits.
> > > > >
> > > > > Finally, we expose two new wrapper functions in cfgloopmanip.h for use
> > > > > in subsequent vectorizer patches.  scale_loop_profile_hold_exit_counts
> > > > > is a variant of scale_loop_profile which assumes we want to keep the
> > > > > counts along exit edges of the loop fixed through both parts of the
> > > > > transformation (including the initial probability scale).
> > > > > scale_loop_freqs_with_new_exit_count is intended to be used in a
> > > > > subsequent patch when adding a skip edge around the epilog, where the
> > > > > reduction of count entering the loop is mirrored by a reduced count
> > > > > along a given exit edge.
> > > > >
> > > > > Bootstrapped/regtested as a series on aarch64-linux-gnu,
> > > > > x86_64-linux-gnu, and arm-linux-gnueabihf.  OK for trunk?
> > > > >
> > > > > Thanks,
> > > > > Alex
> > > > >
> > > > > gcc/ChangeLog:
> > > > >
> > > > > 	PR tree-optimization/117790
> > > > > 	* cfgloopmanip.cc (can_flow_scale_loop_freqs_p): New.
> > > > > 	(flow_scale_loop_freqs): New.
> > > > > 	(scale_loop_freqs_with_exit_counts): New.
> > > > > 	(scale_loop_freqs_hold_exit_counts): New.
> > > > > 	(scale_loop_profile): Refactor to use the newly-added
> > > > > 	scale_loop_profile_1, and use scale_loop_freqs_hold_exit_counts to
> > > > > 	correctly handle reducing the expected niters for loops with multiple
> > > > > 	exits.
> > > > > 	(scale_loop_freqs_with_new_exit_count): New.
> > > > > 	(scale_loop_profile_1): New.
> > > > > 	(scale_loop_profile_hold_exit_counts): New.
> > > > > 	* cfgloopmanip.h (scale_loop_profile_hold_exit_counts): New.
> > > > > 	(scale_loop_freqs_with_new_exit_count): New.
> > > > > ---
> > > > >  gcc/cfgloopmanip.cc | 309
> > > > ++++++++++++++++++++++++++++++++++++++++--
> > > > > --
> > > > >  gcc/cfgloopmanip.h  |   7 +
> > > > >  2 files changed, 294 insertions(+), 22 deletions(-)
> >
  
Alex Coplan March 6, 2025, 10:48 a.m. UTC | #7
Ping^6

On 19/02/2025 12:15, Alex Coplan wrote:
> Ping^5 for patches 2-4:
> https://gcc.gnu.org/pipermail/gcc-patches/2025-January/672677.html
> https://gcc.gnu.org/pipermail/gcc-patches/2025-January/672678.html
> https://gcc.gnu.org/pipermail/gcc-patches/2025-January/672679.html
> 
> On 12/02/2025 11:20, Alex Coplan wrote:
> > Ping
> > 
> > On 03/02/2025 14:46, Tamar Christina wrote:
> > > Ping
> > > 
> > > > -----Original Message-----
> > > > From: Tamar Christina
> > > > Sent: Friday, January 24, 2025 9:18 AM
> > > > To: Alex Coplan <Alex.Coplan@arm.com>; gcc-patches@gcc.gnu.org
> > > > Cc: Richard Biener <rguenther@suse.de>; Jan Hubicka <hubicka@ucw.cz>
> > > > Subject: RE: [PATCH 2/4] cfgloopmanip: Add infrastructure for scaling of multi-exit
> > > > loops [PR117790]
> > > > 
> > > > ping
> > > > 
> > > > > -----Original Message-----
> > > > > From: Tamar Christina
> > > > > Sent: Wednesday, January 15, 2025 2:08 PM
> > > > > To: Alex Coplan <Alex.Coplan@arm.com>; gcc-patches@gcc.gnu.org
> > > > > Cc: Richard Biener <rguenther@suse.de>; Jan Hubicka <hubicka@ucw.cz>
> > > > > Subject: RE: [PATCH 2/4] cfgloopmanip: Add infrastructure for scaling of multi-
> > > > exit
> > > > > loops [PR117790]
> > > > >
> > > > > Ping
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Alex Coplan <Alex.Coplan@arm.com>
> > > > > > Sent: Monday, January 6, 2025 11:35 AM
> > > > > > To: gcc-patches@gcc.gnu.org
> > > > > > Cc: Richard Biener <rguenther@suse.de>; Jan Hubicka <hubicka@ucw.cz>;
> > > > Tamar
> > > > > > Christina <Tamar.Christina@arm.com>
> > > > > > Subject: [PATCH 2/4] cfgloopmanip: Add infrastructure for scaling of multi-exit
> > > > > > loops [PR117790]
> > > > > >
> > > > > > As it stands, scale_loop_profile doesn't correctly handle loops with
> > > > > > multiple exits.  In particular, in the case where the expected niters
> > > > > > exceeds iteration_bound, scale_loop_profile attempts to reduce the
> > > > > > number of iterations with a call to scale_loop_frequencies, which
> > > > > > multiplies the count of each BB by a given probability.  This
> > > > > > transformation preserves the relationships between the counts of the BBs
> > > > > > within the loop (and thus the edge probabilities stay the same) but this
> > > > > > cannot possibly work for loops with multiple exits, since in order for
> > > > > > the expected niters to reduce (and counts along exit edges to remain the
> > > > > > same), the exit edge probabilities must increase, thus decreasing the
> > > > > > probabilities of the internal edges, meaning that the ratios of the
> > > > > > counts of the BBs inside the loop must change.  So we need a different
> > > > > > approach (not a straightforward multiplicative scaling) to adjust the
> > > > > > expected niters of a loop with multiple exits.
> > > > > >
> > > > > > This patch introduces a new helper, flow_scale_loop_freqs, which can be
> > > > > > used to correctly scale the profile of a loop with multiple exits.  It
> > > > > > is parameterized by a probability (with which to scale the header and
> > > > > > therefore the expected niters) and a lambda which gives the desired
> > > > > > counts for the exit edges.  In this patch, to make things simpler,
> > > > > > flow_scale_loop_freqs only handles loop shapes without internal control
> > > > > > flow, and we introduce a predicate can_flow_scale_loop_freqs_p to test
> > > > > > whether a given loop meets these criteria.  This restriction is
> > > > > > reasonable since this patch is motivated by fixing the profile
> > > > > > consistency for early break vectorization, and we don't currently
> > > > > > vectorize loops with internal control flow.  We also fall back to a
> > > > > > multiplicative scaling (the status quo) for loops that
> > > > > > flow_scale_loop_freqs can't handle, so the patch should be a net
> > > > > > improvement.
> > > > > >
> > > > > > We wrap the call to flow_scale_loop_freqs in a helper
> > > > > > scale_loop_freqs_with_exit_counts which handles the above-mentioned
> > > > > > fallback.  This wrapper is still generic in that it accepts a lambda to
> > > > > > allow overriding the desired exit edge counts.  We specialize this with
> > > > > > another wrapper, scale_loop_freqs_hold_exit_counts (keeping the
> > > > > > counts along exit edges fixed), which is then used to implement the
> > > > > > niters-scaling case of scale_loop_profile, thus fixing this path through
> > > > > > the function for loops with multiple exits.
> > > > > >
> > > > > > Finally, we expose two new wrapper functions in cfgloopmanip.h for use
> > > > > > in subsequent vectorizer patches.  scale_loop_profile_hold_exit_counts
> > > > > > is a variant of scale_loop_profile which assumes we want to keep the
> > > > > > counts along exit edges of the loop fixed through both parts of the
> > > > > > transformation (including the initial probability scale).
> > > > > > scale_loop_freqs_with_new_exit_count is intended to be used in a
> > > > > > subsequent patch when adding a skip edge around the epilog, where the
> > > > > > reduction of count entering the loop is mirrored by a reduced count
> > > > > > along a given exit edge.
> > > > > >
> > > > > > Bootstrapped/regtested as a series on aarch64-linux-gnu,
> > > > > > x86_64-linux-gnu, and arm-linux-gnueabihf.  OK for trunk?
> > > > > >
> > > > > > Thanks,
> > > > > > Alex
> > > > > >
> > > > > > gcc/ChangeLog:
> > > > > >
> > > > > > 	PR tree-optimization/117790
> > > > > > 	* cfgloopmanip.cc (can_flow_scale_loop_freqs_p): New.
> > > > > > 	(flow_scale_loop_freqs): New.
> > > > > > 	(scale_loop_freqs_with_exit_counts): New.
> > > > > > 	(scale_loop_freqs_hold_exit_counts): New.
> > > > > > 	(scale_loop_profile): Refactor to use the newly-added
> > > > > > 	scale_loop_profile_1, and use scale_loop_freqs_hold_exit_counts to
> > > > > > 	correctly handle reducing the expected niters for loops with multiple
> > > > > > 	exits.
> > > > > > 	(scale_loop_freqs_with_new_exit_count): New.
> > > > > > 	(scale_loop_profile_1): New.
> > > > > > 	(scale_loop_profile_hold_exit_counts): New.
> > > > > > 	* cfgloopmanip.h (scale_loop_profile_hold_exit_counts): New.
> > > > > > 	(scale_loop_freqs_with_new_exit_count): New.
> > > > > > ---
> > > > > >  gcc/cfgloopmanip.cc | 309
> > > > > ++++++++++++++++++++++++++++++++++++++++--
> > > > > > --
> > > > > >  gcc/cfgloopmanip.h  |   7 +
> > > > > >  2 files changed, 294 insertions(+), 22 deletions(-)
> > >
  
Alex Coplan March 21, 2025, 10:04 a.m. UTC | #8
Ping^7

On 06/03/2025 10:48, Alex Coplan wrote:
> Ping^6
> 
> On 19/02/2025 12:15, Alex Coplan wrote:
> > Ping^5 for patches 2-4:
> > https://gcc.gnu.org/pipermail/gcc-patches/2025-January/672677.html
> > https://gcc.gnu.org/pipermail/gcc-patches/2025-January/672678.html
> > https://gcc.gnu.org/pipermail/gcc-patches/2025-January/672679.html
> > 
> > On 12/02/2025 11:20, Alex Coplan wrote:
> > > Ping
> > > 
> > > On 03/02/2025 14:46, Tamar Christina wrote:
> > > > Ping
> > > > 
> > > > > -----Original Message-----
> > > > > From: Tamar Christina
> > > > > Sent: Friday, January 24, 2025 9:18 AM
> > > > > To: Alex Coplan <Alex.Coplan@arm.com>; gcc-patches@gcc.gnu.org
> > > > > Cc: Richard Biener <rguenther@suse.de>; Jan Hubicka <hubicka@ucw.cz>
> > > > > Subject: RE: [PATCH 2/4] cfgloopmanip: Add infrastructure for scaling of multi-exit
> > > > > loops [PR117790]
> > > > > 
> > > > > ping
> > > > > 
> > > > > > -----Original Message-----
> > > > > > From: Tamar Christina
> > > > > > Sent: Wednesday, January 15, 2025 2:08 PM
> > > > > > To: Alex Coplan <Alex.Coplan@arm.com>; gcc-patches@gcc.gnu.org
> > > > > > Cc: Richard Biener <rguenther@suse.de>; Jan Hubicka <hubicka@ucw.cz>
> > > > > > Subject: RE: [PATCH 2/4] cfgloopmanip: Add infrastructure for scaling of multi-
> > > > > exit
> > > > > > loops [PR117790]
> > > > > >
> > > > > > Ping
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Alex Coplan <Alex.Coplan@arm.com>
> > > > > > > Sent: Monday, January 6, 2025 11:35 AM
> > > > > > > To: gcc-patches@gcc.gnu.org
> > > > > > > Cc: Richard Biener <rguenther@suse.de>; Jan Hubicka <hubicka@ucw.cz>;
> > > > > Tamar
> > > > > > > Christina <Tamar.Christina@arm.com>
> > > > > > > Subject: [PATCH 2/4] cfgloopmanip: Add infrastructure for scaling of multi-exit
> > > > > > > loops [PR117790]
> > > > > > >
> > > > > > > As it stands, scale_loop_profile doesn't correctly handle loops with
> > > > > > > multiple exits.  In particular, in the case where the expected niters
> > > > > > > exceeds iteration_bound, scale_loop_profile attempts to reduce the
> > > > > > > number of iterations with a call to scale_loop_frequencies, which
> > > > > > > multiplies the count of each BB by a given probability.  This
> > > > > > > transformation preserves the relationships between the counts of the BBs
> > > > > > > within the loop (and thus the edge probabilities stay the same) but this
> > > > > > > cannot possibly work for loops with multiple exits, since in order for
> > > > > > > the expected niters to reduce (and counts along exit edges to remain the
> > > > > > > same), the exit edge probabilities must increase, thus decreasing the
> > > > > > > probabilities of the internal edges, meaning that the ratios of the
> > > > > > > counts of the BBs inside the loop must change.  So we need a different
> > > > > > > approach (not a straightforward multiplicative scaling) to adjust the
> > > > > > > expected niters of a loop with multiple exits.
> > > > > > >
> > > > > > > This patch introduces a new helper, flow_scale_loop_freqs, which can be
> > > > > > > used to correctly scale the profile of a loop with multiple exits.  It
> > > > > > > is parameterized by a probability (with which to scale the header and
> > > > > > > therefore the expected niters) and a lambda which gives the desired
> > > > > > > counts for the exit edges.  In this patch, to make things simpler,
> > > > > > > flow_scale_loop_freqs only handles loop shapes without internal control
> > > > > > > flow, and we introduce a predicate can_flow_scale_loop_freqs_p to test
> > > > > > > whether a given loop meets these criteria.  This restriction is
> > > > > > > reasonable since this patch is motivated by fixing the profile
> > > > > > > consistency for early break vectorization, and we don't currently
> > > > > > > vectorize loops with internal control flow.  We also fall back to a
> > > > > > > multiplicative scaling (the status quo) for loops that
> > > > > > > flow_scale_loop_freqs can't handle, so the patch should be a net
> > > > > > > improvement.
> > > > > > >
> > > > > > > We wrap the call to flow_scale_loop_freqs in a helper
> > > > > > > scale_loop_freqs_with_exit_counts which handles the above-mentioned
> > > > > > > fallback.  This wrapper is still generic in that it accepts a lambda to
> > > > > > > allow overriding the desired exit edge counts.  We specialize this with
> > > > > > > another wrapper, scale_loop_freqs_hold_exit_counts (keeping the
> > > > > > > counts along exit edges fixed), which is then used to implement the
> > > > > > > niters-scaling case of scale_loop_profile, thus fixing this path through
> > > > > > > the function for loops with multiple exits.
> > > > > > >
> > > > > > > Finally, we expose two new wrapper functions in cfgloopmanip.h for use
> > > > > > > in subsequent vectorizer patches.  scale_loop_profile_hold_exit_counts
> > > > > > > is a variant of scale_loop_profile which assumes we want to keep the
> > > > > > > counts along exit edges of the loop fixed through both parts of the
> > > > > > > transformation (including the initial probability scale).
> > > > > > > scale_loop_freqs_with_new_exit_count is intended to be used in a
> > > > > > > subsequent patch when adding a skip edge around the epilog, where the
> > > > > > > reduction of count entering the loop is mirrored by a reduced count
> > > > > > > along a given exit edge.
> > > > > > >
> > > > > > > Bootstrapped/regtested as a series on aarch64-linux-gnu,
> > > > > > > x86_64-linux-gnu, and arm-linux-gnueabihf.  OK for trunk?
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Alex
> > > > > > >
> > > > > > > gcc/ChangeLog:
> > > > > > >
> > > > > > > 	PR tree-optimization/117790
> > > > > > > 	* cfgloopmanip.cc (can_flow_scale_loop_freqs_p): New.
> > > > > > > 	(flow_scale_loop_freqs): New.
> > > > > > > 	(scale_loop_freqs_with_exit_counts): New.
> > > > > > > 	(scale_loop_freqs_hold_exit_counts): New.
> > > > > > > 	(scale_loop_profile): Refactor to use the newly-added
> > > > > > > 	scale_loop_profile_1, and use scale_loop_freqs_hold_exit_counts to
> > > > > > > 	correctly handle reducing the expected niters for loops with multiple
> > > > > > > 	exits.
> > > > > > > 	(scale_loop_freqs_with_new_exit_count): New.
> > > > > > > 	(scale_loop_profile_1): New.
> > > > > > > 	(scale_loop_profile_hold_exit_counts): New.
> > > > > > > 	* cfgloopmanip.h (scale_loop_profile_hold_exit_counts): New.
> > > > > > > 	(scale_loop_freqs_with_new_exit_count): New.
> > > > > > > ---
> > > > > > >  gcc/cfgloopmanip.cc | 309
> > > > > > ++++++++++++++++++++++++++++++++++++++++--
> > > > > > > --
> > > > > > >  gcc/cfgloopmanip.h  |   7 +
> > > > > > >  2 files changed, 294 insertions(+), 22 deletions(-)
> > > >
  
Alex Coplan April 14, 2025, 1:26 p.m. UTC | #9
Ping^8 for patches 2-4:
https://gcc.gnu.org/pipermail/gcc-patches/2025-January/672677.html
https://gcc.gnu.org/pipermail/gcc-patches/2025-January/672678.html
https://gcc.gnu.org/pipermail/gcc-patches/2025-January/672679.html

On 21/03/2025 10:04, Alex Coplan wrote:
> Ping^7
> 
> On 06/03/2025 10:48, Alex Coplan wrote:
> > Ping^6
> > 
> > On 19/02/2025 12:15, Alex Coplan wrote:
> > > Ping^5 for patches 2-4:
> > > https://gcc.gnu.org/pipermail/gcc-patches/2025-January/672677.html
> > > https://gcc.gnu.org/pipermail/gcc-patches/2025-January/672678.html
> > > https://gcc.gnu.org/pipermail/gcc-patches/2025-January/672679.html
> > > 
> > > On 12/02/2025 11:20, Alex Coplan wrote:
> > > > Ping
> > > > 
> > > > On 03/02/2025 14:46, Tamar Christina wrote:
> > > > > Ping
> > > > > 
> > > > > > -----Original Message-----
> > > > > > From: Tamar Christina
> > > > > > Sent: Friday, January 24, 2025 9:18 AM
> > > > > > To: Alex Coplan <Alex.Coplan@arm.com>; gcc-patches@gcc.gnu.org
> > > > > > Cc: Richard Biener <rguenther@suse.de>; Jan Hubicka <hubicka@ucw.cz>
> > > > > > Subject: RE: [PATCH 2/4] cfgloopmanip: Add infrastructure for scaling of multi-exit
> > > > > > loops [PR117790]
> > > > > > 
> > > > > > ping
> > > > > > 
> > > > > > > -----Original Message-----
> > > > > > > From: Tamar Christina
> > > > > > > Sent: Wednesday, January 15, 2025 2:08 PM
> > > > > > > To: Alex Coplan <Alex.Coplan@arm.com>; gcc-patches@gcc.gnu.org
> > > > > > > Cc: Richard Biener <rguenther@suse.de>; Jan Hubicka <hubicka@ucw.cz>
> > > > > > > Subject: RE: [PATCH 2/4] cfgloopmanip: Add infrastructure for scaling of multi-
> > > > > > exit
> > > > > > > loops [PR117790]
> > > > > > >
> > > > > > > Ping
> > > > > > >
> > > > > > > > -----Original Message-----
> > > > > > > > From: Alex Coplan <Alex.Coplan@arm.com>
> > > > > > > > Sent: Monday, January 6, 2025 11:35 AM
> > > > > > > > To: gcc-patches@gcc.gnu.org
> > > > > > > > Cc: Richard Biener <rguenther@suse.de>; Jan Hubicka <hubicka@ucw.cz>;
> > > > > > Tamar
> > > > > > > > Christina <Tamar.Christina@arm.com>
> > > > > > > > Subject: [PATCH 2/4] cfgloopmanip: Add infrastructure for scaling of multi-exit
> > > > > > > > loops [PR117790]
> > > > > > > >
> > > > > > > > As it stands, scale_loop_profile doesn't correctly handle loops with
> > > > > > > > multiple exits.  In particular, in the case where the expected niters
> > > > > > > > exceeds iteration_bound, scale_loop_profile attempts to reduce the
> > > > > > > > number of iterations with a call to scale_loop_frequencies, which
> > > > > > > > multiplies the count of each BB by a given probability.  This
> > > > > > > > transformation preserves the relationships between the counts of the BBs
> > > > > > > > within the loop (and thus the edge probabilities stay the same) but this
> > > > > > > > cannot possibly work for loops with multiple exits, since in order for
> > > > > > > > the expected niters to reduce (and counts along exit edges to remain the
> > > > > > > > same), the exit edge probabilities must increase, thus decreasing the
> > > > > > > > probabilities of the internal edges, meaning that the ratios of the
> > > > > > > > counts of the BBs inside the loop must change.  So we need a different
> > > > > > > > approach (not a straightforward multiplicative scaling) to adjust the
> > > > > > > > expected niters of a loop with multiple exits.
> > > > > > > >
> > > > > > > > This patch introduces a new helper, flow_scale_loop_freqs, which can be
> > > > > > > > used to correctly scale the profile of a loop with multiple exits.  It
> > > > > > > > is parameterized by a probability (with which to scale the header and
> > > > > > > > therefore the expected niters) and a lambda which gives the desired
> > > > > > > > counts for the exit edges.  In this patch, to make things simpler,
> > > > > > > > flow_scale_loop_freqs only handles loop shapes without internal control
> > > > > > > > flow, and we introduce a predicate can_flow_scale_loop_freqs_p to test
> > > > > > > > whether a given loop meets these criteria.  This restriction is
> > > > > > > > reasonable since this patch is motivated by fixing the profile
> > > > > > > > consistency for early break vectorization, and we don't currently
> > > > > > > > vectorize loops with internal control flow.  We also fall back to a
> > > > > > > > multiplicative scaling (the status quo) for loops that
> > > > > > > > flow_scale_loop_freqs can't handle, so the patch should be a net
> > > > > > > > improvement.
> > > > > > > >
> > > > > > > > We wrap the call to flow_scale_loop_freqs in a helper
> > > > > > > > scale_loop_freqs_with_exit_counts which handles the above-mentioned
> > > > > > > > fallback.  This wrapper is still generic in that it accepts a lambda to
> > > > > > > > allow overriding the desired exit edge counts.  We specialize this with
> > > > > > > > another wrapper, scale_loop_freqs_hold_exit_counts (keeping the
> > > > > > > > counts along exit edges fixed), which is then used to implement the
> > > > > > > > niters-scaling case of scale_loop_profile, thus fixing this path through
> > > > > > > > the function for loops with multiple exits.
> > > > > > > >
> > > > > > > > Finally, we expose two new wrapper functions in cfgloopmanip.h for use
> > > > > > > > in subsequent vectorizer patches.  scale_loop_profile_hold_exit_counts
> > > > > > > > is a variant of scale_loop_profile which assumes we want to keep the
> > > > > > > > counts along exit edges of the loop fixed through both parts of the
> > > > > > > > transformation (including the initial probability scale).
> > > > > > > > scale_loop_freqs_with_new_exit_count is intended to be used in a
> > > > > > > > subsequent patch when adding a skip edge around the epilog, where the
> > > > > > > > reduction of count entering the loop is mirrored by a reduced count
> > > > > > > > along a given exit edge.
> > > > > > > >
> > > > > > > > Bootstrapped/regtested as a series on aarch64-linux-gnu,
> > > > > > > > x86_64-linux-gnu, and arm-linux-gnueabihf.  OK for trunk?
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Alex
> > > > > > > >
> > > > > > > > gcc/ChangeLog:
> > > > > > > >
> > > > > > > > 	PR tree-optimization/117790
> > > > > > > > 	* cfgloopmanip.cc (can_flow_scale_loop_freqs_p): New.
> > > > > > > > 	(flow_scale_loop_freqs): New.
> > > > > > > > 	(scale_loop_freqs_with_exit_counts): New.
> > > > > > > > 	(scale_loop_freqs_hold_exit_counts): New.
> > > > > > > > 	(scale_loop_profile): Refactor to use the newly-added
> > > > > > > > 	scale_loop_profile_1, and use scale_loop_freqs_hold_exit_counts to
> > > > > > > > 	correctly handle reducing the expected niters for loops with multiple
> > > > > > > > 	exits.
> > > > > > > > 	(scale_loop_freqs_with_new_exit_count): New.
> > > > > > > > 	(scale_loop_profile_1): New.
> > > > > > > > 	(scale_loop_profile_hold_exit_counts): New.
> > > > > > > > 	* cfgloopmanip.h (scale_loop_profile_hold_exit_counts): New.
> > > > > > > > 	(scale_loop_freqs_with_new_exit_count): New.
> > > > > > > > ---
> > > > > > > >  gcc/cfgloopmanip.cc | 309
> > > > > > > ++++++++++++++++++++++++++++++++++++++++--
> > > > > > > > --
> > > > > > > >  gcc/cfgloopmanip.h  |   7 +
> > > > > > > >  2 files changed, 294 insertions(+), 22 deletions(-)
> > > > >
  
Jan Hubicka April 15, 2025, 5:26 p.m. UTC | #10
Hi,
> gcc/ChangeLog:
> 
>	PR tree-optimization/117790
>	* cfgloopmanip.cc (can_flow_scale_loop_freqs_p): New.
>	(flow_scale_loop_freqs): New.
>	(scale_loop_freqs_with_exit_counts): New.
>	(scale_loop_freqs_hold_exit_counts): New.
>	(scale_loop_profile): Refactor to use the newly-added
>	scale_loop_profile_1, and use scale_loop_freqs_hold_exit_counts to
>	correctly handle reducing the expected niters for loops with multiple
>	exits.
>	(scale_loop_freqs_with_new_exit_count): New.
>	(scale_loop_profile_1): New.
>	(scale_loop_profile_hold_exit_counts): New.
>	* cfgloopmanip.h (scale_loop_profile_hold_exit_counts): New.
>	(scale_loop_freqs_with_new_exit_count): New.
+template<typename ExitCountFn>
+static bool
+can_flow_scale_loop_freqs_p (class loop *loop,
+			     ExitCountFn get_exit_count)
+{
+  basic_block bb = loop->header;
+
+  const profile_count count_in = loop_count_in (loop);
+  profile_count exit_count = profile_count::zero ();
+
+  while (bb != loop->latch)
+    {
+      /* Punt if any of the BB counts are uninitialized.  */
+      if (!bb->count.initialized_p ())
+	return false;
+
+      bool found_exit = false;
+      edge internal_edge = nullptr;
+      for (auto e : bb->succs)
+	if (flow_bb_inside_loop_p (loop, e->dest))
+	  {
+	    if (internal_edge)
+	      return false;
+	    internal_edge = e;

This assumes that there are at most 2 edges out which is not always the
case (i.e. for EH and switch).  I suppose vectorizer never calls it
there but probably you want to test that there are precisely two edges
in can_flow_scale_loop_freqs and if not drop message to dump file, so in
case we encounter such loops we notice.

+	  }
+	else
+	  {
+	    if (found_exit)
+	      return false;
+	    found_exit = true;
+	    exit_count += get_exit_count (e);
+	  }
+
+      bb = internal_edge->dest;
+    }
+
+  /* Punt if any exit edge had an uninitialized count.  */
+  if (!exit_count.initialized_p ())
+    return false;
You already early rturn once you hit bb with uninitialized count, so
perhaps you can move this check just after call of get_exit_count?

+      const profile_count new_exit_count = get_exit_count (exit_edge);
+      profile_probability new_exit_prob;
+      if (new_block_count.nonzero_p ())
+	new_exit_prob = new_exit_count.probability_in (new_block_count);

If new_exit_count > new_block_count probability_in will return 1.  I
guess there is not much to do, but pehraps logging inconsistency into
dump is not a bad idea here.

+      else
+	{
+	  /* NEW_BLOCK_COUNT is zero, so the only way we can make the profile
+	     consistent is if NEW_EXIT_COUNT is zero too.  */
+	  if (dump_file && new_exit_count.nonzero_p ())
+	    fprintf (dump_file,
+		     ";; flow_scale_loop_freqs wants non-zero exit count "
+		     "but bb count is zero/uninit: profile is inconsistent\n");
+
+	  /* Arbitrarily set the exit probability to 0.  */
+	  new_exit_prob = profile_probability::never ();

never is kind of strong hint to optimize the other patch (it has
RELIABLE reliability). Since we have no info I would just keep the
probability which was there before.
+	}

Patch is OK with these changes.

Honza
  
Jan Hubicka April 15, 2025, 5:35 p.m. UTC | #11
> Hi,
> > gcc/ChangeLog:
> > 
> >	PR tree-optimization/117790
> >	* cfgloopmanip.cc (can_flow_scale_loop_freqs_p): New.
> >	(flow_scale_loop_freqs): New.
> >	(scale_loop_freqs_with_exit_counts): New.
> >	(scale_loop_freqs_hold_exit_counts): New.
> >	(scale_loop_profile): Refactor to use the newly-added
> >	scale_loop_profile_1, and use scale_loop_freqs_hold_exit_counts to
> >	correctly handle reducing the expected niters for loops with multiple
> >	exits.
> >	(scale_loop_freqs_with_new_exit_count): New.
> >	(scale_loop_profile_1): New.
> >	(scale_loop_profile_hold_exit_counts): New.
> >	* cfgloopmanip.h (scale_loop_profile_hold_exit_counts): New.
> >	(scale_loop_freqs_with_new_exit_count): New.
> +template<typename ExitCountFn>
> +static bool
> +can_flow_scale_loop_freqs_p (class loop *loop,
> +			     ExitCountFn get_exit_count)
> +{
> +  basic_block bb = loop->header;
> +
> +  const profile_count count_in = loop_count_in (loop);
> +  profile_count exit_count = profile_count::zero ();
> +
> +  while (bb != loop->latch)
> +    {
> +      /* Punt if any of the BB counts are uninitialized.  */
> +      if (!bb->count.initialized_p ())
> +	return false;
> +
> +      bool found_exit = false;
> +      edge internal_edge = nullptr;
> +      for (auto e : bb->succs)
> +	if (flow_bb_inside_loop_p (loop, e->dest))
> +	  {
> +	    if (internal_edge)
> +	      return false;
> +	    internal_edge = e;
> 
> This assumes that there are at most 2 edges out which is not always the
> case (i.e. for EH and switch).  I suppose vectorizer never calls it
> there but probably you want to test that there are precisely two edges
> in can_flow_scale_loop_freqs and if not drop message to dump file, so in
> case we encounter such loops we notice.

Also forgot to write. Ohter interesting case is when loop has inner
loop.  In this case there will be 2 edges but both of them internal.
It is also possible that the exit of loop sits inside inner loop.

Honza
  
Jan Hubicka April 15, 2025, 5:48 p.m. UTC | #12
Hi,
> gcc/ChangeLog:
> 
>	PR tree-optimization/117790
>	* tree-vect-loop.cc (scale_profile_for_vect_loop): Use
>	scale_loop_profile_hold_exit_counts instead of scale_loop_profile.  Drop
>	the exit edge parameter, since the code now handles multiple exits.
>	Adjust the caller ...
>	(vect_transform_loop): ... here.
>
>gcc/testsuite/ChangeLog:
>
>	PR tree-optimization/117790
>	* gcc.dg/vect/vect-early-break-profile-2.c: New test.
>
>
>-  if (entry_count.nonzero_p ())
>-    set_edge_probability_and_rescale_others
>-	    (exit_e,
>-	     entry_count.probability_in (loop->header->count / vf));
>-  /* Avoid producing very large exit probability when we do not have
>-     sensible profile.  */
>-  else if (exit_e->probability < profile_probability::always () / (vf * 2))

This is handling relatively common case wehre we decide to vectorize
loop with, say, factor of 32 and have no profile-feedback.
In this case if the loop trip count is unknown at early loop, we will
esitmate it to iterate few times (approx 3-5 as that is average
iteration count of random loop based on some measurements).

The fact that we want to vecotirze by factor 32 implies that vectorizer
does not take this info seriously and its heuristics thinks better.
In this case we do not wan to drop the loop to 0 iteraitons as that
would result of poor code layout and regalloc.

I don't think you kept this logic in the new code?

Honza
>-    set_edge_probability_and_rescale_others (exit_e, exit_e->probability * vf);
>-  loop->latch->count = single_pred_edge (loop->latch)->count ();
>-
>-  scale_loop_profile (loop, profile_probability::always () / vf,
>-		      get_likely_max_loop_iterations_int (loop));
>+  const auto likely_max_niters = get_likely_max_loop_iterations_int (loop);
>+  scale_loop_profile_hold_exit_counts (loop,
>+				       profile_probability::always () / vf,
>+				       likely_max_niters);
  
Alex Coplan April 28, 2025, 3:33 p.m. UTC | #13
On 15/04/2025 19:48, Jan Hubicka wrote:
> Hi,
> > gcc/ChangeLog:
> > 
> >	PR tree-optimization/117790
> >	* tree-vect-loop.cc (scale_profile_for_vect_loop): Use
> >	scale_loop_profile_hold_exit_counts instead of scale_loop_profile.  Drop
> >	the exit edge parameter, since the code now handles multiple exits.
> >	Adjust the caller ...
> >	(vect_transform_loop): ... here.
> >
> >gcc/testsuite/ChangeLog:
> >
> >	PR tree-optimization/117790
> >	* gcc.dg/vect/vect-early-break-profile-2.c: New test.
> >
> >
> >-  if (entry_count.nonzero_p ())
> >-    set_edge_probability_and_rescale_others
> >-	    (exit_e,
> >-	     entry_count.probability_in (loop->header->count / vf));
> >-  /* Avoid producing very large exit probability when we do not have
> >-     sensible profile.  */
> >-  else if (exit_e->probability < profile_probability::always () / (vf * 2))
> 
> This is handling relatively common case wehre we decide to vectorize
> loop with, say, factor of 32 and have no profile-feedback.
> In this case if the loop trip count is unknown at early loop, we will
> esitmate it to iterate few times (approx 3-5 as that is average
> iteration count of random loop based on some measurements).
> 
> The fact that we want to vecotirze by factor 32 implies that vectorizer
> does not take this info seriously and its heuristics thinks better.
> In this case we do not wan to drop the loop to 0 iteraitons as that
> would result of poor code layout and regalloc.
> 
> I don't think you kept this logic in the new code?

To be honest, I didn't really follow the logic here.  Thinking about the
single-exit case (which the current code is designed to handle), both
the body of the if and the else if logically implement the same update:

  P'(exit) = vf * P(exit)

I suppose the code in the body of the if is just a more numerically stable way
of doing this?

Also, AFAICT, the condition of the else if is trying to make sure that:

  vf * P(exit) < 1/2
  i.e.
  P'(exit) < 1/2

so it ensures we don't inflate the exit probably above 1/2 (equivalently
we don't reduce the expected niters below 2).  But I don't understand
why this guard isn't applied to the whole if-else chain.  Why should we
allow inflating the exit probability above 1/2 if entry_count.nonzero_p (),
but otherwise not?

I dropped this logic in the 4/4 patch since it seemed
inconsistent, and in general the updating of exit probabilities
is already handled by scale_loop_profile_hold_exit_counts.

Can you explain the intent behind the current logic, so I know what
behaviour we want to preserve?  A concrete testcase would be useful,
too.  FWIW, I tried doing a testsuite run on aarch64 with the following patch:

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 3de1ea646b4..71787301e6d 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -12092,6 +12092,9 @@ scale_profile_for_vect_loop (class loop *loop, edge exit_e, unsigned vf, bool fl
      sensible profile.  */
   else if (exit_e->probability < profile_probability::always () / (vf * 2))
     set_edge_probability_and_rescale_others (exit_e, exit_e->probability * vf);
+  else
+    gcc_unreachable ();
+

on top of current trunk, and the unreachable wasn't hit once.

Thanks,
Alex

> 
> Honza
> >-    set_edge_probability_and_rescale_others (exit_e, exit_e->probability * vf);
> >-  loop->latch->count = single_pred_edge (loop->latch)->count ();
> >-
> >-  scale_loop_profile (loop, profile_probability::always () / vf,
> >-		      get_likely_max_loop_iterations_int (loop));
> >+  const auto likely_max_niters = get_likely_max_loop_iterations_int (loop);
> >+  scale_loop_profile_hold_exit_counts (loop,
> >+				       profile_probability::always () / vf,
> >+				       likely_max_niters);
>
  
Alex Coplan June 13, 2025, 9:50 a.m. UTC | #14
On 28/04/2025 16:33, Alex Coplan wrote:
> On 15/04/2025 19:48, Jan Hubicka wrote:
> > Hi,
> > > gcc/ChangeLog:
> > > 
> > >	PR tree-optimization/117790
> > >	* tree-vect-loop.cc (scale_profile_for_vect_loop): Use
> > >	scale_loop_profile_hold_exit_counts instead of scale_loop_profile.  Drop
> > >	the exit edge parameter, since the code now handles multiple exits.
> > >	Adjust the caller ...
> > >	(vect_transform_loop): ... here.
> > >
> > >gcc/testsuite/ChangeLog:
> > >
> > >	PR tree-optimization/117790
> > >	* gcc.dg/vect/vect-early-break-profile-2.c: New test.
> > >
> > >
> > >-  if (entry_count.nonzero_p ())
> > >-    set_edge_probability_and_rescale_others
> > >-	    (exit_e,
> > >-	     entry_count.probability_in (loop->header->count / vf));
> > >-  /* Avoid producing very large exit probability when we do not have
> > >-     sensible profile.  */
> > >-  else if (exit_e->probability < profile_probability::always () / (vf * 2))
> > 
> > This is handling relatively common case wehre we decide to vectorize
> > loop with, say, factor of 32 and have no profile-feedback.
> > In this case if the loop trip count is unknown at early loop, we will
> > esitmate it to iterate few times (approx 3-5 as that is average
> > iteration count of random loop based on some measurements).
> > 
> > The fact that we want to vecotirze by factor 32 implies that vectorizer
> > does not take this info seriously and its heuristics thinks better.
> > In this case we do not wan to drop the loop to 0 iteraitons as that
> > would result of poor code layout and regalloc.
> > 
> > I don't think you kept this logic in the new code?
> 
> To be honest, I didn't really follow the logic here.  Thinking about the
> single-exit case (which the current code is designed to handle), both
> the body of the if and the else if logically implement the same update:
> 
>   P'(exit) = vf * P(exit)
> 
> I suppose the code in the body of the if is just a more numerically stable way
> of doing this?
> 
> Also, AFAICT, the condition of the else if is trying to make sure that:
> 
>   vf * P(exit) < 1/2
>   i.e.
>   P'(exit) < 1/2
> 
> so it ensures we don't inflate the exit probably above 1/2 (equivalently
> we don't reduce the expected niters below 2).  But I don't understand
> why this guard isn't applied to the whole if-else chain.  Why should we
> allow inflating the exit probability above 1/2 if entry_count.nonzero_p (),
> but otherwise not?
> 
> I dropped this logic in the 4/4 patch since it seemed
> inconsistent, and in general the updating of exit probabilities
> is already handled by scale_loop_profile_hold_exit_counts.
> 
> Can you explain the intent behind the current logic, so I know what
> behaviour we want to preserve?  A concrete testcase would be useful,
> too.  FWIW, I tried doing a testsuite run on aarch64 with the following patch:
> 
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 3de1ea646b4..71787301e6d 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -12092,6 +12092,9 @@ scale_profile_for_vect_loop (class loop *loop, edge exit_e, unsigned vf, bool fl
>       sensible profile.  */
>    else if (exit_e->probability < profile_probability::always () / (vf * 2))
>      set_edge_probability_and_rescale_others (exit_e, exit_e->probability * vf);
> +  else
> +    gcc_unreachable ();
> +
> 
> on top of current trunk, and the unreachable wasn't hit once.

Ping on this?  I think this discussion being unresolved is the main
thing blocking the series here:
https://gcc.gnu.org/pipermail/gcc-patches/2025-May/684930.html
https://gcc.gnu.org/pipermail/gcc-patches/2025-May/684931.html
https://gcc.gnu.org/pipermail/gcc-patches/2025-May/684932.html

Thanks,
Alex

> 
> Thanks,
> Alex
> 
> > 
> > Honza
> > >-    set_edge_probability_and_rescale_others (exit_e, exit_e->probability * vf);
> > >-  loop->latch->count = single_pred_edge (loop->latch)->count ();
> > >-
> > >-  scale_loop_profile (loop, profile_probability::always () / vf,
> > >-		      get_likely_max_loop_iterations_int (loop));
> > >+  const auto likely_max_niters = get_likely_max_loop_iterations_int (loop);
> > >+  scale_loop_profile_hold_exit_counts (loop,
> > >+				       profile_probability::always () / vf,
> > >+				       likely_max_niters);
> >
  
Jan Hubicka June 16, 2025, 5:25 p.m. UTC | #15
> > I don't think you kept this logic in the new code?
I really apologize for late reply. I missed that you wait for it.
> 
> To be honest, I didn't really follow the logic here.  Thinking about the
> single-exit case (which the current code is designed to handle), both
> the body of the if and the else if logically implement the same update:
> 
>   P'(exit) = vf * P(exit)
> 
> I suppose the code in the body of the if is just a more numerically stable way
> of doing this?

Here the code just trying to be more robust for broken profiles.

I.e. exit probability should correspond to the iteration count
determined by header_count/entry_count. If entry_count is non-zero and
we can divie, it forcingly set exit probability to it, so the
profile will end up being flow consistent even if it was not before.

In the remaining case, If entry_count happens to be 0 we only can scale exit_count.
The second test should prevent the situation wehre multiplying it by vf
will result in value >=1 which will end up predicting the loop as never
iterating.

It is relatively common that VF > iteration count as predicted by
profile, since static profile estimation is not very good on it and it
predicts most of loops to iterate approx 3x since that is the expected
loop iterations of a random loop in spec.
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 3de1ea646b4..71787301e6d 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -12092,6 +12092,9 @@ scale_profile_for_vect_loop (class loop *loop, edge exit_e, unsigned vf, bool fl
>       sensible profile.  */
>    else if (exit_e->probability < profile_probability::always () / (vf * 2))
>      set_edge_probability_and_rescale_others (exit_e, exit_e->probability * vf);
> +  else
> +    gcc_unreachable ();
> +
> 
> on top of current trunk, and the unreachable wasn't hit once.

I suppose this happens because entry_count needs to be non-zero for
maybe_hot_loop predicate to return true, so we do not vectorize in case
of this profile breakage.  Not sure if it is a good idea - i.e. for
auto-fdo 0s on wrong spots are quite common.  I will look into it.

Honza
> 
> Thanks,
> Alex
> 
> > 
> > Honza
> > >-    set_edge_probability_and_rescale_others (exit_e, exit_e->probability * vf);
> > >-  loop->latch->count = single_pred_edge (loop->latch)->count ();
> > >-
> > >-  scale_loop_profile (loop, profile_probability::always () / vf,
> > >-		      get_likely_max_loop_iterations_int (loop));
> > >+  const auto likely_max_niters = get_likely_max_loop_iterations_int (loop);
> > >+  scale_loop_profile_hold_exit_counts (loop,
> > >+				       profile_probability::always () / vf,
> > >+				       likely_max_niters);
> >
  

Patch

diff --git a/gcc/cfgloopmanip.cc b/gcc/cfgloopmanip.cc
index 534e556e1e4..1714f312d42 100644
--- a/gcc/cfgloopmanip.cc
+++ b/gcc/cfgloopmanip.cc
@@ -677,17 +677,243 @@  update_loop_exit_probability_scale_dom_bbs (class loop *loop,
   return exit_edge;
 }
 
-/* Scale profile in LOOP by P.
-   If ITERATION_BOUND is not -1, scale even further if loop is predicted
-   to iterate too many times.
-   Before caling this function, preheader block profile should be already
-   scaled to final count.  This is necessary because loop iterations are
-   determined by comparing header edge count to latch ege count and thus
-   they need to be scaled synchronously.  */
+/* Check if LOOP is a simple loop for scaling purposes; that is, it
+   has no internal control flow and at most one exit from each BB.
+
+   Also check if the sum of the new exit counts given by
+   GET_EXIT_COUNT will differ significantly from the count coming in to
+   the loop.  Return false if so, since flow_scale_loop_freqs relies on
+   loops having consistent profiles in this sense.  */
+template<typename ExitCountFn>
+static bool
+can_flow_scale_loop_freqs_p (class loop *loop,
+			     ExitCountFn get_exit_count)
+{
+  basic_block bb = loop->header;
+
+  const profile_count count_in = loop_count_in (loop);
+  profile_count exit_count = profile_count::zero ();
+
+  while (bb != loop->latch)
+    {
+      /* Punt if any of the BB counts are uninitialized.  */
+      if (!bb->count.initialized_p ())
+	return false;
+
+      bool found_exit = false;
+      edge internal_edge = nullptr;
+      for (auto e : bb->succs)
+	if (flow_bb_inside_loop_p (loop, e->dest))
+	  {
+	    if (internal_edge)
+	      return false;
+	    internal_edge = e;
+	  }
+	else
+	  {
+	    if (found_exit)
+	      return false;
+	    found_exit = true;
+	    exit_count += get_exit_count (e);
+	  }
+
+      bb = internal_edge->dest;
+    }
+
+  /* Punt if any exit edge had an uninitialized count.  */
+  if (!exit_count.initialized_p ())
+    return false;
+
+  /* In a consistent profile we must have count in = count out,
+     return false if this property doesn't hold for this loop,
+     as we rely on it in flow_scale_loop_freqs.  */
+  if (exit_count.differs_from_p (count_in))
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  fprintf (dump_file,
+		   ";; loop %d has inconsistent profile (count in = ",
+		   loop->num);
+	  count_in.dump (dump_file);
+	  fprintf (dump_file, ", count out = ");
+	  exit_count.dump (dump_file);
+	  fprintf (dump_file, "), cannot use count-flowing adjustment\n");
+	}
+
+      return false;
+    }
+
+  return true;
+}
+
+/* This function assumes LOOP satisfies can_flow_scale_loop_freqs_p.
+   For such a loop, adjust its profile so that the header now executes
+   PROB*N times, and adjust the exit probabilities such that the new exit
+   counts are those given by GET_EXIT_COUNT (E) for each exit edge E, flowing
+   the resulting counts along internal edges to make the profile consistent.  */
+
+template<typename ExitCountFn>
+static void
+flow_scale_loop_freqs (class loop *loop,
+		       profile_probability prob,
+		       ExitCountFn get_exit_count)
+{
+  basic_block bb = loop->header;
+  profile_count new_block_count = bb->count.apply_probability (prob);
+
+  while (1)
+    {
+      edge internal_edge = nullptr;
+      edge exit_edge = nullptr;
+      for (auto edge : bb->succs)
+	{
+	  if (flow_bb_inside_loop_p (loop, edge->dest))
+	    {
+	      gcc_assert (!internal_edge);
+	      internal_edge = edge;
+	    }
+	  else
+	    {
+	      gcc_assert (!exit_edge);
+	      exit_edge = edge;
+	    }
+	}
+
+      if (!exit_edge)
+	{
+	  /* If there is no exit edge, then there must be exactly one outgoing
+	     (loop-internal) edge for this bb.  No probabilities need
+	     updating.  */
+	  bb->count = new_block_count;
+	  if (bb == loop->latch)
+	    return;
+
+	  bb = internal_edge->dest;
+	  continue;
+	}
+
+      const profile_count new_exit_count = get_exit_count (exit_edge);
+      profile_probability new_exit_prob;
+      if (new_block_count.nonzero_p ())
+	new_exit_prob = new_exit_count.probability_in (new_block_count);
+      else
+	{
+	  /* NEW_BLOCK_COUNT is zero, so the only way we can make the profile
+	     consistent is if NEW_EXIT_COUNT is zero too.  */
+	  if (dump_file && new_exit_count.nonzero_p ())
+	    fprintf (dump_file,
+		     ";; flow_scale_loop_freqs wants non-zero exit count "
+		     "but bb count is zero/uninit: profile is inconsistent\n");
+
+	  /* Arbitrarily set the exit probability to 0.  */
+	  new_exit_prob = profile_probability::never ();
+	}
+
+      set_edge_probability_and_rescale_others (exit_edge, new_exit_prob);
+      bb->count = new_block_count;
+      new_block_count = internal_edge->count ();
+      bb = internal_edge->dest;
+    }
+}
+
+/* Attempt to adjust the profile of LOOP such that the header executes P*N
+   times where N is the current (unadjusted) expected iteration count.  Also
+   adjust the exit probabilities such that each exit edge E has count
+   GET_NEW_EXIT_COUNT (E).
+
+   Where possible, this function will use flow_scale_loop_freqs in order to
+   handle loops with multiple exits correctly.  Otherwise, we fall back to a
+   simple multiplicative scaling of the BB freqs.  */
+
+template<typename ExitCountFn>
+static void
+scale_loop_freqs_with_exit_counts (class loop *loop,
+				   profile_probability p,
+				   ExitCountFn get_new_exit_count)
+{
+  if (can_flow_scale_loop_freqs_p (loop, get_new_exit_count))
+    {
+      flow_scale_loop_freqs (loop, p, get_new_exit_count);
+      return;
+    }
+
+  scale_loop_frequencies (loop, p);
+  if (auto scale_e = loop_exit_for_scaling (loop))
+    {
+      if (scale_e->src->count.nonzero_p ())
+	scale_e->probability
+	  = get_new_exit_count (scale_e).probability_in (scale_e->src->count);
+    }
+  else
+    {
+      /* Multiplicative scaling of the loop frequencies for a loop with
+	 multiple exits works if we are simply reducing the
+	 number of times the loop is entered, but not to reduce the
+	 iteration count of the loop.  */
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  fprintf (dump_file,
+		   ";; Multiplicative scale by ");
+	  p.dump (dump_file);
+	  fprintf (dump_file,
+		   " for loop %d with multiple exits: "
+		   "profile may become inconsistent\n",
+		   loop->num);
+	}
+
+      auto_vec<edge> exits = get_loop_exit_edges (loop);
+      for (auto edge : exits)
+	if (edge->src->count.nonzero_p ())
+	  edge->probability
+	    = get_new_exit_count (edge).probability_in (edge->src->count);
+    }
+}
+
+/* Adjust the profile of LOOP such that it is expected to iterate P*N times
+   where N is the (unadjusted) expected iteration count.  Adjust the exit
+   probabilities such that exit edge counts remain fixed before/after the
+   transformation.  */
+
+static void
+scale_loop_freqs_hold_exit_counts (class loop *loop,
+				   profile_probability p)
+{
+  auto id_counts = [](edge e) { return e->count (); };
+  scale_loop_freqs_with_exit_counts (loop, p, id_counts);
+}
+
+/* Adjust the profile of LOOP such that it is expected to iterate P*N times
+   where N is the (unadjusted) expected iteration count.  Adjust the exit
+   probabilities such that exit edge counts remain fixed before/after the
+   transformation, except for EXIT_E which is set to have count NEW_COUNT after
+   the transformation.  */
 
 void
-scale_loop_profile (class loop *loop, profile_probability p,
-		    gcov_type iteration_bound)
+scale_loop_freqs_with_new_exit_count (class loop *loop,
+				      profile_probability p,
+				      edge exit_e,
+				      profile_count new_count)
+{
+  auto get_new_counts = [=](edge e)
+    {
+      if (e == exit_e)
+	return new_count;
+      else
+	return e->count ();
+    };
+  scale_loop_freqs_with_exit_counts (loop, p, get_new_counts);
+}
+
+/* Outline of scale_loop_profile implementation.  Used by
+   scale_loop_profile{,_hold_exit_counts}.  */
+
+template<typename ScaleLoopFreqs, typename ScaleNiters>
+static void
+scale_loop_profile_1 (class loop *loop,
+		      profile_probability p,
+		      gcov_type iteration_bound,
+		      ScaleLoopFreqs freq_scaler,
+		      ScaleNiters niters_scaler)
 {
   if (!(p == profile_probability::always ()))
     {
@@ -700,7 +926,7 @@  scale_loop_profile (class loop *loop, profile_probability p,
 	}
 
       /* Scale the probabilities.  */
-      scale_loop_frequencies (loop, p);
+      freq_scaler (loop, p);
     }
 
   if (iteration_bound == -1)
@@ -737,18 +963,57 @@  scale_loop_profile (class loop *loop, profile_probability p,
       fprintf (dump_file, " to reach upper bound %i\n",
 	       (int)iteration_bound);
     }
-  /* Finally attempt to fix exit edge probability.  */
-  edge exit_edge = loop_exit_for_scaling (loop);
-
-  /* In a consistent profile unadjusted_exit_count should be same as count_in,
-     however to preserve as much of the original info, avoid recomputing
-     it.  */
-  profile_count unadjusted_exit_count = profile_count::uninitialized ();
-  if (exit_edge)
-    unadjusted_exit_count = exit_edge->count ();
-  scale_loop_frequencies (loop, scale_prob);
-  update_loop_exit_probability_scale_dom_bbs (loop, exit_edge,
-					      unadjusted_exit_count);
+
+  niters_scaler (loop, scale_prob);
+}
+
+/* Scale profile in LOOP by P.
+   If ITERATION_BOUND is not -1, scale even further if loop is predicted
+   to iterate too many times.
+   Before caling this function, preheader block profile should be already
+   scaled to final count.  This is necessary because loop iterations are
+   determined by comparing header edge count to latch ege count and thus
+   they need to be scaled synchronously.  */
+
+void
+scale_loop_profile (class loop *loop, profile_probability p,
+		    gcov_type iteration_bound)
+{
+  auto scale_niters = [](class loop *loop, profile_probability scale_prob)
+    {
+      edge exit_edge = loop_exit_for_scaling (loop);
+
+      profile_count unadjusted_exit_count = profile_count::uninitialized ();
+      if (exit_edge)
+	unadjusted_exit_count = exit_edge->count ();
+      else
+	{
+	  scale_loop_freqs_hold_exit_counts (loop, scale_prob);
+	  return;
+	}
+
+      scale_loop_frequencies (loop, scale_prob);
+      update_loop_exit_probability_scale_dom_bbs (loop, exit_edge,
+						  unadjusted_exit_count);
+    };
+
+  scale_loop_profile_1 (loop, p, iteration_bound,
+			&scale_loop_frequencies,
+			scale_niters);
+}
+
+/* This function is like scale_loop_profile but additionally takes care of
+   updating the exit edge probabilities in order to hold the exit edge counts
+   constant before/after any transformation.  */
+
+void
+scale_loop_profile_hold_exit_counts (class loop *loop,
+				     profile_probability p,
+				     gcov_type iteration_bound)
+{
+  scale_loop_profile_1 (loop, p, iteration_bound,
+			&scale_loop_freqs_hold_exit_counts,
+			&scale_loop_freqs_hold_exit_counts);
 }
 
 /* Recompute dominance information for basic blocks outside LOOP.  */
diff --git a/gcc/cfgloopmanip.h b/gcc/cfgloopmanip.h
index 42def2fe40d..faa874c56d5 100644
--- a/gcc/cfgloopmanip.h
+++ b/gcc/cfgloopmanip.h
@@ -41,6 +41,13 @@  extern void place_new_loop (struct function *, class loop *);
 extern void add_loop (class loop *, class loop *);
 extern void scale_loop_frequencies (class loop *, profile_probability);
 extern void scale_loop_profile (class loop *, profile_probability, gcov_type);
+extern void scale_loop_profile_hold_exit_counts (class loop *,
+						 profile_probability,
+						 gcov_type);
+extern void scale_loop_freqs_with_new_exit_count (class loop *,
+						  profile_probability,
+						  edge, profile_count);
+
 extern edge create_empty_if_region_on_edge (edge, tree);
 extern class loop *create_empty_loop_on_edge (edge, tree, tree, tree, tree,
 					       tree *, tree *, class loop *);