Fix setting of call graph node AutoFDO count [PR116743]

Message ID MN2PR21MB13923DBD9C353C2663B13132911F2@MN2PR21MB1392.namprd21.prod.outlook.com
State New
Headers
Series Fix setting of call graph node AutoFDO count [PR116743] |

Checks

Context Check Description
linaro-tcwg-bot/tcwg_gcc_build--master-aarch64 fail Patch failed to apply
linaro-tcwg-bot/tcwg_gcc_build--master-arm fail Patch failed to apply

Commit Message

Eugene Rozenfeld Jan. 13, 2025, 9:46 p.m. UTC
  We are initializing both the call graph node count and
the entry block count of the function with the head_count value
from the profile.

Count propagation algorithm may refine the entry block count
and we may end up with a case where the call graph node count
is set to 0 but the entry block count is non-zero. That becomes
a problem because we have this code in execute_fixup_cfg:

profile_count num = node->count;
profile_count den = ENTRY_BLOCK_PTR_FOR_FN (cfun)->count;
bool scale = num.initialized_p () && !(num == den);

Here if num is 0 but den is not 0, scale becomes true and we
lose the counts in

if (scale)
  bb->count = bb->count.apply_scale (num, den);

This is what happened the issue reported in PR116743
(a 10% regression in MySQL HAMMERDB tests).
3d9e6767939e9658260e2506e81ec32b37cba041 made an improvement in
AutoFDO count propagation, which caused the mismatch between
the call graph node count (zero) and the entry block count (non-zero)
and subsequent loss of counts as described above.

The fix is to update the call graph node count once we've done count propagation.

Tested on x86_64-pc-linux-gnu.

gcc/ChangeLog:
                PR gcov-profile/116743
                * auto-profile.c (afdo_annotate_cfg): Fix mismatch between the call graph node count
                and the entry block count.
---
gcc/auto-profile.cc | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

       /* Calculate, propagate count and probability information on CFG.  */
       afdo_calculate_branch_prob (&annotated_bb);
     }
+  cgraph_node::get(current_function_decl)->count
+      = ENTRY_BLOCK_PTR_FOR_FN(cfun)->count;
   update_max_bb_count ();
   profile_status_for_fn (cfun) = PROFILE_READ;
   if (flag_value_profile_transformations)
--
2.34.1
  

Comments

Richard Biener Jan. 14, 2025, 7:21 a.m. UTC | #1
On Mon, Jan 13, 2025 at 10:47 PM Eugene Rozenfeld
<Eugene.Rozenfeld@microsoft.com> wrote:
>
> We are initializing both the call graph node count and
>
> the entry block count of the function with the head_count value
>
> from the profile.
>
>
>
> Count propagation algorithm may refine the entry block count
>
> and we may end up with a case where the call graph node count
>
> is set to 0 but the entry block count is non-zero. That becomes
>
> a problem because we have this code in execute_fixup_cfg:
>
>
>
> profile_count num = node->count;
>
> profile_count den = ENTRY_BLOCK_PTR_FOR_FN (cfun)->count;
>
> bool scale = num.initialized_p () && !(num == den);
>
>
>
> Here if num is 0 but den is not 0, scale becomes true and we
>
> lose the counts in
>
>
>
> if (scale)
>
>   bb->count = bb->count.apply_scale (num, den);
>
>
>
> This is what happened the issue reported in PR116743
>
> (a 10% regression in MySQL HAMMERDB tests).
>
> 3d9e6767939e9658260e2506e81ec32b37cba041 made an improvement in
>
> AutoFDO count propagation, which caused the mismatch between
>
> the call graph node count (zero) and the entry block count (non-zero)
>
> and subsequent loss of counts as described above.
>
>
>
> The fix is to update the call graph node count once we've done count propagation.
>
>
>
> Tested on x86_64-pc-linux-gnu.

OK.

Thanks,
Richard.

>
>
> gcc/ChangeLog:
>
>                 PR gcov-profile/116743
>
>                 * auto-profile.c (afdo_annotate_cfg): Fix mismatch between the call graph node count
>
>                 and the entry block count.
>
> ---
>
> gcc/auto-profile.cc | 4 ++--
>
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
>
>
> diff --git a/gcc/auto-profile.cc b/gcc/auto-profile.cc
>
> index 5d0e8afb9a1..aa4d1634f01 100644
>
> --- a/gcc/auto-profile.cc
>
> +++ b/gcc/auto-profile.cc
>
> @@ -1538,8 +1538,6 @@ afdo_annotate_cfg (const stmt_set &promoted_stmts)
>
>    if (s == NULL)
>
>      return;
>
> -  cgraph_node::get (current_function_decl)->count
>
> -     = profile_count::from_gcov_type (s->head_count ()).afdo ();
>
>    ENTRY_BLOCK_PTR_FOR_FN (cfun)->count
>
>       = profile_count::from_gcov_type (s->head_count ()).afdo ();
>
>    EXIT_BLOCK_PTR_FOR_FN (cfun)->count = profile_count::zero ().afdo ();
>
> @@ -1578,6 +1576,8 @@ afdo_annotate_cfg (const stmt_set &promoted_stmts)
>
>        /* Calculate, propagate count and probability information on CFG.  */
>
>        afdo_calculate_branch_prob (&annotated_bb);
>
>      }
>
> +  cgraph_node::get(current_function_decl)->count
>
> +      = ENTRY_BLOCK_PTR_FOR_FN(cfun)->count;
>
>    update_max_bb_count ();
>
>    profile_status_for_fn (cfun) = PROFILE_READ;
>
>    if (flag_value_profile_transformations)
>
> --
>
> 2.34.1
>
>
  
Eugene Rozenfeld Jan. 16, 2025, 2:17 a.m. UTC | #2
I committed the patch to trunk. Is it ok to backport to gcc-12, gcc-13, and gcc-14?

-----Original Message-----
From: Richard Biener <richard.guenther@gmail.com> 
Sent: Monday, January 13, 2025 11:22 PM
To: Eugene Rozenfeld <Eugene.Rozenfeld@microsoft.com>
Cc: GCC-Patches-ML <gcc-patches@gcc.gnu.org>; Jan Hubicka <hubicka@ucw.cz>; rvmallad@amazon.com
Subject: [EXTERNAL] Re: [PATCH] Fix setting of call graph node AutoFDO count [PR116743]

On Mon, Jan 13, 2025 at 10:47 PM Eugene Rozenfeld <Eugene.Rozenfeld@microsoft.com> wrote:
>
> We are initializing both the call graph node count and
>
> the entry block count of the function with the head_count value
>
> from the profile.
>
>
>
> Count propagation algorithm may refine the entry block count
>
> and we may end up with a case where the call graph node count
>
> is set to 0 but the entry block count is non-zero. That becomes
>
> a problem because we have this code in execute_fixup_cfg:
>
>
>
> profile_count num = node->count;
>
> profile_count den = ENTRY_BLOCK_PTR_FOR_FN (cfun)->count;
>
> bool scale = num.initialized_p () && !(num == den);
>
>
>
> Here if num is 0 but den is not 0, scale becomes true and we
>
> lose the counts in
>
>
>
> if (scale)
>
>   bb->count = bb->count.apply_scale (num, den);
>
>
>
> This is what happened the issue reported in PR116743
>
> (a 10% regression in MySQL HAMMERDB tests).
>
> 3d9e6767939e9658260e2506e81ec32b37cba041 made an improvement in
>
> AutoFDO count propagation, which caused the mismatch between
>
> the call graph node count (zero) and the entry block count (non-zero)
>
> and subsequent loss of counts as described above.
>
>
>
> The fix is to update the call graph node count once we've done count propagation.
>
>
>
> Tested on x86_64-pc-linux-gnu.

OK.

Thanks,
Richard.

>
>
> gcc/ChangeLog:
>
>                 PR gcov-profile/116743
>
>                 * auto-profile.c (afdo_annotate_cfg): Fix mismatch 
> between the call graph node count
>
>                 and the entry block count.
>
> ---
>
> gcc/auto-profile.cc | 4 ++--
>
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
>
>
> diff --git a/gcc/auto-profile.cc b/gcc/auto-profile.cc
>
> index 5d0e8afb9a1..aa4d1634f01 100644
>
> --- a/gcc/auto-profile.cc
>
> +++ b/gcc/auto-profile.cc
>
> @@ -1538,8 +1538,6 @@ afdo_annotate_cfg (const stmt_set 
> &promoted_stmts)
>
>    if (s == NULL)
>
>      return;
>
> -  cgraph_node::get (current_function_decl)->count
>
> -     = profile_count::from_gcov_type (s->head_count ()).afdo ();
>
>    ENTRY_BLOCK_PTR_FOR_FN (cfun)->count
>
>       = profile_count::from_gcov_type (s->head_count ()).afdo ();
>
>    EXIT_BLOCK_PTR_FOR_FN (cfun)->count = profile_count::zero ().afdo 
> ();
>
> @@ -1578,6 +1576,8 @@ afdo_annotate_cfg (const stmt_set 
> &promoted_stmts)
>
>        /* Calculate, propagate count and probability information on 
> CFG.  */
>
>        afdo_calculate_branch_prob (&annotated_bb);
>
>      }
>
> +  cgraph_node::get(current_function_decl)->count
>
> +      = ENTRY_BLOCK_PTR_FOR_FN(cfun)->count;
>
>    update_max_bb_count ();
>
>    profile_status_for_fn (cfun) = PROFILE_READ;
>
>    if (flag_value_profile_transformations)
>
> --
>
> 2.34.1
>
>
  
Richard Biener Jan. 16, 2025, 8:32 a.m. UTC | #3
On Thu, Jan 16, 2025 at 3:17 AM Eugene Rozenfeld
<Eugene.Rozenfeld@microsoft.com> wrote:
>
> I committed the patch to trunk. Is it ok to backport to gcc-12, gcc-13, and gcc-14?

Yes.

> -----Original Message-----
> From: Richard Biener <richard.guenther@gmail.com>
> Sent: Monday, January 13, 2025 11:22 PM
> To: Eugene Rozenfeld <Eugene.Rozenfeld@microsoft.com>
> Cc: GCC-Patches-ML <gcc-patches@gcc.gnu.org>; Jan Hubicka <hubicka@ucw.cz>; rvmallad@amazon.com
> Subject: [EXTERNAL] Re: [PATCH] Fix setting of call graph node AutoFDO count [PR116743]
>
> On Mon, Jan 13, 2025 at 10:47 PM Eugene Rozenfeld <Eugene.Rozenfeld@microsoft.com> wrote:
> >
> > We are initializing both the call graph node count and
> >
> > the entry block count of the function with the head_count value
> >
> > from the profile.
> >
> >
> >
> > Count propagation algorithm may refine the entry block count
> >
> > and we may end up with a case where the call graph node count
> >
> > is set to 0 but the entry block count is non-zero. That becomes
> >
> > a problem because we have this code in execute_fixup_cfg:
> >
> >
> >
> > profile_count num = node->count;
> >
> > profile_count den = ENTRY_BLOCK_PTR_FOR_FN (cfun)->count;
> >
> > bool scale = num.initialized_p () && !(num == den);
> >
> >
> >
> > Here if num is 0 but den is not 0, scale becomes true and we
> >
> > lose the counts in
> >
> >
> >
> > if (scale)
> >
> >   bb->count = bb->count.apply_scale (num, den);
> >
> >
> >
> > This is what happened the issue reported in PR116743
> >
> > (a 10% regression in MySQL HAMMERDB tests).
> >
> > 3d9e6767939e9658260e2506e81ec32b37cba041 made an improvement in
> >
> > AutoFDO count propagation, which caused the mismatch between
> >
> > the call graph node count (zero) and the entry block count (non-zero)
> >
> > and subsequent loss of counts as described above.
> >
> >
> >
> > The fix is to update the call graph node count once we've done count propagation.
> >
> >
> >
> > Tested on x86_64-pc-linux-gnu.
>
> OK.
>
> Thanks,
> Richard.
>
> >
> >
> > gcc/ChangeLog:
> >
> >                 PR gcov-profile/116743
> >
> >                 * auto-profile.c (afdo_annotate_cfg): Fix mismatch
> > between the call graph node count
> >
> >                 and the entry block count.
> >
> > ---
> >
> > gcc/auto-profile.cc | 4 ++--
> >
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
> >
> >
> > diff --git a/gcc/auto-profile.cc b/gcc/auto-profile.cc
> >
> > index 5d0e8afb9a1..aa4d1634f01 100644
> >
> > --- a/gcc/auto-profile.cc
> >
> > +++ b/gcc/auto-profile.cc
> >
> > @@ -1538,8 +1538,6 @@ afdo_annotate_cfg (const stmt_set
> > &promoted_stmts)
> >
> >    if (s == NULL)
> >
> >      return;
> >
> > -  cgraph_node::get (current_function_decl)->count
> >
> > -     = profile_count::from_gcov_type (s->head_count ()).afdo ();
> >
> >    ENTRY_BLOCK_PTR_FOR_FN (cfun)->count
> >
> >       = profile_count::from_gcov_type (s->head_count ()).afdo ();
> >
> >    EXIT_BLOCK_PTR_FOR_FN (cfun)->count = profile_count::zero ().afdo
> > ();
> >
> > @@ -1578,6 +1576,8 @@ afdo_annotate_cfg (const stmt_set
> > &promoted_stmts)
> >
> >        /* Calculate, propagate count and probability information on
> > CFG.  */
> >
> >        afdo_calculate_branch_prob (&annotated_bb);
> >
> >      }
> >
> > +  cgraph_node::get(current_function_decl)->count
> >
> > +      = ENTRY_BLOCK_PTR_FOR_FN(cfun)->count;
> >
> >    update_max_bb_count ();
> >
> >    profile_status_for_fn (cfun) = PROFILE_READ;
> >
> >    if (flag_value_profile_transformations)
> >
> > --
> >
> > 2.34.1
> >
> >
  

Patch

diff --git a/gcc/auto-profile.cc b/gcc/auto-profile.cc
index 5d0e8afb9a1..aa4d1634f01 100644
--- a/gcc/auto-profile.cc
+++ b/gcc/auto-profile.cc
@@ -1538,8 +1538,6 @@  afdo_annotate_cfg (const stmt_set &promoted_stmts)
   if (s == NULL)
     return;
-  cgraph_node::get (current_function_decl)->count
-     = profile_count::from_gcov_type (s->head_count ()).afdo ();
   ENTRY_BLOCK_PTR_FOR_FN (cfun)->count
      = profile_count::from_gcov_type (s->head_count ()).afdo ();
   EXIT_BLOCK_PTR_FOR_FN (cfun)->count = profile_count::zero ().afdo ();
@@ -1578,6 +1576,8 @@  afdo_annotate_cfg (const stmt_set &promoted_stmts)