ira: Cap callee-saved register cost scale to 300
Checks
Context |
Check |
Description |
linaro-tcwg-bot/tcwg_gcc_build--master-arm |
success
|
Build passed
|
linaro-tcwg-bot/tcwg_gcc_check--master-arm |
fail
|
Test failed
|
linaro-tcwg-bot/tcwg_gcc_build--master-aarch64 |
success
|
Build passed
|
linaro-tcwg-bot/tcwg_gcc_check--master-aarch64 |
success
|
Test passed
|
Commit Message
Don't increase callee-saved register cost by 1000x, which leads to that
callee-saved registers aren't used to preserve local variable values
across calls, by capping the scale to 300.
PR rtl-optimization/111673
PR rtl-optimization/115932
PR rtl-optimization/116028
PR rtl-optimization/117081
PR rtl-optimization/118497
* ira-color.cc (assign_hard_reg): Cap callee-saved register cost
scale to 300.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
---
gcc/ira-color.cc | 16 ++++++++++++++--
1 file changed, 14 insertions(+), 2 deletions(-)
Comments
> Am 02.02.2025 um 08:00 schrieb H.J. Lu <hjl.tools@gmail.com>:
>
> Don't increase callee-saved register cost by 1000x, which leads to that
> callee-saved registers aren't used to preserve local variable values
> across calls, by capping the scale to 300.
> PR rtl-optimization/111673
> PR rtl-optimization/115932
> PR rtl-optimization/116028
> PR rtl-optimization/117081
> PR rtl-optimization/118497
> * ira-color.cc (assign_hard_reg): Cap callee-saved register cost
> scale to 300.
>
> Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
> ---
> gcc/ira-color.cc | 16 ++++++++++++++--
> 1 file changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc
> index 0699b349a1a..707ff188250 100644
> --- a/gcc/ira-color.cc
> +++ b/gcc/ira-color.cc
> @@ -2175,13 +2175,25 @@ assign_hard_reg (ira_allocno_t a, bool retry_p)
> /* We need to save/restore the hard register in
> epilogue/prologue. Therefore we increase the cost. */
> {
> + int scale;
> + if (optimize_size)
> + scale = 1;
> + else
> + {
> + scale = REG_FREQ_FROM_BB (ENTRY_BLOCK_PTR_FOR_FN (cfun));
> + /* Don't increase callee-saved register cost by 1000x,
> + which leads to that callee-saved registers aren't
> + used to preserve local variable values across calls,
> + by capping the scale to 300. */
> + if (REG_FREQ_MAX == 1000 && scale == REG_FREQ_MAX)
> + scale = 300;
That leads to 300 for 1000 but 999 for 999 which is odd. I’d have expected to scale this down to [0, 300] or is MAX a magic value?
> + }
> rclass = REGNO_REG_CLASS (hard_regno);
> add_cost = ((ira_memory_move_cost[mode][rclass][0]
> + ira_memory_move_cost[mode][rclass][1])
> * saved_nregs / hard_regno_nregs (hard_regno,
> mode) - 1)
> - * (optimize_size ? 1 :
> - REG_FREQ_FROM_BB (ENTRY_BLOCK_PTR_FOR_FN (cfun)));
> + * scale;
> cost += add_cost;
> full_cost += add_cost;
> }
> --
> 2.48.1
>
On Sun, Feb 2, 2025 at 3:33 PM Richard Biener
<richard.guenther@gmail.com> wrote:
>
>
>
> > Am 02.02.2025 um 08:00 schrieb H.J. Lu <hjl.tools@gmail.com>:
> >
> > Don't increase callee-saved register cost by 1000x, which leads to that
> > callee-saved registers aren't used to preserve local variable values
> > across calls, by capping the scale to 300.
>
> > PR rtl-optimization/111673
> > PR rtl-optimization/115932
> > PR rtl-optimization/116028
> > PR rtl-optimization/117081
> > PR rtl-optimization/118497
> > * ira-color.cc (assign_hard_reg): Cap callee-saved register cost
> > scale to 300.
> >
> > Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
> > ---
> > gcc/ira-color.cc | 16 ++++++++++++++--
> > 1 file changed, 14 insertions(+), 2 deletions(-)
> >
> > diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc
> > index 0699b349a1a..707ff188250 100644
> > --- a/gcc/ira-color.cc
> > +++ b/gcc/ira-color.cc
> > @@ -2175,13 +2175,25 @@ assign_hard_reg (ira_allocno_t a, bool retry_p)
> > /* We need to save/restore the hard register in
> > epilogue/prologue. Therefore we increase the cost. */
> > {
> > + int scale;
> > + if (optimize_size)
> > + scale = 1;
> > + else
> > + {
> > + scale = REG_FREQ_FROM_BB (ENTRY_BLOCK_PTR_FOR_FN (cfun));
> > + /* Don't increase callee-saved register cost by 1000x,
> > + which leads to that callee-saved registers aren't
> > + used to preserve local variable values across calls,
> > + by capping the scale to 300. */
> > + if (REG_FREQ_MAX == 1000 && scale == REG_FREQ_MAX)
> > + scale = 300;
>
> That leads to 300 for 1000 but 999 for 999 which is odd. I’d have expected to scale this down to [0, 300] or is MAX a magic value?
There are
* The weights for each insn varies from 0 to REG_FREQ_BASE.
This constant does not need to be high, as in infrequently executed
regions we want to count instructions equivalently to optimize for
size instead of speed. */
#define REG_FREQ_MAX 1000
/* Compute register frequency from the BB frequency. When optimizing for size,
or profile driven feedback is available and the function is never executed,
frequency is always equivalent. Otherwise rescale the basic block
frequency. */
#define REG_FREQ_FROM_BB(bb) ((optimize_function_for_size_p (cfun) \
|| !cfun->cfg->count_max.initialized_p ()) \
? REG_FREQ_MAX \
: ((bb)->count.to_frequency (cfun) \
* REG_FREQ_MAX / BB_FREQ_MAX) \
? ((bb)->count.to_frequency (cfun) \
* REG_FREQ_MAX / BB_FREQ_MAX) \
: 1)
1000 is the default. If it isn't 1000, it isn't the default. I only want
to get a more reasonable default scale, instead of 1000. Lower
scale will fail the PR rtl-optimization/111673 test on powerpc64.
> > + }
> > rclass = REGNO_REG_CLASS (hard_regno);
> > add_cost = ((ira_memory_move_cost[mode][rclass][0]
> > + ira_memory_move_cost[mode][rclass][1])
> > * saved_nregs / hard_regno_nregs (hard_regno,
> > mode) - 1)
> > - * (optimize_size ? 1 :
> > - REG_FREQ_FROM_BB (ENTRY_BLOCK_PTR_FOR_FN (cfun)));
> > + * scale;
> > cost += add_cost;
> > full_cost += add_cost;
> > }
> > --
> > 2.48.1
> >
> Am 02.02.2025 um 08:59 schrieb H.J. Lu <hjl.tools@gmail.com>:
>
> On Sun, Feb 2, 2025 at 3:33 PM Richard Biener
> <richard.guenther@gmail.com> wrote:
>>
>>
>>
>>>> Am 02.02.2025 um 08:00 schrieb H.J. Lu <hjl.tools@gmail.com>:
>>>
>>> Don't increase callee-saved register cost by 1000x, which leads to that
>>> callee-saved registers aren't used to preserve local variable values
>>> across calls, by capping the scale to 300.
>>
>>> PR rtl-optimization/111673
>>> PR rtl-optimization/115932
>>> PR rtl-optimization/116028
>>> PR rtl-optimization/117081
>>> PR rtl-optimization/118497
>>> * ira-color.cc (assign_hard_reg): Cap callee-saved register cost
>>> scale to 300.
>>>
>>> Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
>>> ---
>>> gcc/ira-color.cc | 16 ++++++++++++++--
>>> 1 file changed, 14 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc
>>> index 0699b349a1a..707ff188250 100644
>>> --- a/gcc/ira-color.cc
>>> +++ b/gcc/ira-color.cc
>>> @@ -2175,13 +2175,25 @@ assign_hard_reg (ira_allocno_t a, bool retry_p)
>>> /* We need to save/restore the hard register in
>>> epilogue/prologue. Therefore we increase the cost. */
>>> {
>>> + int scale;
>>> + if (optimize_size)
>>> + scale = 1;
>>> + else
>>> + {
>>> + scale = REG_FREQ_FROM_BB (ENTRY_BLOCK_PTR_FOR_FN (cfun));
>>> + /* Don't increase callee-saved register cost by 1000x,
>>> + which leads to that callee-saved registers aren't
>>> + used to preserve local variable values across calls,
>>> + by capping the scale to 300. */
>>> + if (REG_FREQ_MAX == 1000 && scale == REG_FREQ_MAX)
>>> + scale = 300;
>>
>> That leads to 300 for 1000 but 999 for 999 which is odd. I’d have expected to scale this down to [0, 300] or is MAX a magic value?
>
> There are
>
> * The weights for each insn varies from 0 to REG_FREQ_BASE.
> This constant does not need to be high, as in infrequently executed
> regions we want to count instructions equivalently to optimize for
> size instead of speed. */
> #define REG_FREQ_MAX 1000
>
> /* Compute register frequency from the BB frequency. When optimizing for size,
> or profile driven feedback is available and the function is never executed,
> frequency is always equivalent. Otherwise rescale the basic block
> frequency. */
> #define REG_FREQ_FROM_BB(bb) ((optimize_function_for_size_p (cfun) \
> || !cfun->cfg->count_max.initialized_p ()) \
> ? REG_FREQ_MAX \
> : ((bb)->count.to_frequency (cfun) \
> * REG_FREQ_MAX / BB_FREQ_MAX) \
> ? ((bb)->count.to_frequency (cfun) \
> * REG_FREQ_MAX / BB_FREQ_MAX) \
> : 1)
>
> 1000 is the default. If it isn't 1000, it isn't the default. I only want
> to get a more reasonable default scale, instead of 1000. Lower
> scale will fail the PR rtl-optimization/111673 test on powerpc64.
I see. Why not adjust the above macro then? That would be a bit more obvious. Like use MAX/2 or so?
>
>
>>> + }
>>> rclass = REGNO_REG_CLASS (hard_regno);
>>> add_cost = ((ira_memory_move_cost[mode][rclass][0]
>>> + ira_memory_move_cost[mode][rclass][1])
>>> * saved_nregs / hard_regno_nregs (hard_regno,
>>> mode) - 1)
>>> - * (optimize_size ? 1 :
>>> - REG_FREQ_FROM_BB (ENTRY_BLOCK_PTR_FOR_FN (cfun)));
>>> + * scale;
>>> cost += add_cost;
>>> full_cost += add_cost;
>>> }
>>> --
>>> 2.48.1
>>>
>
>
>
> --
> H.J.
On Sun, Feb 2, 2025 at 4:20 PM Richard Biener
<richard.guenther@gmail.com> wrote:
>
>
>
> > Am 02.02.2025 um 08:59 schrieb H.J. Lu <hjl.tools@gmail.com>:
> >
> > On Sun, Feb 2, 2025 at 3:33 PM Richard Biener
> > <richard.guenther@gmail.com> wrote:
> >>
> >>
> >>
> >>>> Am 02.02.2025 um 08:00 schrieb H.J. Lu <hjl.tools@gmail.com>:
> >>>
> >>> Don't increase callee-saved register cost by 1000x, which leads to that
> >>> callee-saved registers aren't used to preserve local variable values
> >>> across calls, by capping the scale to 300.
> >>
> >>> PR rtl-optimization/111673
> >>> PR rtl-optimization/115932
> >>> PR rtl-optimization/116028
> >>> PR rtl-optimization/117081
> >>> PR rtl-optimization/118497
> >>> * ira-color.cc (assign_hard_reg): Cap callee-saved register cost
> >>> scale to 300.
> >>>
> >>> Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
> >>> ---
> >>> gcc/ira-color.cc | 16 ++++++++++++++--
> >>> 1 file changed, 14 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc
> >>> index 0699b349a1a..707ff188250 100644
> >>> --- a/gcc/ira-color.cc
> >>> +++ b/gcc/ira-color.cc
> >>> @@ -2175,13 +2175,25 @@ assign_hard_reg (ira_allocno_t a, bool retry_p)
> >>> /* We need to save/restore the hard register in
> >>> epilogue/prologue. Therefore we increase the cost. */
> >>> {
> >>> + int scale;
> >>> + if (optimize_size)
> >>> + scale = 1;
> >>> + else
> >>> + {
> >>> + scale = REG_FREQ_FROM_BB (ENTRY_BLOCK_PTR_FOR_FN (cfun));
> >>> + /* Don't increase callee-saved register cost by 1000x,
> >>> + which leads to that callee-saved registers aren't
> >>> + used to preserve local variable values across calls,
> >>> + by capping the scale to 300. */
> >>> + if (REG_FREQ_MAX == 1000 && scale == REG_FREQ_MAX)
> >>> + scale = 300;
> >>
> >> That leads to 300 for 1000 but 999 for 999 which is odd. I’d have expected to scale this down to [0, 300] or is MAX a magic value?
> >
> > There are
> >
> > * The weights for each insn varies from 0 to REG_FREQ_BASE.
> > This constant does not need to be high, as in infrequently executed
> > regions we want to count instructions equivalently to optimize for
> > size instead of speed. */
> > #define REG_FREQ_MAX 1000
> >
> > /* Compute register frequency from the BB frequency. When optimizing for size,
> > or profile driven feedback is available and the function is never executed,
> > frequency is always equivalent. Otherwise rescale the basic block
> > frequency. */
> > #define REG_FREQ_FROM_BB(bb) ((optimize_function_for_size_p (cfun) \
> > || !cfun->cfg->count_max.initialized_p ()) \
> > ? REG_FREQ_MAX \
> > : ((bb)->count.to_frequency (cfun) \
> > * REG_FREQ_MAX / BB_FREQ_MAX) \
> > ? ((bb)->count.to_frequency (cfun) \
> > * REG_FREQ_MAX / BB_FREQ_MAX) \
> > : 1)
> >
> > 1000 is the default. If it isn't 1000, it isn't the default. I only want
> > to get a more reasonable default scale, instead of 1000. Lower
> > scale will fail the PR rtl-optimization/111673 test on powerpc64.
>
> I see. Why not adjust the above macro then? That would be a bit more obvious. Like use MAX/2 or so?
commit 3b9b8d6cfdf59337f4b7ce10ce92a98044b2657b
Author: Surya Kumari Jangala <jskumari@linux.ibm.com>
Date: Tue Jun 25 08:37:49 2024 -0500
ira: Scale save/restore costs of callee save registers with block frequency
uses REG_FREQ_FROM_BB as the cost scale. I don't know if it is a misuse.
I don't want to change REG_FREQ_FROM_BB since it is used in other places,
not as a cost scale. Maybe the above commit should be reverted and we add
a target hook for callee-saved register cost scale. Each target can choose
a proper cost scale, install of increasing the cost by 1000x for everyone.
> >
> >
> >>> + }
> >>> rclass = REGNO_REG_CLASS (hard_regno);
> >>> add_cost = ((ira_memory_move_cost[mode][rclass][0]
> >>> + ira_memory_move_cost[mode][rclass][1])
> >>> * saved_nregs / hard_regno_nregs (hard_regno,
> >>> mode) - 1)
> >>> - * (optimize_size ? 1 :
> >>> - REG_FREQ_FROM_BB (ENTRY_BLOCK_PTR_FOR_FN (cfun)));
> >>> + * scale;
> >>> cost += add_cost;
> >>> full_cost += add_cost;
> >>> }
> >>> --
> >>> 2.48.1
> >>>
> >
> >
> >
> > --
> > H.J.
On Sun, Feb 2, 2025 at 9:29 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Sun, Feb 2, 2025 at 4:20 PM Richard Biener
> <richard.guenther@gmail.com> wrote:
> >
> >
> >
> > > Am 02.02.2025 um 08:59 schrieb H.J. Lu <hjl.tools@gmail.com>:
> > >
> > > On Sun, Feb 2, 2025 at 3:33 PM Richard Biener
> > > <richard.guenther@gmail.com> wrote:
> > >>
> > >>
> > >>
> > >>>> Am 02.02.2025 um 08:00 schrieb H.J. Lu <hjl.tools@gmail.com>:
> > >>>
> > >>> Don't increase callee-saved register cost by 1000x, which leads to that
> > >>> callee-saved registers aren't used to preserve local variable values
> > >>> across calls, by capping the scale to 300.
> > >>
> > >>> PR rtl-optimization/111673
> > >>> PR rtl-optimization/115932
> > >>> PR rtl-optimization/116028
> > >>> PR rtl-optimization/117081
> > >>> PR rtl-optimization/118497
> > >>> * ira-color.cc (assign_hard_reg): Cap callee-saved register cost
> > >>> scale to 300.
> > >>>
> > >>> Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
> > >>> ---
> > >>> gcc/ira-color.cc | 16 ++++++++++++++--
> > >>> 1 file changed, 14 insertions(+), 2 deletions(-)
> > >>>
> > >>> diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc
> > >>> index 0699b349a1a..707ff188250 100644
> > >>> --- a/gcc/ira-color.cc
> > >>> +++ b/gcc/ira-color.cc
> > >>> @@ -2175,13 +2175,25 @@ assign_hard_reg (ira_allocno_t a, bool retry_p)
> > >>> /* We need to save/restore the hard register in
> > >>> epilogue/prologue. Therefore we increase the cost. */
> > >>> {
> > >>> + int scale;
> > >>> + if (optimize_size)
> > >>> + scale = 1;
> > >>> + else
> > >>> + {
> > >>> + scale = REG_FREQ_FROM_BB (ENTRY_BLOCK_PTR_FOR_FN (cfun));
> > >>> + /* Don't increase callee-saved register cost by 1000x,
> > >>> + which leads to that callee-saved registers aren't
> > >>> + used to preserve local variable values across calls,
> > >>> + by capping the scale to 300. */
> > >>> + if (REG_FREQ_MAX == 1000 && scale == REG_FREQ_MAX)
> > >>> + scale = 300;
> > >>
> > >> That leads to 300 for 1000 but 999 for 999 which is odd. I’d have expected to scale this down to [0, 300] or is MAX a magic value?
> > >
> > > There are
> > >
> > > * The weights for each insn varies from 0 to REG_FREQ_BASE.
> > > This constant does not need to be high, as in infrequently executed
> > > regions we want to count instructions equivalently to optimize for
> > > size instead of speed. */
> > > #define REG_FREQ_MAX 1000
> > >
> > > /* Compute register frequency from the BB frequency. When optimizing for size,
> > > or profile driven feedback is available and the function is never executed,
> > > frequency is always equivalent. Otherwise rescale the basic block
> > > frequency. */
> > > #define REG_FREQ_FROM_BB(bb) ((optimize_function_for_size_p (cfun) \
> > > || !cfun->cfg->count_max.initialized_p ()) \
> > > ? REG_FREQ_MAX \
> > > : ((bb)->count.to_frequency (cfun) \
> > > * REG_FREQ_MAX / BB_FREQ_MAX) \
> > > ? ((bb)->count.to_frequency (cfun) \
> > > * REG_FREQ_MAX / BB_FREQ_MAX) \
> > > : 1)
> > >
> > > 1000 is the default. If it isn't 1000, it isn't the default. I only want
> > > to get a more reasonable default scale, instead of 1000. Lower
> > > scale will fail the PR rtl-optimization/111673 test on powerpc64.
> >
> > I see. Why not adjust the above macro then? That would be a bit more obvious. Like use MAX/2 or so?
>
> commit 3b9b8d6cfdf59337f4b7ce10ce92a98044b2657b
> Author: Surya Kumari Jangala <jskumari@linux.ibm.com>
> Date: Tue Jun 25 08:37:49 2024 -0500
>
> ira: Scale save/restore costs of callee save registers with block frequency
>
> uses REG_FREQ_FROM_BB as the cost scale. I don't know if it is a misuse.
> I don't want to change REG_FREQ_FROM_BB since it is used in other places,
> not as a cost scale. Maybe the above commit should be reverted and we add
> a target hook for callee-saved register cost scale. Each target can choose
> a proper cost scale, install of increasing the cost by 1000x for everyone.
I believe testing cfun->cfg->count_max.initialized_p () is a bit odd at least,
as it doesn't seem to be used. The comment talks about profile feedback,
but for example with -fprofile-correction or -fpartial-profile this
test looks odd.
In fact optimize_function_for_size_p should already handle this correctly.
Also REG_FREQ_FROM_BB simply documents that in this case the
frequency will be equivalent for all BBs and not any particular value. The new
use might indeed not have the same constraints as others, instead of
a target hook making the "same value" another macro argument might be
a good first step.
That said - does removing the || !cfun->cfg->count_max.initialized_p ()
test "fix" things?
Richard.
> > >
> > >
> > >>> + }
> > >>> rclass = REGNO_REG_CLASS (hard_regno);
> > >>> add_cost = ((ira_memory_move_cost[mode][rclass][0]
> > >>> + ira_memory_move_cost[mode][rclass][1])
> > >>> * saved_nregs / hard_regno_nregs (hard_regno,
> > >>> mode) - 1)
> > >>> - * (optimize_size ? 1 :
> > >>> - REG_FREQ_FROM_BB (ENTRY_BLOCK_PTR_FOR_FN (cfun)));
> > >>> + * scale;
> > >>> cost += add_cost;
> > >>> full_cost += add_cost;
> > >>> }
> > >>> --
> > >>> 2.48.1
> > >>>
> > >
> > >
> > >
> > > --
> > > H.J.
>
>
>
> --
> H.J.
On Mon, Feb 3, 2025 at 5:21 PM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Sun, Feb 2, 2025 at 9:29 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> >
> > On Sun, Feb 2, 2025 at 4:20 PM Richard Biener
> > <richard.guenther@gmail.com> wrote:
> > >
> > >
> > >
> > > > Am 02.02.2025 um 08:59 schrieb H.J. Lu <hjl.tools@gmail.com>:
> > > >
> > > > On Sun, Feb 2, 2025 at 3:33 PM Richard Biener
> > > > <richard.guenther@gmail.com> wrote:
> > > >>
> > > >>
> > > >>
> > > >>>> Am 02.02.2025 um 08:00 schrieb H.J. Lu <hjl.tools@gmail.com>:
> > > >>>
> > > >>> Don't increase callee-saved register cost by 1000x, which leads to that
> > > >>> callee-saved registers aren't used to preserve local variable values
> > > >>> across calls, by capping the scale to 300.
> > > >>
> > > >>> PR rtl-optimization/111673
> > > >>> PR rtl-optimization/115932
> > > >>> PR rtl-optimization/116028
> > > >>> PR rtl-optimization/117081
> > > >>> PR rtl-optimization/118497
> > > >>> * ira-color.cc (assign_hard_reg): Cap callee-saved register cost
> > > >>> scale to 300.
> > > >>>
> > > >>> Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
> > > >>> ---
> > > >>> gcc/ira-color.cc | 16 ++++++++++++++--
> > > >>> 1 file changed, 14 insertions(+), 2 deletions(-)
> > > >>>
> > > >>> diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc
> > > >>> index 0699b349a1a..707ff188250 100644
> > > >>> --- a/gcc/ira-color.cc
> > > >>> +++ b/gcc/ira-color.cc
> > > >>> @@ -2175,13 +2175,25 @@ assign_hard_reg (ira_allocno_t a, bool retry_p)
> > > >>> /* We need to save/restore the hard register in
> > > >>> epilogue/prologue. Therefore we increase the cost. */
> > > >>> {
> > > >>> + int scale;
> > > >>> + if (optimize_size)
> > > >>> + scale = 1;
> > > >>> + else
> > > >>> + {
> > > >>> + scale = REG_FREQ_FROM_BB (ENTRY_BLOCK_PTR_FOR_FN (cfun));
> > > >>> + /* Don't increase callee-saved register cost by 1000x,
> > > >>> + which leads to that callee-saved registers aren't
> > > >>> + used to preserve local variable values across calls,
> > > >>> + by capping the scale to 300. */
> > > >>> + if (REG_FREQ_MAX == 1000 && scale == REG_FREQ_MAX)
> > > >>> + scale = 300;
> > > >>
> > > >> That leads to 300 for 1000 but 999 for 999 which is odd. I’d have expected to scale this down to [0, 300] or is MAX a magic value?
> > > >
> > > > There are
> > > >
> > > > * The weights for each insn varies from 0 to REG_FREQ_BASE.
> > > > This constant does not need to be high, as in infrequently executed
> > > > regions we want to count instructions equivalently to optimize for
> > > > size instead of speed. */
> > > > #define REG_FREQ_MAX 1000
> > > >
> > > > /* Compute register frequency from the BB frequency. When optimizing for size,
> > > > or profile driven feedback is available and the function is never executed,
> > > > frequency is always equivalent. Otherwise rescale the basic block
> > > > frequency. */
> > > > #define REG_FREQ_FROM_BB(bb) ((optimize_function_for_size_p (cfun) \
> > > > || !cfun->cfg->count_max.initialized_p ()) \
> > > > ? REG_FREQ_MAX \
> > > > : ((bb)->count.to_frequency (cfun) \
> > > > * REG_FREQ_MAX / BB_FREQ_MAX) \
> > > > ? ((bb)->count.to_frequency (cfun) \
> > > > * REG_FREQ_MAX / BB_FREQ_MAX) \
> > > > : 1)
> > > >
> > > > 1000 is the default. If it isn't 1000, it isn't the default. I only want
> > > > to get a more reasonable default scale, instead of 1000. Lower
> > > > scale will fail the PR rtl-optimization/111673 test on powerpc64.
> > >
> > > I see. Why not adjust the above macro then? That would be a bit more obvious. Like use MAX/2 or so?
> >
> > commit 3b9b8d6cfdf59337f4b7ce10ce92a98044b2657b
> > Author: Surya Kumari Jangala <jskumari@linux.ibm.com>
> > Date: Tue Jun 25 08:37:49 2024 -0500
> >
> > ira: Scale save/restore costs of callee save registers with block frequency
> >
> > uses REG_FREQ_FROM_BB as the cost scale. I don't know if it is a misuse.
> > I don't want to change REG_FREQ_FROM_BB since it is used in other places,
> > not as a cost scale. Maybe the above commit should be reverted and we add
> > a target hook for callee-saved register cost scale. Each target can choose
> > a proper cost scale, install of increasing the cost by 1000x for everyone.
>
> I believe testing cfun->cfg->count_max.initialized_p () is a bit odd at least,
> as it doesn't seem to be used. The comment talks about profile feedback,
> but for example with -fprofile-correction or -fpartial-profile this
> test looks odd.
> In fact optimize_function_for_size_p should already handle this correctly.
>
> Also REG_FREQ_FROM_BB simply documents that in this case the
> frequency will be equivalent for all BBs and not any particular value. The new
> use might indeed not have the same constraints as others, instead of
> a target hook making the "same value" another macro argument might be
> a good first step.
>
> That said - does removing the || !cfun->cfg->count_max.initialized_p ()
> test "fix" things?
I don't think so. Frequency value can be anything, 1000 or 100000.
It is relative. It is very odd to use it as the cost scale. Here is the v2
patch
https://gcc.gnu.org/pipermail/gcc-patches/2025-February/674940.html
to add a target hook for callee-saved register cost scale and restore the
old behavior for x86.
> On Mon, Feb 3, 2025 at 5:21 PM Richard Biener
> <richard.guenther@gmail.com> wrote:
> >
> > On Sun, Feb 2, 2025 at 9:29 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> > >
> > > On Sun, Feb 2, 2025 at 4:20 PM Richard Biener
> > > <richard.guenther@gmail.com> wrote:
> > > >
> > > >
> > > >
> > > > > Am 02.02.2025 um 08:59 schrieb H.J. Lu <hjl.tools@gmail.com>:
> > > > >
> > > > > On Sun, Feb 2, 2025 at 3:33 PM Richard Biener
> > > > > <richard.guenther@gmail.com> wrote:
> > > > >>
> > > > >>
> > > > >>
> > > > >>>> Am 02.02.2025 um 08:00 schrieb H.J. Lu <hjl.tools@gmail.com>:
> > > > >>>
> > > > >>> Don't increase callee-saved register cost by 1000x, which leads to that
> > > > >>> callee-saved registers aren't used to preserve local variable values
> > > > >>> across calls, by capping the scale to 300.
> > > > >>
> > > > >>> PR rtl-optimization/111673
> > > > >>> PR rtl-optimization/115932
> > > > >>> PR rtl-optimization/116028
> > > > >>> PR rtl-optimization/117081
> > > > >>> PR rtl-optimization/118497
> > > > >>> * ira-color.cc (assign_hard_reg): Cap callee-saved register cost
> > > > >>> scale to 300.
> > > > >>>
> > > > >>> Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
> > > > >>> ---
> > > > >>> gcc/ira-color.cc | 16 ++++++++++++++--
> > > > >>> 1 file changed, 14 insertions(+), 2 deletions(-)
> > > > >>>
> > > > >>> diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc
> > > > >>> index 0699b349a1a..707ff188250 100644
> > > > >>> --- a/gcc/ira-color.cc
> > > > >>> +++ b/gcc/ira-color.cc
> > > > >>> @@ -2175,13 +2175,25 @@ assign_hard_reg (ira_allocno_t a, bool retry_p)
> > > > >>> /* We need to save/restore the hard register in
> > > > >>> epilogue/prologue. Therefore we increase the cost. */
> > > > >>> {
> > > > >>> + int scale;
> > > > >>> + if (optimize_size)
> > > > >>> + scale = 1;
> > > > >>> + else
> > > > >>> + {
> > > > >>> + scale = REG_FREQ_FROM_BB (ENTRY_BLOCK_PTR_FOR_FN (cfun));
> > > > >>> + /* Don't increase callee-saved register cost by 1000x,
> > > > >>> + which leads to that callee-saved registers aren't
> > > > >>> + used to preserve local variable values across calls,
> > > > >>> + by capping the scale to 300. */
> > > > >>> + if (REG_FREQ_MAX == 1000 && scale == REG_FREQ_MAX)
> > > > >>> + scale = 300;
> > > > >>
> > > > >> That leads to 300 for 1000 but 999 for 999 which is odd. I’d have expected to scale this down to [0, 300] or is MAX a magic value?
> > > > >
> > > > > There are
> > > > >
> > > > > * The weights for each insn varies from 0 to REG_FREQ_BASE.
> > > > > This constant does not need to be high, as in infrequently executed
> > > > > regions we want to count instructions equivalently to optimize for
> > > > > size instead of speed. */
> > > > > #define REG_FREQ_MAX 1000
> > > > >
> > > > > /* Compute register frequency from the BB frequency. When optimizing for size,
> > > > > or profile driven feedback is available and the function is never executed,
> > > > > frequency is always equivalent. Otherwise rescale the basic block
> > > > > frequency. */
> > > > > #define REG_FREQ_FROM_BB(bb) ((optimize_function_for_size_p (cfun) \
> > > > > || !cfun->cfg->count_max.initialized_p ()) \
> > > > > ? REG_FREQ_MAX \
> > > > > : ((bb)->count.to_frequency (cfun) \
> > > > > * REG_FREQ_MAX / BB_FREQ_MAX) \
> > > > > ? ((bb)->count.to_frequency (cfun) \
> > > > > * REG_FREQ_MAX / BB_FREQ_MAX) \
> > > > > : 1)
> > > > >
> > > > > 1000 is the default. If it isn't 1000, it isn't the default. I only want
> > > > > to get a more reasonable default scale, instead of 1000. Lower
> > > > > scale will fail the PR rtl-optimization/111673 test on powerpc64.
> > > >
> > > > I see. Why not adjust the above macro then? That would be a bit more obvious. Like use MAX/2 or so?
> > >
> > > commit 3b9b8d6cfdf59337f4b7ce10ce92a98044b2657b
> > > Author: Surya Kumari Jangala <jskumari@linux.ibm.com>
> > > Date: Tue Jun 25 08:37:49 2024 -0500
> > >
> > > ira: Scale save/restore costs of callee save registers with block frequency
> > >
> > > uses REG_FREQ_FROM_BB as the cost scale. I don't know if it is a misuse.
> > > I don't want to change REG_FREQ_FROM_BB since it is used in other places,
> > > not as a cost scale. Maybe the above commit should be reverted and we add
> > > a target hook for callee-saved register cost scale. Each target can choose
> > > a proper cost scale, install of increasing the cost by 1000x for everyone.
> >
> > I believe testing cfun->cfg->count_max.initialized_p () is a bit odd at least,
> > as it doesn't seem to be used. The comment talks about profile feedback,
> > but for example with -fprofile-correction or -fpartial-profile this
> > test looks odd.
> > In fact optimize_function_for_size_p should already handle this correctly.
count_max is computed in update_max_bb_count and if some basic blocks do
not have initialized count, they are ignored. So if you have any
initialized count, count_max should be greater or equal to it
after update call. This may be later changed if multiple code paths are
merged to one and then BB count can be somewhat larger than max.
With -fprofile-correction or -fpartial-profile all BBs sould have counts
initialized. If profile feedback is missing we keep guessed profile if
feedback is incoherent we only attemt to correct and it and drop quality
info. If some passes later create new BBs with unitialized profile,
they should be fixed.
What can happen is inlining function without profile (-O0) to function
with profile. In that case you get CFG with both initialized and
uninitialized BBs. But we probably do not care that much about
performance of htis.
> >
> > Also REG_FREQ_FROM_BB simply documents that in this case the
> > frequency will be equivalent for all BBs and not any particular value. The new
> > use might indeed not have the same constraints as others, instead of
> > a target hook making the "same value" another macro argument might be
> > a good first step.
> >
> > That said - does removing the || !cfun->cfg->count_max.initialized_p ()
> > test "fix" things?
>
> I don't think so. Frequency value can be anything, 1000 or 100000.
> It is relative. It is very odd to use it as the cost scale. Here is the v2
> patch
>
> https://gcc.gnu.org/pipermail/gcc-patches/2025-February/674940.html
>
> to add a target hook for callee-saved register cost scale and restore the
> old behavior for x86.
Original prupose of REG_FREQ_FROM_BB macro was to translate new profile
counts to old BB frequencies. Those were saled to be in the range
0....REG_FREQ_MAX where REG_FREQ_MAX was set to 1000 so one can do third
power in 32bit value chaply. In early 2000s when this code was written
64bit arithmetics was expensive.
It also tries to imitate old code in register allocator which used
constnat value of 1000 for everything if profile was missing or we
optimize for size (and thus static count of reloads).
These days it is probably smoother to write code that cares about
relative frequencies using sreals unless it is very performance
critical.
Honza
> > > > #define REG_FREQ_FROM_BB(bb) ((optimize_function_for_size_p (cfun) \
> > > > || !cfun->cfg->count_max.initialized_p ()) \
> > > > ? REG_FREQ_MAX \
> > > > : ((bb)->count.to_frequency (cfun) \
> > > > * REG_FREQ_MAX / BB_FREQ_MAX) \
> > > > ? ((bb)->count.to_frequency (cfun) \
> > > > * REG_FREQ_MAX / BB_FREQ_MAX) \
> > > > : 1)
> > > >
> > > > 1000 is the default. If it isn't 1000, it isn't the default. I only want
> > > > to get a more reasonable default scale, instead of 1000. Lower
> > > > scale will fail the PR rtl-optimization/111673 test on powerpc64.
> > >
> > > I see. Why not adjust the above macro then? That would be a bit more obvious. Like use MAX/2 or so?
> >
> > commit 3b9b8d6cfdf59337f4b7ce10ce92a98044b2657b
> > Author: Surya Kumari Jangala <jskumari@linux.ibm.com>
> > Date: Tue Jun 25 08:37:49 2024 -0500
> >
> > ira: Scale save/restore costs of callee save registers with block frequency
> >
> > uses REG_FREQ_FROM_BB as the cost scale. I don't know if it is a misuse.
> > I don't want to change REG_FREQ_FROM_BB since it is used in other places,
> > not as a cost scale. Maybe the above commit should be reverted and we add
> > a target hook for callee-saved register cost scale. Each target can choose
> > a proper cost scale, install of increasing the cost by 1000x for everyone.
>
> I believe testing cfun->cfg->count_max.initialized_p () is a bit odd at least,
> as it doesn't seem to be used. The comment talks about profile feedback,
It is used by count.to_frequency, which basically computes count/max_count * REG_FREQ_MAX.
It aborts if max_count is uninitialized rather than returning arbitrary
value...
Honza
@@ -2175,13 +2175,25 @@ assign_hard_reg (ira_allocno_t a, bool retry_p)
/* We need to save/restore the hard register in
epilogue/prologue. Therefore we increase the cost. */
{
+ int scale;
+ if (optimize_size)
+ scale = 1;
+ else
+ {
+ scale = REG_FREQ_FROM_BB (ENTRY_BLOCK_PTR_FOR_FN (cfun));
+ /* Don't increase callee-saved register cost by 1000x,
+ which leads to that callee-saved registers aren't
+ used to preserve local variable values across calls,
+ by capping the scale to 300. */
+ if (REG_FREQ_MAX == 1000 && scale == REG_FREQ_MAX)
+ scale = 300;
+ }
rclass = REGNO_REG_CLASS (hard_regno);
add_cost = ((ira_memory_move_cost[mode][rclass][0]
+ ira_memory_move_cost[mode][rclass][1])
* saved_nregs / hard_regno_nregs (hard_regno,
mode) - 1)
- * (optimize_size ? 1 :
- REG_FREQ_FROM_BB (ENTRY_BLOCK_PTR_FOR_FN (cfun)));
+ * scale;
cost += add_cost;
full_cost += add_cost;
}