[2/2] ivopts: Revert register pressure cost when there are enough registers.

Message ID 20221221131214.190579-3-dimitrije.milosevic@syrmia.com
State New
Headers
Series : Fix address cost complexity and register pressure cost calculation. |

Commit Message

Dimitrije Milošević Dec. 21, 2022, 1:12 p.m. UTC
  When there are enough registers, the register pressure cost is
unnecessarily bumped by adding another n_cands.

This behavior may result in register pressure costs for the case
when there are enough registers being higher than for other cases.

When there are enough registers, the register pressure cost should be
equal to n_invs + n_cands.

This used to be the case before c18101f.

gcc/ChangeLog:

	* tree-ssa-loop-ivopts.cc (ivopts_estimate_reg_pressure): Adjust.

Signed-off-by: Dimitrije Milosevic <dimitrije.milosevic@syrmia.com>
---
 gcc/tree-ssa-loop-ivopts.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
  

Comments

Richard Biener May 15, 2023, 10:44 a.m. UTC | #1
On Wed, Dec 21, 2022 at 2:12 PM Dimitrije Milošević
<dimitrije.milosevic@syrmia.com> wrote:
>
> When there are enough registers, the register pressure cost is
> unnecessarily bumped by adding another n_cands.
>
> This behavior may result in register pressure costs for the case
> when there are enough registers being higher than for other cases.
>
> When there are enough registers, the register pressure cost should be
> equal to n_invs + n_cands.
>
> This used to be the case before c18101f.
>
> gcc/ChangeLog:
>
>         * tree-ssa-loop-ivopts.cc (ivopts_estimate_reg_pressure): Adjust.
>
> Signed-off-by: Dimitrije Milosevic <dimitrije.milosevic@syrmia.com>
> ---
>  gcc/tree-ssa-loop-ivopts.cc | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
> index 60c61dc9e49..3176482d0d9 100644
> --- a/gcc/tree-ssa-loop-ivopts.cc
> +++ b/gcc/tree-ssa-loop-ivopts.cc
> @@ -6092,7 +6092,7 @@ ivopts_estimate_reg_pressure (struct ivopts_data *data, unsigned n_invs,
>
>    /* If we have enough registers.  */
>    if (regs_needed + target_res_regs < available_regs)
> -    cost = n_new;
> +    return n_new;

This still doesn't make much sense (before nor after).  We're
comparing apples and oranges.

I think it would make most sense to merge this case with the following
and thus do
the following.  The distinction between the cases should be preserved
and attenuated
by the adding of n_cands at the end (as tie-breaker).

Does this help the mips case?  I'm going to throw it at x86_64-linux
bootstrap/regtest.

Btw, I don't think using address complexity makes much sense for a port that
has only one addressing mode so I guess a better approach for 1/2 would be
to make sure it is consistently the same value (I suppose it is not, otherwise
you wouldn't have changed it).  Oh, and we're adding the
reg-pressure cost to the same bucket as well, and there we don't really know
how many times we're going to spill.  That said, I think ->complexity should
rather go away - we are asking for address-cost already and IVOPTs uses
built RTX to query the target.

But yes, I agree ivopts_estimate_reg_pressure has an issue.

Sorry for the very long delay,
Richard.

diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
index 6fbd2d59318..bc8493622de 100644
--- a/gcc/tree-ssa-loop-ivopts.cc
+++ b/gcc/tree-ssa-loop-ivopts.cc
@@ -6077,8 +6077,9 @@ ivopts_estimate_reg_pressure (struct ivopts_data
*data, unsigned n_invs,
                              unsigned n_cands)
 {
   unsigned cost;
-  unsigned n_old = data->regs_used, n_new = n_invs + n_cands;
-  unsigned regs_needed = n_new + n_old, available_regs = target_avail_regs;
+  unsigned n_old = data->regs_used;
+  unsigned regs_needed = n_invs + n_cands + n_old;
+  unsigned available_regs = target_avail_regs;
   bool speed = data->speed;

   /* If there is a call in the loop body, the call-clobbered registers
@@ -6087,10 +6088,7 @@ ivopts_estimate_reg_pressure (struct
ivopts_data *data, unsigned n_invs,
     available_regs = available_regs - target_clobbered_regs;

   /* If we have enough registers.  */
-  if (regs_needed + target_res_regs < available_regs)
-    cost = n_new;
-  /* If close to running out of registers, try to preserve them.  */
-  else if (regs_needed <= available_regs)
+  if (regs_needed <= available_regs)
     cost = target_reg_cost [speed] * regs_needed;
   /* If we run out of available registers but the number of candidates
      does not, we penalize extra registers using target_spill_cost.  */


>    /* If close to running out of registers, try to preserve them.  */
>    else if (regs_needed <= available_regs)
>      cost = target_reg_cost [speed] * regs_needed;
> --
> 2.25.1
>
  
Richard Biener May 15, 2023, 12:23 p.m. UTC | #2
On Mon, May 15, 2023 at 12:44 PM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Wed, Dec 21, 2022 at 2:12 PM Dimitrije Milošević
> <dimitrije.milosevic@syrmia.com> wrote:
> >
> > When there are enough registers, the register pressure cost is
> > unnecessarily bumped by adding another n_cands.
> >
> > This behavior may result in register pressure costs for the case
> > when there are enough registers being higher than for other cases.
> >
> > When there are enough registers, the register pressure cost should be
> > equal to n_invs + n_cands.
> >
> > This used to be the case before c18101f.
> >
> > gcc/ChangeLog:
> >
> >         * tree-ssa-loop-ivopts.cc (ivopts_estimate_reg_pressure): Adjust.
> >
> > Signed-off-by: Dimitrije Milosevic <dimitrije.milosevic@syrmia.com>
> > ---
> >  gcc/tree-ssa-loop-ivopts.cc | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
> > index 60c61dc9e49..3176482d0d9 100644
> > --- a/gcc/tree-ssa-loop-ivopts.cc
> > +++ b/gcc/tree-ssa-loop-ivopts.cc
> > @@ -6092,7 +6092,7 @@ ivopts_estimate_reg_pressure (struct ivopts_data *data, unsigned n_invs,
> >
> >    /* If we have enough registers.  */
> >    if (regs_needed + target_res_regs < available_regs)
> > -    cost = n_new;
> > +    return n_new;
>
> This still doesn't make much sense (before nor after).  We're
> comparing apples and oranges.
>
> I think it would make most sense to merge this case with the following
> and thus do
> the following.  The distinction between the cases should be preserved
> and attenuated
> by the adding of n_cands at the end (as tie-breaker).
>
> Does this help the mips case?  I'm going to throw it at x86_64-linux
> bootstrap/regtest.
>
> Btw, I don't think using address complexity makes much sense for a port that
> has only one addressing mode so I guess a better approach for 1/2 would be
> to make sure it is consistently the same value (I suppose it is not, otherwise
> you wouldn't have changed it).  Oh, and we're adding the
> reg-pressure cost to the same bucket as well, and there we don't really know
> how many times we're going to spill.  That said, I think ->complexity should
> rather go away - we are asking for address-cost already and IVOPTs uses
> built RTX to query the target.
>
> But yes, I agree ivopts_estimate_reg_pressure has an issue.
>
> Sorry for the very long delay,
> Richard.

The patch below bootstraps and regtests ok on x86_64-unknown-linux-gnu,
but I guess that doesn't mean much.

Richard.

> diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
> index 6fbd2d59318..bc8493622de 100644
> --- a/gcc/tree-ssa-loop-ivopts.cc
> +++ b/gcc/tree-ssa-loop-ivopts.cc
> @@ -6077,8 +6077,9 @@ ivopts_estimate_reg_pressure (struct ivopts_data
> *data, unsigned n_invs,
>                               unsigned n_cands)
>  {
>    unsigned cost;
> -  unsigned n_old = data->regs_used, n_new = n_invs + n_cands;
> -  unsigned regs_needed = n_new + n_old, available_regs = target_avail_regs;
> +  unsigned n_old = data->regs_used;
> +  unsigned regs_needed = n_invs + n_cands + n_old;
> +  unsigned available_regs = target_avail_regs;
>    bool speed = data->speed;
>
>    /* If there is a call in the loop body, the call-clobbered registers
> @@ -6087,10 +6088,7 @@ ivopts_estimate_reg_pressure (struct
> ivopts_data *data, unsigned n_invs,
>      available_regs = available_regs - target_clobbered_regs;
>
>    /* If we have enough registers.  */
> -  if (regs_needed + target_res_regs < available_regs)
> -    cost = n_new;
> -  /* If close to running out of registers, try to preserve them.  */
> -  else if (regs_needed <= available_regs)
> +  if (regs_needed <= available_regs)
>      cost = target_reg_cost [speed] * regs_needed;
>    /* If we run out of available registers but the number of candidates
>       does not, we penalize extra registers using target_spill_cost.  */
>
>
> >    /* If close to running out of registers, try to preserve them.  */
> >    else if (regs_needed <= available_regs)
> >      cost = target_reg_cost [speed] * regs_needed;
> > --
> > 2.25.1
> >
  
Jovan Dmitrovic May 15, 2023, 2:32 p.m. UTC | #3
Hi Richard,
I had pinged the community about this problem back in March, and I will be taking Dimitrije's place, considering he isn't working on these patches anymore.

Your solution for 2/2 seems reasonable, I don't exactly know why target_reg_cost hasn't been accounted for in the first case, nor do I know why that particular case was specified at all.

I will get back to you when I have researched 1/2 a bit more thoroughly.

Regards,
Jovan

________________________________
From: Richard Biener <richard.guenther@gmail.com>
Sent: Monday, May 15, 2023 2:23 PM
To: Dimitrije Milošević <dimitrije.milosevic@syrmia.com>
Cc: gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org>; Djordje Todorovic <Djordje.Todorovic@syrmia.com>; jeffreyalaw@gmail.com <jeffreyalaw@gmail.com>
Subject: Re: [PATCH 2/2] ivopts: Revert register pressure cost when there are enough registers.

On Mon, May 15, 2023 at 12:44 PM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Wed, Dec 21, 2022 at 2:12 PM Dimitrije Milošević
> <dimitrije.milosevic@syrmia.com> wrote:
> >
> > When there are enough registers, the register pressure cost is
> > unnecessarily bumped by adding another n_cands.
> >
> > This behavior may result in register pressure costs for the case
> > when there are enough registers being higher than for other cases.
> >
> > When there are enough registers, the register pressure cost should be
> > equal to n_invs + n_cands.
> >
> > This used to be the case before c18101f.
> >
> > gcc/ChangeLog:
> >
> >         * tree-ssa-loop-ivopts.cc (ivopts_estimate_reg_pressure): Adjust.
> >
> > Signed-off-by: Dimitrije Milosevic <dimitrije.milosevic@syrmia.com>
> > ---
> >  gcc/tree-ssa-loop-ivopts.cc | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
> > index 60c61dc9e49..3176482d0d9 100644
> > --- a/gcc/tree-ssa-loop-ivopts.cc
> > +++ b/gcc/tree-ssa-loop-ivopts.cc
> > @@ -6092,7 +6092,7 @@ ivopts_estimate_reg_pressure (struct ivopts_data *data, unsigned n_invs,
> >
> >    /* If we have enough registers.  */
> >    if (regs_needed + target_res_regs < available_regs)
> > -    cost = n_new;
> > +    return n_new;
>
> This still doesn't make much sense (before nor after).  We're
> comparing apples and oranges.
>
> I think it would make most sense to merge this case with the following
> and thus do
> the following.  The distinction between the cases should be preserved
> and attenuated
> by the adding of n_cands at the end (as tie-breaker).
>
> Does this help the mips case?  I'm going to throw it at x86_64-linux
> bootstrap/regtest.
>
> Btw, I don't think using address complexity makes much sense for a port that
> has only one addressing mode so I guess a better approach for 1/2 would be
> to make sure it is consistently the same value (I suppose it is not, otherwise
> you wouldn't have changed it).  Oh, and we're adding the
> reg-pressure cost to the same bucket as well, and there we don't really know
> how many times we're going to spill.  That said, I think ->complexity should
> rather go away - we are asking for address-cost already and IVOPTs uses
> built RTX to query the target.
>
> But yes, I agree ivopts_estimate_reg_pressure has an issue.
>
> Sorry for the very long delay,
> Richard.

The patch below bootstraps and regtests ok on x86_64-unknown-linux-gnu,
but I guess that doesn't mean much.

Richard.

> diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
> index 6fbd2d59318..bc8493622de 100644
> --- a/gcc/tree-ssa-loop-ivopts.cc
> +++ b/gcc/tree-ssa-loop-ivopts.cc
> @@ -6077,8 +6077,9 @@ ivopts_estimate_reg_pressure (struct ivopts_data
> *data, unsigned n_invs,
>                               unsigned n_cands)
>  {
>    unsigned cost;
> -  unsigned n_old = data->regs_used, n_new = n_invs + n_cands;
> -  unsigned regs_needed = n_new + n_old, available_regs = target_avail_regs;
> +  unsigned n_old = data->regs_used;
> +  unsigned regs_needed = n_invs + n_cands + n_old;
> +  unsigned available_regs = target_avail_regs;
>    bool speed = data->speed;
>
>    /* If there is a call in the loop body, the call-clobbered registers
> @@ -6087,10 +6088,7 @@ ivopts_estimate_reg_pressure (struct
> ivopts_data *data, unsigned n_invs,
>      available_regs = available_regs - target_clobbered_regs;
>
>    /* If we have enough registers.  */
> -  if (regs_needed + target_res_regs < available_regs)
> -    cost = n_new;
> -  /* If close to running out of registers, try to preserve them.  */
> -  else if (regs_needed <= available_regs)
> +  if (regs_needed <= available_regs)
>      cost = target_reg_cost [speed] * regs_needed;
>    /* If we run out of available registers but the number of candidates
>       does not, we penalize extra registers using target_spill_cost.  */
>
>
> >    /* If close to running out of registers, try to preserve them.  */
> >    else if (regs_needed <= available_regs)
> >      cost = target_reg_cost [speed] * regs_needed;
> > --
> > 2.25.1
> >
  

Patch

diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
index 60c61dc9e49..3176482d0d9 100644
--- a/gcc/tree-ssa-loop-ivopts.cc
+++ b/gcc/tree-ssa-loop-ivopts.cc
@@ -6092,7 +6092,7 @@  ivopts_estimate_reg_pressure (struct ivopts_data *data, unsigned n_invs,
 
   /* If we have enough registers.  */
   if (regs_needed + target_res_regs < available_regs)
-    cost = n_new;
+    return n_new;
   /* If close to running out of registers, try to preserve them.  */
   else if (regs_needed <= available_regs)
     cost = target_reg_cost [speed] * regs_needed;