PR rtl-optimization/106594: Preserve zero_extend in combine when cheap.

Message ID 000c01d94ec7$a6921430$f3b63c90$@nextmovesoftware.com
State New
Headers
Series PR rtl-optimization/106594: Preserve zero_extend in combine when cheap. |

Commit Message

Roger Sayle March 4, 2023, 6:32 p.m. UTC
  This patch addresses PR rtl-optimization/106594, a P1 performance
regression affecting aarch64.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32},
with no new failures.

If someone (who can regression test this on aarch64) could take this
from here that would be much appreciated.  Thanks in advance.


2023-03-04  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
        PR rtl-optimization/106594
        * combine.cc (expand_compound_operation): Don't expand/transform
        ZERO_EXTEND or SIGN_EXTEND on targets where rtx_cost claims they are
        cheap.

Roger
--
  

Comments

Segher Boessenkool March 4, 2023, 10:17 p.m. UTC | #1
On Sat, Mar 04, 2023 at 06:32:15PM -0000, Roger Sayle wrote:
> This patch addresses PR rtl-optimization/106594, a P1 performance
> regression affecting aarch64.
> 
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32},
> with no new failures.

It should be tested for performance everywhere else, too.

It can very easily result in worse code on some targets.  This kind of
thing really should be done in stage 1, not stage 4.

>         PR rtl-optimization/106594
>         * combine.cc (expand_compound_operation): Don't expand/transform
>         ZERO_EXTEND or SIGN_EXTEND on targets where rtx_cost claims they are
>         cheap.

That is not how combine works.  If the old code is more expensive than
what combine comes up with., it *should* transform it.  Magic cost
cutoffs are not okay anywhere in combine, either.

If expand_compound_operation and friends misbehave (not really an "if",
unfortunately), then please fix that, instead of randomly disabling
parts of combine?


Segher
  
Tamar Christina March 5, 2023, 7:28 p.m. UTC | #2
The regression was reported during stage-1. A patch was provided during stage 1 and the discussions around combine stalled.

The regression for AArch64 needs to be fixed in GCC 13. The hit is too big just to "take".

So we need a way forward, even if it's stage-4.

Thanks,
Tamar
  
Jeff Law March 5, 2023, 7:56 p.m. UTC | #3
On 3/5/23 12:28, Tamar Christina via Gcc-patches wrote:
> 
> The regression was reported during stage-1. A patch was provided during stage 1 and the discussions around combine stalled.
> 
> The regression for AArch64 needs to be fixed in GCC 13. The hit is too big just to "take".
> 
> So we need a way forward, even if it's stage-4.
Then it needs to be in a way that works within the design constraints of 
combine.

As Segher has indicated, using a magic constant to say "this is always 
cheap enough" isn't acceptable.  Furthermore, what this patch changes is 
combine's internal canonicalization of extensions into shift pairs.

So I think another path forward needs to be found.  I don't see hacking 
up expand_compound_operation is viable.

Jeff
  
Tamar Christina March 5, 2023, 8:43 p.m. UTC | #4
> 
> On 3/5/23 12:28, Tamar Christina via Gcc-patches wrote:
> >
> > The regression was reported during stage-1. A patch was provided during
> stage 1 and the discussions around combine stalled.
> >
> > The regression for AArch64 needs to be fixed in GCC 13. The hit is too big just
> to "take".
> >
> > So we need a way forward, even if it's stage-4.
> Then it needs to be in a way that works within the design constraints of
> combine.
> 
> As Segher has indicated, using a magic constant to say "this is always cheap
> enough" isn't acceptable.  Furthermore, what this patch changes is combine's
> internal canonicalization of extensions into shift pairs.
> 
> So I think another path forward needs to be found.  I don't see hacking up
> expand_compound_operation is viable.

I'm not arguing at all about the merits of the patch. My argument was about Segher saying he doesn't think this is a P1 regression or one that should be addressed in stage-4.

We noticed and reported the regression early on during stage-1.  So I'm unsure what else we should have done and it's not right to waive off fixing it now, otherwise what's the point in us filing bug reports.

Tamar.
> 
> Jeff
  
Segher Boessenkool March 5, 2023, 9:33 p.m. UTC | #5
Hi!

On Sun, Mar 05, 2023 at 08:43:20PM +0000, Tamar Christina wrote:
> > On 3/5/23 12:28, Tamar Christina via Gcc-patches wrote:
> > > The regression was reported during stage-1. A patch was provided during
> > stage 1 and the discussions around combine stalled.
> > >
> > > The regression for AArch64 needs to be fixed in GCC 13. The hit is too big just
> > to "take".
> > >
> > > So we need a way forward, even if it's stage-4.
> > Then it needs to be in a way that works within the design constraints of
> > combine.
> > 
> > As Segher has indicated, using a magic constant to say "this is always cheap
> > enough" isn't acceptable.  Furthermore, what this patch changes is combine's
> > internal canonicalization of extensions into shift pairs.
> > 
> > So I think another path forward needs to be found.  I don't see hacking up
> > expand_compound_operation is viable.
> 
> I'm not arguing at all about the merits of the patch. My argument was about Segher saying he doesn't think this is a P1 regression or one that should be addressed in stage-4.

That is not what I said (in the PR).  I said:
  Either this should not be P1, or the proposed patch is taking
  completely the wrong direction.  P1 means there is a regression.
  There is no regression in combine, in fact the proposed patch would
  *cause* regressions on many targets!
(<https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106594#c13>)

> We noticed and reported the regression early on during stage-1.  So I'm unsure what else we should have done and it's not right to waive off fixing it now, otherwise what's the point in us filing bug reports.

Something that fixes the regression is of course welcome.  But something
that *causes* many regressions is not.  Something that makes
compound_operation stuff better on all targets is more than welcome as
well, but *better* on *all* targets, not regressing most.  This really
is stage 1 material most likely.  I have been chipping away at this for
years, I don't expect any trivial patch can help, and it certainly won't
solve most problems here.

Maybe a target hook for this is best.  But not a completely ad-hoc one,
something usable and maintainable please.  So, it should say we do not
want certain kinds of code (or what kind of code we want instead), and
it should not add magic to the bowels of basic passes, magic that just
happens to make the code of particular testcases look better on a
particular target.

Yes, *look* better: I have seen no proof or indication that this would
actually generate better code, not even on just aarch, let alone on the
majority of targets.  As I said I have a test running, you may be lucky
even :-)  It has to run for about six hours more and after that it needs
analysis still (a few more hours if it isn't obviously always better or
worse), so expect results tomorrow night at the earliest.


Segher
  
Segher Boessenkool March 6, 2023, 12:08 p.m. UTC | #6
Hi!

On Sun, Mar 05, 2023 at 03:33:40PM -0600, Segher Boessenkool wrote:
> On Sun, Mar 05, 2023 at 08:43:20PM +0000, Tamar Christina wrote:
> Yes, *look* better: I have seen no proof or indication that this would

("looks", I cannot type, sorry)

> actually generate better code, not even on just aarch, let alone on the
> majority of targets.  As I said I have a test running, you may be lucky
> even :-)  It has to run for about six hours more and after that it needs
> analysis still (a few more hours if it isn't obviously always better or
> worse), so expect results tomorrow night at the earliest.

The results are in:

$ perl sizes.pl --percent C[12]
                    C1        C2
       alpha   7082243  100.066%
         arc   4207975  100.015%
         arm  11518624  100.008%
       arm64  24514565  100.067%
       armhf  16661684  100.098%
        csky   4031841  100.002%
        i386         0         0
        ia64  20354295  100.029%
        m68k   4394084  100.023%
  microblaze   6549965  100.014%
        mips  10684680  100.024%
      mips64   8171850  100.002%
       nios2   4356713  100.012%
    openrisc   5010570  100.003%
      parisc   8406294  100.002%
    parisc64         0         0
     powerpc  11104901   99.992%
   powerpc64  24532358  100.057%
 powerpc64le  21293219  100.062%
     riscv32   2028474  100.131%
     riscv64   9515453  100.120%
        s390  20519612  100.279%
          sh         0         0
     shnommu   1840960  100.012%
       sparc   5314422  100.004%
     sparc64   7964129   99.992%
      x86_64         0         0
      xtensa   2925723  100.070%


C1 is the original, C2 with your patch.  These numbers are the code
sizes of a Linux kernel, some defconfig for every arch.  This is a good
measure of how effective combine was.

The patch is a tiny win for sparc64 and classic powerpc32 only, but bad
everywhere else.  Look at that s390 number!  Or riscv, or most of the
arm variants (including aarch64).

Do you want me to look in detail what causes this regression on some
particular target, i.e. why we really still need the expand_compound
functionality there?

(Btw.  "0" means the target did not build.  For the x86 targets this is
just more -Werror madness that seeped in it seems.  For parisc64 and sh
it is the choice of config.  Will fix.)


Segher
  
Tamar Christina March 6, 2023, 12:11 p.m. UTC | #7
> Hi!
> 
> On Sun, Mar 05, 2023 at 03:33:40PM -0600, Segher Boessenkool wrote:
> > On Sun, Mar 05, 2023 at 08:43:20PM +0000, Tamar Christina wrote:
> > Yes, *look* better: I have seen no proof or indication that this would
> 
> ("looks", I cannot type, sorry)
> 
> > actually generate better code, not even on just aarch, let alone on
> > the majority of targets.  As I said I have a test running, you may be
> > lucky even :-)  It has to run for about six hours more and after that
> > it needs analysis still (a few more hours if it isn't obviously always
> > better or worse), so expect results tomorrow night at the earliest.
> 
> The results are in:
> 
> $ perl sizes.pl --percent C[12]
>                     C1        C2
>        alpha   7082243  100.066%
>          arc   4207975  100.015%
>          arm  11518624  100.008%
>        arm64  24514565  100.067%
>        armhf  16661684  100.098%
>         csky   4031841  100.002%
>         i386         0         0
>         ia64  20354295  100.029%
>         m68k   4394084  100.023%
>   microblaze   6549965  100.014%
>         mips  10684680  100.024%
>       mips64   8171850  100.002%
>        nios2   4356713  100.012%
>     openrisc   5010570  100.003%
>       parisc   8406294  100.002%
>     parisc64         0         0
>      powerpc  11104901   99.992%
>    powerpc64  24532358  100.057%
>  powerpc64le  21293219  100.062%
>      riscv32   2028474  100.131%
>      riscv64   9515453  100.120%
>         s390  20519612  100.279%
>           sh         0         0
>      shnommu   1840960  100.012%
>        sparc   5314422  100.004%
>      sparc64   7964129   99.992%
>       x86_64         0         0
>       xtensa   2925723  100.070%
> 
> 
> C1 is the original, C2 with your patch.  These numbers are the code sizes of a
> Linux kernel, some defconfig for every arch.  This is a good measure of how
> effective combine was.
> 
> The patch is a tiny win for sparc64 and classic powerpc32 only, but bad
> everywhere else.  Look at that s390 number!  Or riscv, or most of the arm
> variants (including aarch64).
> 
> Do you want me to look in detail what causes this regression on some
> particular target, i.e. why we really still need the expand_compound
> functionality there?
> 

Hi,

Thanks for having a look! I think the Richards are exploring a different solution on the PR
so I don't think it's worth looking at now (maybe in stage-1?).  Thanks for checking though!

I Appreciate you all helping to get this fixed!

Kind Regards,
Tamar

> (Btw.  "0" means the target did not build.  For the x86 targets this is just more
> -Werror madness that seeped in it seems.  For parisc64 and sh it is the choice
> of config.  Will fix.)
> 
> 
> Segher
  

Patch

diff --git a/gcc/combine.cc b/gcc/combine.cc
index 0538795..cf126c8 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -7288,7 +7288,17 @@  expand_compound_operation (rtx x)
 	  && (STORE_FLAG_VALUE & ~GET_MODE_MASK (inner_mode)) == 0)
 	return SUBREG_REG (XEXP (x, 0));
 
+      /* If ZERO_EXTEND is cheap on this target, do nothing,
+	 i.e. don't attempt to convert it to a pair of shifts.  */
+      if (set_src_cost (x, mode, optimize_this_for_speed_p)
+	  <= COSTS_N_INSNS (1))
+	return x;
     }
+  /* Likewise, if SIGN_EXTEND is cheap, do nothing.  */
+  else if (GET_CODE (x) == SIGN_EXTEND
+	   && set_src_cost (x, mode, optimize_this_for_speed_p)
+	      <= COSTS_N_INSNS (1))
+    return x;
 
   /* If we reach here, we want to return a pair of shifts.  The inner
      shift is a left shift of BITSIZE - POS - LEN bits.  The outer