Message ID | 000c01d94ec7$a6921430$f3b63c90$@nextmovesoftware.com |
---|---|
State | New |
Headers |
Return-Path: <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org> X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id A3B6B385B52E for <patchwork@sourceware.org>; Sat, 4 Mar 2023 18:32:36 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from server.nextmovesoftware.com (server.nextmovesoftware.com [162.254.253.69]) by sourceware.org (Postfix) with ESMTPS id 2A2893858D28 for <gcc-patches@gcc.gnu.org>; Sat, 4 Mar 2023 18:32:20 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 2A2893858D28 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=nextmovesoftware.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=nextmovesoftware.com DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=nextmovesoftware.com; s=default; h=Content-Type:MIME-Version:Message-ID: Date:Subject:Cc:To:From:Sender:Reply-To:Content-Transfer-Encoding:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:In-Reply-To:References:List-Id:List-Help:List-Unsubscribe: List-Subscribe:List-Post:List-Owner:List-Archive; bh=3wOSjgvQYuqniGDsUPSYCk5xQmueoeNthA6cKA8q2dc=; b=RMsoneYnzlERFXY1/prnFZttgB ZJ4ms5l4DF+m+M7pM4LbTgI9u03H8jFS9H59P1S28bASS9dEn0oB8dUJ4u9GftzTQlhIvFE2AJ+OV CcYBSP9ICCMTm2Mvy3dSUmpsU8PSkzP3iKemshswcv03kpvgfVO6saNHPvV9GBqpOzJyP/r6H+db9 vLD1PW7VCb/VXE5MtVEdqZL0f94Kfr8ZggdwhjqXlOFnAhf32SKO/c5WREsCCzm3XB7XXFfo1vEUw 7ReTdcs6+Lc5njHRGB27myN5DDXlIAbcOTsf/gKznGcyn8bgB+FJwk9mP1Y3UNss1FLQZ06PUsz4j 7a/mOSIQ==; Received: from host86-163-35-31.range86-163.btcentralplus.com ([86.163.35.31]:63004 helo=Dell) by server.nextmovesoftware.com with esmtpsa (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.95) (envelope-from <roger@nextmovesoftware.com>) id 1pYWg6-0002M6-Tl; Sat, 04 Mar 2023 13:32:19 -0500 From: "Roger Sayle" <roger@nextmovesoftware.com> To: "'GCC Patches'" <gcc-patches@gcc.gnu.org> Cc: "'Tamar Christina'" <Tamar.Christina@arm.com>, "'Richard Sandiford'" <richard.sandiford@arm.com>, "'Segher Boessenkool'" <segher@kernel.crashing.org> Subject: [PATCH] PR rtl-optimization/106594: Preserve zero_extend in combine when cheap. Date: Sat, 4 Mar 2023 18:32:15 -0000 Message-ID: <000c01d94ec7$a6921430$f3b63c90$@nextmovesoftware.com> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="----=_NextPart_000_000D_01D94EC7.A6921430" X-Mailer: Microsoft Outlook 16.0 Thread-Index: AdlOxlGLdZf5RsEKSy2T3BJGVUY5jA== Content-Language: en-gb X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - server.nextmovesoftware.com X-AntiAbuse: Original Domain - gcc.gnu.org X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12] X-AntiAbuse: Sender Address Domain - nextmovesoftware.com X-Get-Message-Sender-Via: server.nextmovesoftware.com: authenticated_id: roger@nextmovesoftware.com X-Authenticated-Sender: server.nextmovesoftware.com: roger@nextmovesoftware.com X-Source: X-Source-Args: X-Source-Dir: X-Spam-Status: No, score=-12.3 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org> List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>, <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe> List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/> List-Post: <mailto:gcc-patches@gcc.gnu.org> List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help> List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>, <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe> Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org> |
Series |
PR rtl-optimization/106594: Preserve zero_extend in combine when cheap.
|
|
Commit Message
Roger Sayle
March 4, 2023, 6:32 p.m. UTC
This patch addresses PR rtl-optimization/106594, a P1 performance regression affecting aarch64. This patch has been tested on x86_64-pc-linux-gnu with make bootstrap and make -k check, both with and without --target_board=unix{-m32}, with no new failures. If someone (who can regression test this on aarch64) could take this from here that would be much appreciated. Thanks in advance. 2023-03-04 Roger Sayle <roger@nextmovesoftware.com> gcc/ChangeLog PR rtl-optimization/106594 * combine.cc (expand_compound_operation): Don't expand/transform ZERO_EXTEND or SIGN_EXTEND on targets where rtx_cost claims they are cheap. Roger --
Comments
On Sat, Mar 04, 2023 at 06:32:15PM -0000, Roger Sayle wrote: > This patch addresses PR rtl-optimization/106594, a P1 performance > regression affecting aarch64. > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap > and make -k check, both with and without --target_board=unix{-m32}, > with no new failures. It should be tested for performance everywhere else, too. It can very easily result in worse code on some targets. This kind of thing really should be done in stage 1, not stage 4. > PR rtl-optimization/106594 > * combine.cc (expand_compound_operation): Don't expand/transform > ZERO_EXTEND or SIGN_EXTEND on targets where rtx_cost claims they are > cheap. That is not how combine works. If the old code is more expensive than what combine comes up with., it *should* transform it. Magic cost cutoffs are not okay anywhere in combine, either. If expand_compound_operation and friends misbehave (not really an "if", unfortunately), then please fix that, instead of randomly disabling parts of combine? Segher
The regression was reported during stage-1. A patch was provided during stage 1 and the discussions around combine stalled. The regression for AArch64 needs to be fixed in GCC 13. The hit is too big just to "take". So we need a way forward, even if it's stage-4. Thanks, Tamar
On 3/5/23 12:28, Tamar Christina via Gcc-patches wrote: > > The regression was reported during stage-1. A patch was provided during stage 1 and the discussions around combine stalled. > > The regression for AArch64 needs to be fixed in GCC 13. The hit is too big just to "take". > > So we need a way forward, even if it's stage-4. Then it needs to be in a way that works within the design constraints of combine. As Segher has indicated, using a magic constant to say "this is always cheap enough" isn't acceptable. Furthermore, what this patch changes is combine's internal canonicalization of extensions into shift pairs. So I think another path forward needs to be found. I don't see hacking up expand_compound_operation is viable. Jeff
> > On 3/5/23 12:28, Tamar Christina via Gcc-patches wrote: > > > > The regression was reported during stage-1. A patch was provided during > stage 1 and the discussions around combine stalled. > > > > The regression for AArch64 needs to be fixed in GCC 13. The hit is too big just > to "take". > > > > So we need a way forward, even if it's stage-4. > Then it needs to be in a way that works within the design constraints of > combine. > > As Segher has indicated, using a magic constant to say "this is always cheap > enough" isn't acceptable. Furthermore, what this patch changes is combine's > internal canonicalization of extensions into shift pairs. > > So I think another path forward needs to be found. I don't see hacking up > expand_compound_operation is viable. I'm not arguing at all about the merits of the patch. My argument was about Segher saying he doesn't think this is a P1 regression or one that should be addressed in stage-4. We noticed and reported the regression early on during stage-1. So I'm unsure what else we should have done and it's not right to waive off fixing it now, otherwise what's the point in us filing bug reports. Tamar. > > Jeff
Hi! On Sun, Mar 05, 2023 at 08:43:20PM +0000, Tamar Christina wrote: > > On 3/5/23 12:28, Tamar Christina via Gcc-patches wrote: > > > The regression was reported during stage-1. A patch was provided during > > stage 1 and the discussions around combine stalled. > > > > > > The regression for AArch64 needs to be fixed in GCC 13. The hit is too big just > > to "take". > > > > > > So we need a way forward, even if it's stage-4. > > Then it needs to be in a way that works within the design constraints of > > combine. > > > > As Segher has indicated, using a magic constant to say "this is always cheap > > enough" isn't acceptable. Furthermore, what this patch changes is combine's > > internal canonicalization of extensions into shift pairs. > > > > So I think another path forward needs to be found. I don't see hacking up > > expand_compound_operation is viable. > > I'm not arguing at all about the merits of the patch. My argument was about Segher saying he doesn't think this is a P1 regression or one that should be addressed in stage-4. That is not what I said (in the PR). I said: Either this should not be P1, or the proposed patch is taking completely the wrong direction. P1 means there is a regression. There is no regression in combine, in fact the proposed patch would *cause* regressions on many targets! (<https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106594#c13>) > We noticed and reported the regression early on during stage-1. So I'm unsure what else we should have done and it's not right to waive off fixing it now, otherwise what's the point in us filing bug reports. Something that fixes the regression is of course welcome. But something that *causes* many regressions is not. Something that makes compound_operation stuff better on all targets is more than welcome as well, but *better* on *all* targets, not regressing most. This really is stage 1 material most likely. I have been chipping away at this for years, I don't expect any trivial patch can help, and it certainly won't solve most problems here. Maybe a target hook for this is best. But not a completely ad-hoc one, something usable and maintainable please. So, it should say we do not want certain kinds of code (or what kind of code we want instead), and it should not add magic to the bowels of basic passes, magic that just happens to make the code of particular testcases look better on a particular target. Yes, *look* better: I have seen no proof or indication that this would actually generate better code, not even on just aarch, let alone on the majority of targets. As I said I have a test running, you may be lucky even :-) It has to run for about six hours more and after that it needs analysis still (a few more hours if it isn't obviously always better or worse), so expect results tomorrow night at the earliest. Segher
Hi! On Sun, Mar 05, 2023 at 03:33:40PM -0600, Segher Boessenkool wrote: > On Sun, Mar 05, 2023 at 08:43:20PM +0000, Tamar Christina wrote: > Yes, *look* better: I have seen no proof or indication that this would ("looks", I cannot type, sorry) > actually generate better code, not even on just aarch, let alone on the > majority of targets. As I said I have a test running, you may be lucky > even :-) It has to run for about six hours more and after that it needs > analysis still (a few more hours if it isn't obviously always better or > worse), so expect results tomorrow night at the earliest. The results are in: $ perl sizes.pl --percent C[12] C1 C2 alpha 7082243 100.066% arc 4207975 100.015% arm 11518624 100.008% arm64 24514565 100.067% armhf 16661684 100.098% csky 4031841 100.002% i386 0 0 ia64 20354295 100.029% m68k 4394084 100.023% microblaze 6549965 100.014% mips 10684680 100.024% mips64 8171850 100.002% nios2 4356713 100.012% openrisc 5010570 100.003% parisc 8406294 100.002% parisc64 0 0 powerpc 11104901 99.992% powerpc64 24532358 100.057% powerpc64le 21293219 100.062% riscv32 2028474 100.131% riscv64 9515453 100.120% s390 20519612 100.279% sh 0 0 shnommu 1840960 100.012% sparc 5314422 100.004% sparc64 7964129 99.992% x86_64 0 0 xtensa 2925723 100.070% C1 is the original, C2 with your patch. These numbers are the code sizes of a Linux kernel, some defconfig for every arch. This is a good measure of how effective combine was. The patch is a tiny win for sparc64 and classic powerpc32 only, but bad everywhere else. Look at that s390 number! Or riscv, or most of the arm variants (including aarch64). Do you want me to look in detail what causes this regression on some particular target, i.e. why we really still need the expand_compound functionality there? (Btw. "0" means the target did not build. For the x86 targets this is just more -Werror madness that seeped in it seems. For parisc64 and sh it is the choice of config. Will fix.) Segher
> Hi! > > On Sun, Mar 05, 2023 at 03:33:40PM -0600, Segher Boessenkool wrote: > > On Sun, Mar 05, 2023 at 08:43:20PM +0000, Tamar Christina wrote: > > Yes, *look* better: I have seen no proof or indication that this would > > ("looks", I cannot type, sorry) > > > actually generate better code, not even on just aarch, let alone on > > the majority of targets. As I said I have a test running, you may be > > lucky even :-) It has to run for about six hours more and after that > > it needs analysis still (a few more hours if it isn't obviously always > > better or worse), so expect results tomorrow night at the earliest. > > The results are in: > > $ perl sizes.pl --percent C[12] > C1 C2 > alpha 7082243 100.066% > arc 4207975 100.015% > arm 11518624 100.008% > arm64 24514565 100.067% > armhf 16661684 100.098% > csky 4031841 100.002% > i386 0 0 > ia64 20354295 100.029% > m68k 4394084 100.023% > microblaze 6549965 100.014% > mips 10684680 100.024% > mips64 8171850 100.002% > nios2 4356713 100.012% > openrisc 5010570 100.003% > parisc 8406294 100.002% > parisc64 0 0 > powerpc 11104901 99.992% > powerpc64 24532358 100.057% > powerpc64le 21293219 100.062% > riscv32 2028474 100.131% > riscv64 9515453 100.120% > s390 20519612 100.279% > sh 0 0 > shnommu 1840960 100.012% > sparc 5314422 100.004% > sparc64 7964129 99.992% > x86_64 0 0 > xtensa 2925723 100.070% > > > C1 is the original, C2 with your patch. These numbers are the code sizes of a > Linux kernel, some defconfig for every arch. This is a good measure of how > effective combine was. > > The patch is a tiny win for sparc64 and classic powerpc32 only, but bad > everywhere else. Look at that s390 number! Or riscv, or most of the arm > variants (including aarch64). > > Do you want me to look in detail what causes this regression on some > particular target, i.e. why we really still need the expand_compound > functionality there? > Hi, Thanks for having a look! I think the Richards are exploring a different solution on the PR so I don't think it's worth looking at now (maybe in stage-1?). Thanks for checking though! I Appreciate you all helping to get this fixed! Kind Regards, Tamar > (Btw. "0" means the target did not build. For the x86 targets this is just more > -Werror madness that seeped in it seems. For parisc64 and sh it is the choice > of config. Will fix.) > > > Segher
diff --git a/gcc/combine.cc b/gcc/combine.cc index 0538795..cf126c8 100644 --- a/gcc/combine.cc +++ b/gcc/combine.cc @@ -7288,7 +7288,17 @@ expand_compound_operation (rtx x) && (STORE_FLAG_VALUE & ~GET_MODE_MASK (inner_mode)) == 0) return SUBREG_REG (XEXP (x, 0)); + /* If ZERO_EXTEND is cheap on this target, do nothing, + i.e. don't attempt to convert it to a pair of shifts. */ + if (set_src_cost (x, mode, optimize_this_for_speed_p) + <= COSTS_N_INSNS (1)) + return x; } + /* Likewise, if SIGN_EXTEND is cheap, do nothing. */ + else if (GET_CODE (x) == SIGN_EXTEND + && set_src_cost (x, mode, optimize_this_for_speed_p) + <= COSTS_N_INSNS (1)) + return x; /* If we reach here, we want to return a pair of shifts. The inner shift is a left shift of BITSIZE - POS - LEN bits. The outer