Message ID | 00f501d86e81$5c48c8a0$14da59e0$@nextmovesoftware.com |
---|---|
State | New |
Headers |
From: "Roger Sayle" <roger@nextmovesoftware.com> To: <gcc-patches@gcc.gnu.org> Subject: [x86 PING] Peephole pand;pxor into pandn Date: Mon, 23 May 2022 09:44:46 +0100 Message-ID: <00f501d86e81$5c48c8a0$14da59e0$@nextmovesoftware.com> List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org> |
Series |
[x86,PING] Peephole pand;pxor into pandn
|
|
Commit Message
Roger Sayle
May 23, 2022, 8:44 a.m. UTC
This is a ping of a patch from April (a dependency of another stage1 patch):
https://gcc.gnu.org/pipermail/gcc-patches/2022-April/593123.html

This patch has been refreshed/retested against gcc 13 trunk on
x86_64-pc-linux-gnu with make bootstrap and make -k check,
both with and without --target_board=unix{-m32}, with no new failures.
Ok for mainline?

2022-05-23  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
        * config/i386/sse.md (peephole2): Convert suitable pand followed
        by pxor into pandn, i.e. (X&Y)^X into X & ~Y.

Many thanks in advance,
Roger
--
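The identity quoted in the ChangeLog entry can be checked independently of GCC. The following is a minimal sketch, not part of the patch: the function names are made up for illustration, and plain 64-bit integers stand in for SSE registers. Because the identity is purely bitwise, checking every pair of 8-bit values covers every per-bit case.

```c
#include <assert.h>
#include <stdint.h>

/* Two-instruction form: an AND followed by an XOR with X (pand; pxor).  */
uint64_t and_then_xor(uint64_t x, uint64_t y)
{
    return (x & y) ^ x;
}

/* One-instruction form: AND with the complement of Y (pandn).  */
uint64_t and_not(uint64_t x, uint64_t y)
{
    return x & ~y;
}

/* Returns 1 if both forms agree for every pair of 8-bit inputs.  */
int identity_holds(void)
{
    for (unsigned x = 0; x < 256; x++)
        for (unsigned y = 0; y < 256; y++)
            if (and_then_xor(x, y) != and_not(x, y))
                return 0;
    return 1;
}
```

Calling identity_holds() should return 1; the sketch says nothing about register allocation or operand constraints, which is where the review discussion below concentrates.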
Comments
On Mon, May 23, 2022 at 10:44 AM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
> This is a ping of a patch from April (a dependency of another stage1 patch):
> https://gcc.gnu.org/pipermail/gcc-patches/2022-April/593123.html
>
> This patch has been refreshed/retested against gcc 13 trunk on
> x86_64-pc-linux-gnu with make bootstrap and make -k check,
> both with and without --target_board=unix{-m32}, with no new failures.
> Ok for mainline?

I think this should be handled in a pre-reload splitter (or perhaps a
combine splitter). We have so many variants of SSE/AVX logic
instructions that the transform after reload barely makes sense (please
see the number of regno checks in the proposed patch).

Uros.
Hi Uros,
Thanks for the speedy review.  The point of this patch is that (with
pending changes to STV) the pand;pxor sequence isn't created until
after combine, and hence doesn't/won't get caught by any of the
current pre-reload/combine splitters.
On Mon, May 23, 2022 at 10:59 AM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
> Hi Uros,
>
> Thanks for the speedy review.  The point of this patch is that (with
> pending changes to STV) the pand;pxor sequence isn't created until
> after combine, and hence doesn't/won't get caught by any of the
> current pre-reload/combine splitters.

IMO this happens due to inconsistencies between the integer and vector
instruction sets, where integer andn is absent without BMI. However, we
don't re-run combine after reload, and I don't think it is worth
reimplementing it via peephole2 patterns. Please note that AVX allows
many more combinations that are not caught by your patch, and
considering that combine already does the transformation, I don't see a
compelling reason for this very specialized peephole2.

Let's keep the patch shelved until a testcase shows the benefits of the
patch.

Uros.
Hi Uros,
Hopefully, if I explain even more of the context, you'll better understand why
this harmless (and at worst seemingly redundant) peephole2 is actually critical
for addressing significant regressions in the compiler without introducing new
testsuite failures.  I wouldn't ask (again) if I didn't feel it's important.

Basically, I'm trying to unblock Hongtao's patch (for PR target/104610)
which, as you explained in your own review, is better handled by/during STV:
https://gcc.gnu.org/pipermail/gcc-patches/2022-May/594070.html

Unfortunately, that patch of mine to STV (that I want to ping next), which solves
the P2 code quality regression PR target/70321, is itself blocked by another
review of yours:
https://gcc.gnu.org/pipermail/gcc-patches/2022-April/593200.html
where this fix (alone) leads to a regression of the test case pr65105-5.c.

This pending regression has nothing to do with TARGET_BMI's andn, but with
the idiom "if ((x & y) != y)" on ia32, where x and y are DImode, and stv/reload
has decided to place these values in SSE registers.

After combine we have an *anddi3_doubleword and a *cmpdi_doubleword:
(insn 22 21 23 4 (parallel [
            (set (reg:DI 97)
                (and:DI (reg/v:DI 92 [ p2 ])
                    (reg:DI 88 [ _25 ])))
            (clobber (reg:CC 17 flags))
        ]) "pr65105-5.c":20:18 530 {*anddi3_doubleword}
     (expr_list:REG_UNUSED (reg:CC 17 flags)
        (nil)))
(insn 23 22 24 4 (set (reg:CCZ 17 flags)
        (compare:CCZ (reg/v:DI 92 [ p2 ])
            (reg:DI 97))) "pr65105-5.c":20:8 29 {*cmpdi_doubleword}
     (expr_list:REG_DEAD (reg:DI 97)
        (nil)))

After STV we have:
(insn 22 21 45 4 (set (subreg:V2DI (reg:DI 97) 0)
        (and:V2DI (subreg:V2DI (reg/v:DI 92 [ p2 ]) 0)
            (subreg:V2DI (reg:DI 88 [ _25 ]) 0))) "pr65105-5.c":20:18 6640 {*andv2di3}
     (expr_list:REG_UNUSED (reg:CC 17 flags)
        (nil)))
(insn 45 22 46 4 (set (reg:V2DI 103)
        (xor:V2DI (subreg:V2DI (reg/v:DI 92 [ p2 ]) 0)
            (subreg:V2DI (reg:DI 97) 0))) "pr65105-5.c":20:8 -1
     (nil))
(insn 46 45 23 4 (set (reg:V2DI 103)
        (vec_select:V2DI (vec_concat:V4DI (reg:V2DI 103)
                (reg:V2DI 103))
            (parallel [
                    (const_int 0 [0])
                    (const_int 2 [0x2])
                ]))) "pr65105-5.c":20:8 -1
     (nil))
(insn 23 46 24 4 (set (reg:CC 17 flags)
        (unspec:CC [
                (reg:V2DI 103) repeated x2
            ] UNSPEC_PTEST)) "pr65105-5.c":20:8 7425 {sse4_1_ptestv2di}
     (expr_list:REG_DEAD (reg:DI 97)
        (nil)))

where the XOR has been introduced to implement the equality,
as P == Q is effectively implemented as (P ^ Q) == 0.  At this point,
the only remaining pass that can optimize the pand followed by
the pxor is peephole2.

The requirement to optimize this comes from gcc.target/i386/pr65105-5.c,
where the desired implementation is explicitly looking for pandn+ptest:

/* { dg-do compile { target ia32 } } */
/* { dg-options "-O2 -march=core-avx2 -mno-stackrealign" } */
/* { dg-final { scan-assembler "pandn" } } */
/* { dg-final { scan-assembler "pxor" } } */
/* { dg-final { scan-assembler "ptest" } } */

Confusingly, I've even more patches in the queue/backlog for this part
of the compiler (it's an air traffic control problem, fallout from stage 4).

And of course, very many thanks for the various andn related patches
that have already been approved/committed to the backend, to avoid
potential regressions related to code size (-Os and -Oz).  It's a long road
with many steps.

Might you reconsider?  Pretty please?

Roger
--
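The chain of rewrites described in the message above (compare, then XOR-and-test-for-zero, then and-not-and-test-for-zero) can be written out on scalar values. This is only an illustrative sketch: the function names are hypothetical, uint64_t stands in for a DImode value held in an SSE register, and the instruction names in the comments indicate the intended correspondence rather than guaranteed compiler output.

```c
#include <assert.h>
#include <stdint.h>

/* The source-level idiom: does y have any bit that x lacks?  */
int differs_direct(uint64_t x, uint64_t y)
{
    return (x & y) != y;
}

/* P != Q rewritten as (P ^ Q) != 0: corresponds to pand; pxor; ptest.  */
int differs_via_xor(uint64_t x, uint64_t y)
{
    uint64_t t = x & y;
    return (t ^ y) != 0;
}

/* (x & y) ^ y folded into ~x & y: corresponds to pandn; ptest.  */
int differs_via_andn(uint64_t x, uint64_t y)
{
    return (~x & y) != 0;
}
```

All three functions return the same result for any inputs, for example 0 for x = 0xF0, y = 0x30 and 1 for x = 0xF0, y = 0x31.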
On Mon, May 23, 2022 at 12:49 PM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
> Hi Uros,
> Hopefully, if I explain even more of the context, you'll better understand why
> this harmless (and at worst seemingly redundant) peephole2 is actually critical
> for addressing significant regressions in the compiler without introducing new
> testsuite failures.  I wouldn't ask (again) if I didn't feel it's important.
>
> Basically, I'm trying to unblock Hongtao's patch (for PR target/104610)
> which, as you explained in your own review, is better handled by/during STV:
> https://gcc.gnu.org/pipermail/gcc-patches/2022-May/594070.html
>
> Unfortunately, that patch of mine to STV (that I want to ping next), which solves
> the P2 code quality regression PR target/70321, is itself blocked by another
> review of yours:
> https://gcc.gnu.org/pipermail/gcc-patches/2022-April/593200.html
> where this fix (alone) leads to a regression of the test case pr65105-5.c.

Is it possible to start with the STV patch? If it introduces only a few
regressions, we can afford them at this stage of development, and fix
them later with follow-up patches. This way, it is much easier
for me to see the effect of the patch, and its benefit can be weighed
appropriately. I was indeed under the impression that we were trying to
peephole a combination that appears once in a blue moon, but if the
situation appears regularly, this is a completely different matter.

> This pending regression has nothing to do with TARGET_BMI's andn, but with
> the idiom "if ((x & y) != y)" on ia32, where x and y are DImode, and stv/reload
> has decided to place these values in SSE registers.
>
> After combine we have an *anddi3_doubleword and a *cmpdi_doubleword:
> (insn 22 21 23 4 (parallel [
>             (set (reg:DI 97)
>                 (and:DI (reg/v:DI 92 [ p2 ])
>                     (reg:DI 88 [ _25 ])))
>             (clobber (reg:CC 17 flags))
>         ]) "pr65105-5.c":20:18 530 {*anddi3_doubleword}
>      (expr_list:REG_UNUSED (reg:CC 17 flags)
>         (nil)))
> (insn 23 22 24 4 (set (reg:CCZ 17 flags)
>         (compare:CCZ (reg/v:DI 92 [ p2 ])
>             (reg:DI 97))) "pr65105-5.c":20:8 29 {*cmpdi_doubleword}
>      (expr_list:REG_DEAD (reg:DI 97)
>         (nil)))

One possible approach is to introduce an intermediate compound (but
non-existent) instruction that is created by the combine pass and is later
split into real instructions. But a real testcase is needed so that the
correct strategy is used.

> After STV we have:
> (insn 22 21 45 4 (set (subreg:V2DI (reg:DI 97) 0)
>         (and:V2DI (subreg:V2DI (reg/v:DI 92 [ p2 ]) 0)
>             (subreg:V2DI (reg:DI 88 [ _25 ]) 0))) "pr65105-5.c":20:18 6640 {*andv2di3}
>      (expr_list:REG_UNUSED (reg:CC 17 flags)
>         (nil)))
> (insn 45 22 46 4 (set (reg:V2DI 103)
>         (xor:V2DI (subreg:V2DI (reg/v:DI 92 [ p2 ]) 0)
>             (subreg:V2DI (reg:DI 97) 0))) "pr65105-5.c":20:8 -1
>      (nil))
> (insn 46 45 23 4 (set (reg:V2DI 103)
>         (vec_select:V2DI (vec_concat:V4DI (reg:V2DI 103)
>                 (reg:V2DI 103))
>             (parallel [
>                     (const_int 0 [0])
>                     (const_int 2 [0x2])
>                 ]))) "pr65105-5.c":20:8 -1
>      (nil))
> (insn 23 46 24 4 (set (reg:CC 17 flags)
>         (unspec:CC [
>                 (reg:V2DI 103) repeated x2
>             ] UNSPEC_PTEST)) "pr65105-5.c":20:8 7425 {sse4_1_ptestv2di}
>      (expr_list:REG_DEAD (reg:DI 97)
>         (nil)))
>
> where the XOR has been introduced to implement the equality,
> as P == Q is effectively implemented as (P ^ Q) == 0.  At this point,
> the only remaining pass that can optimize the pand followed by
> the pxor is peephole2.
>
> The requirement to optimize this comes from gcc.target/i386/pr65105-5.c,
> where the desired implementation is explicitly looking for pandn+ptest:
>
> /* { dg-do compile { target ia32 } } */
> /* { dg-options "-O2 -march=core-avx2 -mno-stackrealign" } */
> /* { dg-final { scan-assembler "pandn" } } */
> /* { dg-final { scan-assembler "pxor" } } */
> /* { dg-final { scan-assembler "ptest" } } */
>
> Confusingly, I've even more patches in the queue/backlog for this part
> of the compiler (it's an air traffic control problem, fallout from stage 4).
>
> And of course, very many thanks for the various andn related patches
> that have already been approved/committed to the backend, to avoid
> potential regressions related to code size (-Os and -Oz).  It's a long road
> with many steps.
>
> Might you reconsider?  Pretty please?

No problem for me, but the testcase would really help.

Uros.
On Mon, May 23, 2022 at 12:49 PM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
> After combine we have an *anddi3_doubleword and *cmpdi3_doubleword:
> (insn 22 21 23 4 (parallel [
>             (set (reg:DI 97)
>                 (and:DI (reg/v:DI 92 [ p2 ])
>                     (reg:DI 88 [ _25 ])))
>             (clobber (reg:CC 17 flags))
>         ]) "pr65105-5.c":20:18 530 {*anddi3_doubleword}
>      (expr_list:REG_UNUSED (reg:CC 17 flags)
>         (nil)))
> (insn 23 22 24 4 (set (reg:CCZ 17 flags)
>         (compare:CCZ (reg/v:DI 92 [ p2 ])
>             (reg:DI 97))) "pr65105-5.c":20:8 29 {*cmpdi_doubleword}
>      (expr_list:REG_DEAD (reg:DI 97)
>         (nil)))

But originally, during combine we have (pr65105-5.c):

Trying 22 -> 23:
   22: {r97:DI=r92:DI&r88:DI;clobber flags:CC;}
      REG_UNUSED flags:CC
   23: {r98:DI=r92:DI^r97:DI;clobber flags:CC;}
      REG_DEAD r97:DI
      REG_UNUSED flags:CC
Successfully matched this instruction:
(parallel [
        (set (reg:DI 98)
            (and:DI (not:DI (reg:DI 88 [ _25 ]))
                (reg/v:DI 92 [ p2 ])))
        (clobber (reg:CC 17 flags))
    ])
allowing combination of insns 22 and 23
original costs 8 + 8 = 16
replacement cost 16
deferring deletion of insn with uid = 22.
modifying insn i3    23: {r98:DI=~r88:DI&r92:DI;clobber flags:CC;}
      REG_UNUSED flags:CC
deferring rescan insn with uid = 23.

so combine is creating:

(insn 23 22 24 4 (parallel [
            (set (reg:DI 98)
                (and:DI (not:DI (reg:DI 88 [ _25 ]))
                    (reg/v:DI 92 [ p2 ])))
            (clobber (reg:CC 17 flags))
        ]) "pr65105-5.c":20:8 552 {*andndi3_doubleword}
     (expr_list:REG_UNUSED (reg:CC 17 flags)
        (nil)))

Why is this not the case anymore with your patch?

Uros.
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 191371b..4203fe0 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -17021,6 +17021,44 @@
           (match_dup 2)))]
   "operands[3] = gen_reg_rtx (<MODE>mode);")
 
+;; Combine pand;pxor into pandn.  (X&Y)^X -> X & ~Y.
+(define_peephole2
+  [(set (match_operand:VMOVE 0 "register_operand")
+        (and:VMOVE (match_operand:VMOVE 1 "register_operand")
+                   (match_operand:VMOVE 2 "register_operand")))
+   (set (match_operand:VMOVE 3 "register_operand")
+        (xor:VMOVE (match_operand:VMOVE 4 "register_operand")
+                   (match_operand:VMOVE 5 "register_operand")))]
+  "TARGET_SSE
+   && REGNO (operands[1]) != REGNO (operands[2])
+   && REGNO (operands[4]) != REGNO (operands[5])
+   && (REGNO (operands[0]) == REGNO (operands[3])
+       || peep2_reg_dead_p (2, operands[0]))"
+  [(set (match_dup 3)
+        (and:VMOVE (not:VMOVE (match_dup 6)) (match_dup 7)))]
+{
+  if (REGNO (operands[0]) != REGNO (operands[1])
+      && ((REGNO (operands[4]) == REGNO (operands[0])
+           && REGNO (operands[5]) == REGNO (operands[1]))
+          || (REGNO (operands[4]) == REGNO (operands[1])
+              && REGNO (operands[5]) == REGNO (operands[0]))))
+    {
+      operands[6] = operands[2];
+      operands[7] = operands[1];
+    }
+  else if (REGNO (operands[0]) != REGNO (operands[2])
+           && ((REGNO (operands[4]) == REGNO (operands[0])
+                && REGNO (operands[5]) == REGNO (operands[2]))
+               || (REGNO (operands[4]) == REGNO (operands[2])
+                   && REGNO (operands[5]) == REGNO (operands[0]))))
+    {
+      operands[6] = operands[1];
+      operands[7] = operands[2];
+    }
+  else
+    FAIL;
+})
+
 (define_insn "*andnot<mode>3_mask"
   [(set (match_operand:VI48_AVX512VL 0 "register_operand" "=v")
         (vec_merge:VI48_AVX512VL
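To make the register-number case analysis in the proposed peephole2 easier to follow, here is a rough standalone model of it in C. It is a sketch under stated assumptions, not GCC code: `match_pand_pxor`, `struct pandn_ops`, and the `op0_dead` flag are hypothetical names, with `r[0]` to `r[5]` standing for `REGNO (operands[0])` to `REGNO (operands[5])` and `op0_dead` standing for the result of `peep2_reg_dead_p (2, operands[0])`.

```c
#include <assert.h>
#include <stdbool.h>

/* Result of a successful match: the output computes ~negated & other.  */
struct pandn_ops {
    int negated;   /* plays the role of operands[6] */
    int other;     /* plays the role of operands[7] */
};

/* Returns true if "r0 = r1 & r2; r3 = r4 ^ r5" can become one pandn,
   mirroring the conditions in the patch; false models FAIL.  */
bool match_pand_pxor(const int r[6], bool op0_dead, struct pandn_ops *out)
{
    /* Insn condition: distinct inputs, and the AND result must either be
       overwritten by the XOR or be dead after it.  */
    if (r[1] == r[2] || r[4] == r[5])
        return false;
    if (r[0] != r[3] && !op0_dead)
        return false;

    /* Is the XOR applied to the AND result and operand 1 (either order)?  */
    bool xor_with_1 = (r[4] == r[0] && r[5] == r[1])
                      || (r[4] == r[1] && r[5] == r[0]);
    /* Or to the AND result and operand 2 (either order)?  */
    bool xor_with_2 = (r[4] == r[0] && r[5] == r[2])
                      || (r[4] == r[2] && r[5] == r[0]);

    if (r[0] != r[1] && xor_with_1) {
        out->negated = r[2];
        out->other = r[1];
        return true;
    }
    if (r[0] != r[2] && xor_with_2) {
        out->negated = r[1];
        out->other = r[2];
        return true;
    }
    return false;
}
```

For example, with register numbers {20, 21, 22, 20, 20, 21} (r20 = r21 & r22; r20 = r20 ^ r21) the model selects r22 as the negated operand, giving r20 = ~r22 & r21. The number of register comparisons needed even in this simplified model is the concern raised in the review about doing the transform after reload.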