Message ID | 027501d83211$eb989d70$c2c9d850$@nextmovesoftware.com |
---|---|
State | New |
Headers |
From: "Roger Sayle" <roger@nextmovesoftware.com>
To: <gcc-patches@gcc.gnu.org>
Subject: [x86 PATCH] PR tree-optimization/98335: New peephole2 xorl; movb -> movzbl
Date: Mon, 7 Mar 2022 10:55:53 -0000
Message-ID: <027501d83211$eb989d70$c2c9d850$@nextmovesoftware.com>
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org> |
Series | [x86] PR tree-optimization/98335: New peephole2 xorl; movb -> movzbl |
Commit Message
Roger Sayle
March 7, 2022, 10:55 a.m. UTC
This patch is the backend piece of my proposed fix to PR tree-opt/98335,
to allow C++ partial struct initialization to be as efficient/optimized
as full struct initialization.

With the middle-end patch just posted to gcc-patches, the test case
in the PR compiles on x86_64-pc-linux-gnu with -O2 to:

        xorl    %eax, %eax
        movb    c(%rip), %al
        ret

and, with this additional peephole2 (actually a pair of peephole2s), to:

        movzbl  c(%rip), %eax
        ret

This patch has been tested on x86_64-pc-linux-gnu, on top of the
middle-end piece, with make bootstrap and make -k check with no
new failures.  Posted in pieces to simplify review.  Ok for mainline?

2022-03-07  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	PR tree-optimization/98335
	* config/i386/i386.md (peephole2): Transform xorl followed by
	a suitable movb or movw into the equivalent movz[bw]l.

gcc/testsuite/ChangeLog
	PR tree-optimization/98335
	* g++.target/i386/pr98335.C: New test case.

Thanks in advance,
Roger
--
Comments
On Mon, Mar 7, 2022 at 11:55 AM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
> This patch is the backend piece of my proposed fix to PR tree-opt/98335,
> to allow C++ partial struct initialization to be as efficient/optimized
> as full struct initialization.
>
> With the middle-end patch just posted to gcc-patches, the test case
> in the PR compiles on x86_64-pc-linux-gnu with -O2 to:
>
>         xorl    %eax, %eax
>         movb    c(%rip), %al
>         ret
>
> with this additional peephole2 (actually pair of peephole2s):
>
>         movzbl  c(%rip), %eax
>         ret
>
> This patch has been tested on x86_64-pc-linux-gnu, on top of the
> middle-end piece, with make bootstrap and make -k check with no
> new failures.  Posted in pieces to simplify review.  Ok for mainline?

Is there a reason that only inserts to DImode registers are implemented?
IMO, these peepholes should also handle inserts to SImode.

Uros.

> 2022-03-07  Roger Sayle  <roger@nextmovesoftware.com>
>
> gcc/ChangeLog
>	PR tree-optimization/98335
>	* config/i386/i386.md (peephole2): Transform xorl followed by
>	a suitable movb or movw into the equivalent movz[bw]l.
>
> gcc/testsuite/ChangeLog
>	PR tree-optimization/98335
>	* g++.target/i386/pr98335.C: New test case.
>
> Thanks in advance,
> Roger
> --
Hi Uros,

> Is there a reason that only inserts to DImode registers are implemented?
> IMO, these peepholes should also handle inserts to SImode.

I wasn't able to construct a test case that produced a byte or word
insert into an SImode register.  The front-ends and middle-end end up
producing different code sequences, and -m32 changes the ABI so that
small structs get passed in memory rather than in registers.

Here's the expanded testcase that I investigated:

struct DataCL { char a; int b; };
struct DataWL { short a; int b; };
struct DataIL { int a; int b; };

struct DataCI { char a; short b; };
struct DataWI { short a; short b; };

char c;
short w;
int i;

DataCL foo_cl(int idx) { return { c }; }
DataCL bar_cl(int idx) { return { c, 0 }; }
DataWL foo_wl(int idx) { return { w }; }
DataWL bar_wl(int idx) { return { w, 0 }; }
DataIL foo_il(int idx) { return { i }; }
DataIL bar_il(int idx) { return { i, 0 }; }

DataCI foo_ci(int idx) { return { c }; }
DataCI bar_ci(int idx) { return { c, 0 }; }
DataWI foo_wi(int idx) { return { w }; }
DataWI bar_wi(int idx) { return { w, 0 }; }

I agree that for completeness similar peepholes handling inserts into
SImode would be a good thing, but these wouldn't be restricted by
TARGET_64BIT and would therefore require additional -m32 testing.
The DImode peepholes I can justify for stage4 as PR tree-opt/98335
is a regression; SImode peepholes would be more of a "leap of faith".
If you'd be willing to accept a patch without a testcase, let me know.

It's also a pity that subreg handling in combine doesn't allow merging
these inserts into zero registers to be combined to zero_extends in a
machine independent way.  My recent patch for PR 95126 (awaiting
review) should also allow front-ends and middle-end passes more
flexibility in optimizing small struct constructors.

Thanks (as always) for reviewing patches so quickly.

Roger
--
On Mon, Mar 07, 2022 at 11:51:50AM -0000, Roger Sayle wrote:
> I agree that for completeness similar peepholes handling inserts into
> SImode would be a good thing, but these wouldn't be restricted by
> TARGET_64BIT and would therefore require additional -m32 testing.
> The DImode peepholes I can justify for stage4 as PR tree-opt/98335
> is a regression, SImode peepholes would be more of a "leap of faith".
> If you'd be willing to accept a patch without a testcase, let me know.

For a testcase, wouldn't something like:

struct Data { char a; short b; };
char c;

void val(void) {
  __asm__ __volatile__ ("" : : "r" ((struct Data) { c } ));
}

do it?

	Jakub
On Mon, Mar 7, 2022 at 12:51 PM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
> Hi Uros,
>
> > Is there a reason that only inserts to DImode registers are implemented?
> > IMO, these peepholes should also handle inserts to SImode.
>
> I wasn't able to construct a test case that produced a byte or word
> insert into an SImode register.  The front-ends and middle-end end up
> producing different code sequences, and -m32 changes the ABI so that
> small structs get passed in memory rather than in registers.
>
> [...]
>
> I agree that for completeness similar peepholes handling inserts into
> SImode would be a good thing, but these wouldn't be restricted by
> TARGET_64BIT and would therefore require additional -m32 testing.
> The DImode peepholes I can justify for stage4 as PR tree-opt/98335
> is a regression, SImode peepholes would be more of a "leap of faith".
> If you'd be willing to accept a patch without a testcase, let me know.

We have plenty of these, where we assume that if the pattern works in
one mode, it also works in other modes.  So, I think that by changing
DImode to the SWI48 mode iterator, we are on the safe side.  We can
also trust bootstrap and regression tests here.

IMO, the patch with the SWI48 mode iterator is OK.

Thanks,
Uros.

> It's also a pity that subreg handling in combine doesn't allow merging
> these inserts into zero registers to be combined to zero_extends in a
> machine independent way.  My recent patch for PR 95126 (awaiting
> review) should also allow front-ends and middle-end passes more
> flexibility in optimizing small struct constructors.
>
> Thanks (as always) for reviewing patches so quickly.
>
> Roger
> --
Hi Uros,

Firstly, many thanks for already (pre)approving half of this patch.
Jakub Jelinek's suggestion for creating a testcase that exposes the
SImode issue was extremely helpful, but interestingly it exposed another
missed optimization opportunity that's also addressed by this revision
of my patch.

char c;
union Data { char a; short b; };
void val(void) { __asm__ __volatile__ ("" : : "r" ((union Data) { c } )); }

currently on x86_64 with -O2 on mainline generates the following code:

        xorl    %eax, %eax
        movb    $0, %ah
        movb    c(%rip), %al
        ret

which contains the strict_low_part movb following an SImode xor that we
hope to optimize, but infuriatingly we also have a completely redundant
movb $0, %ah.  Hence, this patch adds a new peephole2 that sees that, of
the first two instructions, the movb $0, %ah is redundant, which then
allows the SImode/SWI48 xor followed by movb peephole2 to optimize the
rest to movzbl.  With the revised patch below, the above testcase is now
compiled as:

        movzbl  c(%rip), %eax
        ret

Tested on x86_64-pc-linux-gnu (on top of the revised middle-end patch)
with make bootstrap and make -k check with no new failures.  Is this
revised version (still) Ok for mainline?

2022-03-09  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	PR tree-optimization/98335
	* config/i386/i386.md (peephole2): Eliminate redundant insv.
	Combine movl followed by movb.  Transform xorl followed by
	a suitable movb or movw into the equivalent movz[bw]l.

gcc/testsuite/ChangeLog
	PR tree-optimization/98335
	* g++.target/i386/pr98335.C: New test case.
	* gcc.target/i386/pr98335.c: New test case.

Thanks again for your help/suggested revisions.
Roger
--

> -----Original Message-----
> From: Uros Bizjak <ubizjak@gmail.com>
> Sent: 09 March 2022 07:36
> To: Roger Sayle <roger@nextmovesoftware.com>
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [x86 PATCH] PR tree-optimization/98335: New peephole2
> xorl;movb -> movzbl
>
> [...]
>
> We have plenty of these, where we assume that if the pattern works in one
> mode, it also works in other modes.  So, I think that by changing DI mode to
> SWI48 mode iterator, we are on the safe side.  We can also trust bootstrap and
> regression tests here.
>
> IMO, the patch with SWI48 mode iterator is OK.
>
> Thanks,
> Uros.

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index d15170e..c8fbf60 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -3180,6 +3180,38 @@
 	    (const_int 8))
 	  (subreg:SWI248 (match_dup 1) 0))])
 
+;; Eliminate redundant insv, e.g. xorl %eax,%eax; movb $0, %ah
+(define_peephole2
+  [(parallel [(set (match_operand:SWI48 0 "general_reg_operand")
+		   (const_int 0))
+	      (clobber (reg:CC FLAGS_REG))])
+   (set (zero_extract:SWI248 (match_operand:SWI248 1 "general_reg_operand")
+			     (const_int 8)
+			     (const_int 8))
+	(const_int 0))]
+  "REGNO (operands[0]) == REGNO (operands[1])"
+  [(parallel [(set (match_operand:SWI48 0 "general_reg_operand")
+		   (const_int 0))
+	      (clobber (reg:CC FLAGS_REG))])])
+
+;; Combine movl followed by movb.
+(define_peephole2
+  [(set (match_operand:SWI48 0 "general_reg_operand")
+	(match_operand:SWI48 1 "const_int_operand"))
+   (set (zero_extract:SWI248 (match_operand:SWI248 2 "general_reg_operand")
+			     (const_int 8)
+			     (const_int 8))
+	(match_operand:SWI248 3 "const_int_operand"))]
+  "REGNO (operands[0]) == REGNO (operands[2])"
+  [(set (match_operand:SWI48 0 "general_reg_operand")
+	(match_dup 4))]
+{
+  HOST_WIDE_INT tmp = INTVAL (operands[1]) & ~(HOST_WIDE_INT)0xff00;
+  tmp |= (INTVAL (operands[3]) & 0xff) << 8;
+  operands[4] = gen_int_mode (tmp, <SWI48:MODE>mode);
+})
+
 (define_code_iterator any_extract [sign_extract zero_extract])
 
 (define_insn "*insvqi_2"
@@ -4276,6 +4308,24 @@
   [(set_attr "isa" "*,avx512dq,avx512dq")
    (set_attr "type" "imovx,mskmov,mskmov")
    (set_attr "mode" "SI,QI,QI")])
+
+;; Transform xorl; mov[bw] (set strict_low_part) into movz[bw]l.
+(define_peephole2
+  [(parallel [(set (match_operand:SWI48 0 "general_reg_operand")
+		   (const_int 0))
+	      (clobber (reg:CC FLAGS_REG))])
+   (set (strict_low_part (match_operand:SWI12 1 "general_reg_operand"))
+	(match_operand:SWI12 2 "nonimmediate_operand"))]
+  "REGNO (operands[0]) == REGNO (operands[1])"
+  [(set (match_dup 0) (zero_extend:SWI48 (match_dup 2)))])
+
+;; Likewise, but preserving FLAGS_REG.
+(define_peephole2
+  [(set (match_operand:SWI48 0 "general_reg_operand") (const_int 0))
+   (set (strict_low_part (match_operand:SWI12 1 "general_reg_operand"))
+	(match_operand:SWI12 2 "nonimmediate_operand"))]
+  "REGNO (operands[0]) == REGNO (operands[1])"
+  [(set (match_dup 0) (zero_extend:SWI48 (match_dup 2)))])
 
 ;; Sign extension instructions
diff --git a/gcc/testsuite/g++.target/i386/pr98335.C b/gcc/testsuite/g++.target/i386/pr98335.C
new file mode 100644
index 0000000..2581b83
--- /dev/null
+++ b/gcc/testsuite/g++.target/i386/pr98335.C
@@ -0,0 +1,18 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2" } */
+
+struct Data {
+  char a;
+  int b;
+};
+
+char c;
+
+Data val(int idx) {
+  return { c };  // { dg-warning "extended initializer" "c++ 98" { target { c++98_only } } }
+}
+
+/* { dg-final { scan-assembler "movzbl" } } */
+/* { dg-final { scan-assembler-not "xorl" } } */
+/* { dg-final { scan-assembler-not "movb" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr98335.c b/gcc/testsuite/gcc.target/i386/pr98335.c
new file mode 100644
index 0000000..7fa7ad7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr98335.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+union Data { char a; short b; };
+
+char c;
+
+void val(void) {
+  __asm__ __volatile__ ("" : : "r" ((union Data) { c } ));
+}
+
+/* { dg-final { scan-assembler "movzbl" } } */
+/* { dg-final { scan-assembler-not "xorl" } } */
+/* { dg-final { scan-assembler-not "movb" } } */
On Wed, Mar 9, 2022 at 7:10 PM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
> Hi Uros,
> Firstly many thanks for already (pre)approving half of this patch.
> Jakub Jelinek's suggestion for creating a testcase that exposes the
> SImode issue was extremely helpful, but interestingly exposed another
> missed optimization opportunity that's also addressed with this
> revision of my patch.
>
> char c;
> union Data { char a; short b; };
> void val(void) { __asm__ __volatile__ ("" : : "r" ((union Data) { c } )); }
>
> currently on x86_64 with -O2 on mainline generates following code:
>
>         xorl    %eax, %eax
>         movb    $0, %ah
>         movb    c(%rip), %al
>         ret
>
> which contains the strict_low_part movb following an SI mode xor that
> we hope to optimize, but infuriatingly we also have a completely
> redundant movb $0, %ah.  Hence, with this patch we have a new
> peephole2 that sees that in the first two instructions the movb is
> redundant, which then allows the SImode/SWI48 xor followed by movb
> peephole2 to optimize the rest to movzbl.  With the revised patch
> below, the above testcase is now compiled as:
>
>         movzbl  c(%rip), %eax
>         ret
>
> Tested on x86_64-pc-linux-gnu (on top of the revised middle-end patch)
> with make bootstrap and make -k check with no new failures.  Is this
> revised version (still) Ok for mainline?
>
> 2022-03-09  Roger Sayle  <roger@nextmovesoftware.com>
>
> gcc/ChangeLog
>	PR tree-optimization/98335
>	* config/i386/i386.md (peephole2): Eliminate redundant insv.
>	Combine movl followed by movb.  Transform xorl followed by
>	a suitable movb or movw into the equivalent movz[bw]l.
>
> gcc/testsuite/ChangeLog
>	PR tree-optimization/98335
>	* g++.target/i386/pr98335.C: New test case.
>	* gcc.target/i386/pr98335.c: New test case.

OK.

Thanks,
Uros.

> Thanks again for your help/suggested revisions.
> Roger
> --
>
> [...]
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index d15170e..f1eb62a 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -4276,6 +4276,26 @@
   [(set_attr "isa" "*,avx512dq,avx512dq")
    (set_attr "type" "imovx,mskmov,mskmov")
    (set_attr "mode" "SI,QI,QI")])
+
+;; Transform xorl; mov[bw] (set strict_low_part) into movz[bw]l.
+(define_peephole2
+  [(parallel [(set (match_operand:DI 0 "general_reg_operand")
+		   (const_int 0))
+	      (clobber (reg:CC FLAGS_REG))])
+   (set (strict_low_part (match_operand:SWI12 1 "general_reg_operand"))
+	(match_operand:SWI12 2 "nonimmediate_operand"))]
+  "TARGET_64BIT
+   && REGNO (operands[0]) == REGNO (operands[1])"
+  [(set (match_dup 0) (zero_extend:DI (match_dup 2)))])
+
+;; Likewise, but preserving FLAGS_REG.
+(define_peephole2
+  [(set (match_operand:DI 0 "general_reg_operand") (const_int 0))
+   (set (strict_low_part (match_operand:SWI12 1 "general_reg_operand"))
+	(match_operand:SWI12 2 "nonimmediate_operand"))]
+  "TARGET_64BIT
+   && REGNO (operands[0]) == REGNO (operands[1])"
+  [(set (match_dup 0) (zero_extend:DI (match_dup 2)))])
 
 ;; Sign extension instructions
diff --git a/gcc/testsuite/g++.target/i386/pr98335.C b/gcc/testsuite/g++.target/i386/pr98335.C
new file mode 100644
index 0000000..2581b83
--- /dev/null
+++ b/gcc/testsuite/g++.target/i386/pr98335.C
@@ -0,0 +1,18 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2" } */
+
+struct Data {
+  char a;
+  int b;
+};
+
+char c;
+
+Data val(int idx) {
+  return { c };  // { dg-warning "extended initializer" "c++ 98" { target { c++98_only } } }
+}
+
+/* { dg-final { scan-assembler "movzbl" } } */
+/* { dg-final { scan-assembler-not "xorl" } } */
+/* { dg-final { scan-assembler-not "movb" } } */