[x86] PR tree-optimization/98335: New peephole2 xorl; movb -> movzbl

Message ID 027501d83211$eb989d70$c2c9d850$@nextmovesoftware.com
State New
Headers
Series [x86] PR tree-optimization/98335: New peephole2 xorl; movb -> movzbl |

Commit Message

Roger Sayle March 7, 2022, 10:55 a.m. UTC
  This patch is the backend piece of my proposed fix to PR tree-opt/98335,
to allow C++ partial struct initialization to be as efficient/optimized
as full struct initialization.

With the middle-end patch just posted to gcc-patches, the test case
in the PR compiles on x86_64-pc-linux-gnu with -O2 to:

        xorl    %eax, %eax
        movb    c(%rip), %al
        ret

with this additional peephole2 (actually pair of peephole2s):

        movzbl  c(%rip), %eax
        ret


This patch has been tested on x86_64-pc-linux-gnu, on top of the
middle-end piece, with make bootstrap and make -k check with no
new failures.  Posted in pieces to simplify review.  Ok for mainline?


2022-03-07  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	PR tree-optimization/98335
	* config/i386/i386.md (peephole2): Transform xorl followed by
	a suitable movb or movw into the equivalent movz[bw]l.

gcc/testsuite/ChangeLog
	PR tree-optimization/98335
	* g++.target/i386/pr98335.C: New test case.


Thanks in advance,
Roger
--
  

Comments

Uros Bizjak March 7, 2022, 11:20 a.m. UTC | #1
On Mon, Mar 7, 2022 at 11:55 AM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
>
> This patch is the backend piece of my proposed fix to PR tree-opt/98335,
> to allow C++ partial struct initialization to be as efficient/optimized
> as full struct initialization.
>
> With the middle-end patch just posted to gcc-patches, the test case
> in the PR compiles on x86_64-pc-linux-gnu with -O2 to:
>
>         xorl    %eax, %eax
>         movb    c(%rip), %al
>         ret
>
> with this additional peephole2 (actually pair of peephole2s):
>
>         movzbl  c(%rip), %eax
>         ret
>
>
> This patch has been tested on x86_64-pc-linux-gnu, on top of the
> middle-end piece, with make bootstrap and make -k check with no
> new failures.  Posted in pieces to simplify review.  Ok for mainline?

Is there a reason that only inserts to DImode registers are
implemented? IMO, these peepholes should also handle inserts to
SImode.

Uros.

>
> 2022-03-07  Roger Sayle  <roger@nextmovesoftware.com>
>
> gcc/ChangeLog
>         PR tree-optimization/98335
>         * config/i386/i386.md (peephole2): Transform xorl followed by
>         a suitable movb or movw into the equivalent movz[bw]l.
>
> gcc/testsuite/ChangeLog
>         PR tree-optimization/98335
>         * g++.target/i386/pr98335.C: New test case.
>
>
> Thanks in advance,
> Roger
> --
>
  
Roger Sayle March 7, 2022, 11:51 a.m. UTC | #2
Hi Uros,

> Is there a reason that only inserts to DImode registers are implemented?
> IMO, these peepholes should also handle inserts to SImode.

I wasn't able to construct a test case that produced a byte or word insert
into an SImode register.  The front-ends and middle-end end up producing
different code sequences, and -m32 changes the ABI so that small structs
get passed in memory rather than in registers.

Here's the expanded testcase that I investigated:

struct DataCL { char a; int b; };
struct DataWL { short a; int b; };
struct DataIL { int a; int b; };

struct DataCI { char a; short b; };
struct DataWI { short a; short b; };

char c;
short w;
int i;

DataCL foo_cl(int idx) { return { c }; }
DataCL bar_cl(int idx) { return { c, 0 }; }
DataWL foo_wl(int idx) { return { w }; }
DataWL bar_wl(int idx) { return { w, 0 }; }
DataIL foo_il(int idx) { return { i }; }
DataIL bar_il(int idx) { return { i, 0 }; }

DataCI foo_ci(int idx) { return { c }; }
DataCI bar_ci(int idx) { return { c, 0 }; }
DataWI foo_wi(int idx) { return { w }; }
DataWI bar_wi(int idx) { return { w, 0 }; }


I agree that for completeness similar peepholes handling inserts into 
SImode would be a good thing, but these wouldn't be restricted by
TARGET_64BIT and would therefore require additional -m32 testing.
The DImode peepholes I can justify for stage4 as PR tree-opt/98335
is a regression, SImode peepholes would be more of a "leap of faith".
If you’d be willing to accept a patch without a testcase, let me know.

It's also a pity that subreg handling in combine doesn't allow merging
these inserts into zero registers to be combined to zero_extends in a
machine independent way.  My recent patch for PR 95126 (awaiting
review) should also allow front-ends and middle-end passes more
flexibility in optimizing small struct constructors. 

Thanks (as always) for reviewing patches so quickly.

Roger
--
  
Jakub Jelinek March 7, 2022, 12:02 p.m. UTC | #3
On Mon, Mar 07, 2022 at 11:51:50AM -0000, Roger Sayle wrote:
> I agree that for completeness similar peepholes handling inserts into 
> SImode would be a good thing, but these wouldn't be restricted by
> TARGET_64BIT and would therefore require additional -m32 testing.
> The DImode peepholes I can justify for stage4 as PR tree-opt/98335
> is a regression, SImode peepholes would be more of a "leap of faith".
> If you’d be willing to accept a patch without a testcase, let me know.

For testcase, wouldn't something like:
struct Data { char a; short b; };

char c;

void val(void) {
  __asm__ __volatile__ ("" : : "r" ((struct Data) { c } ));
}
do it?

	Jakub
  
Uros Bizjak March 9, 2022, 7:36 a.m. UTC | #4
On Mon, Mar 7, 2022 at 12:51 PM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
>
> Hi Uros,
>
> > Is there a reason that only inserts to DImode registers are implemented?
> > IMO, these peepholes should also handle inserts to SImode.
>
> I wasn't able to construct a test case that produced a byte or word insert
> into an SImode register.  The front-ends and middle-end end up producing
> different code sequences, and -m32 changes the ABI so that small structs
> get passed in memory rather than in registers.
>
> Here's the expanded testcase that I investigated:
>
> struct DataCL { char a; int b; };
> struct DataWL { short a; int b; };
> struct DataIL { int a; int b; };
>
> struct DataCI { char a; short b; };
> struct DataWI { short a; short b; };
>
> char c;
> short w;
> int i;
>
> DataCL foo_cl(int idx) { return { c }; }
> DataCL bar_cl(int idx) { return { c, 0 }; }
> DataWL foo_wl(int idx) { return { w }; }
> DataWL bar_wl(int idx) { return { w, 0 }; }
> DataIL foo_il(int idx) { return { i }; }
> DataIL bar_il(int idx) { return { i, 0 }; }
>
> DataCI foo_ci(int idx) { return { c }; }
> DataCI bar_ci(int idx) { return { c, 0 }; }
> DataWI foo_wi(int idx) { return { w }; }
> DataWI bar_wi(int idx) { return { w, 0 }; }
>
>
> I agree that for completeness similar peepholes handling inserts into
> SImode would be a good thing, but these wouldn't be restricted by
> TARGET_64BIT and would therefore require additional -m32 testing.
> The DImode peepholes I can justify for stage4 as PR tree-opt/98335
> is a regression, SImode peepholes would be more of a "leap of faith".
> If you’d be willing to accept a patch without a testcase, let me know.

We have plenty of these, where we assume that if the pattern works in
one mode, it also works in other modes. So, I think that by changing
DI mode to SWI48 mode iterator, we are on the safe side. We can also
trust bootstrap and regression tests here.

IMO, the patch with SWI48 mode iterator is OK.

Thanks,
Uros.

>
> It's also a pity that subreg handling in combine doesn't allow merging
> these inserts into zero registers to be combined to zero_extends in a
> machine independent way.  My recent patch for PR 95126 (awaiting
> review) should also allow front-ends and middle-end passes more
> flexibility in optimizing small struct constructors.
>
> Thanks (as always) for reviewing patches so quickly.
>
> Roger
> --
>
>
  
Roger Sayle March 9, 2022, 6:10 p.m. UTC | #5
Hi Uros,
Firstly many thanks for already (pre)approving half of this patch.  Jakub Jelinek's
suggestion for creating a testcase that exposes the SImode issue was extremely
helpful, but interestingly exposed another missed optimization opportunity that's
also addressed with this revision of my patch.

char c;
union Data { char a; short b; };
void val(void) { __asm__ __volatile__ ("" : : "r" ((union Data) { c } )); }

currently on x86_64 with -O2 on mainline generates following code:

        xorl    %eax, %eax
        movb    $0, %ah
        movb    c(%rip), %al
        ret

which contains the strict_low_part movb following an SI mode xor that we hope
to optimize, but infuriatingly we also have a completely redundant movb $0, %ah.
Hence, with this patch we have a new peephole2 that sees that in the first two
instructions the movb is redundant, which then allows the SImode/SWI48 xor
followed by movb peephole2 to optimize the rest to movzbl.  With the revised
patch below, the above testcase is now compiled as:

        movzbl  c(%rip), %eax
        ret

Tested on x86_64-pc-linux-gnu (on top of the revised middle-end patch) with
make bootstrap and make -k check with no new failures.  Is this revised version
(still) Ok for mainline?


2022-03-09  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	PR tree-optimization/98335
	* config/i386/i386.md (peephole2): Eliminate redundant insv.
	Combine movl followed by movb.  Transform xorl followed by
	a suitable movb or movw into the equivalent movz[bw]l.

gcc/testsuite/ChangeLog
	PR tree-optimization/98335
	* g++.target/i386/pr98335.C: New test case.
	* gcc.target/i386/pr98335.c: New test case.


Thanks again for your help/suggested revisions.
Roger
--

> -----Original Message-----
> From: Uros Bizjak <ubizjak@gmail.com>
> Sent: 09 March 2022 07:36
> To: Roger Sayle <roger@nextmovesoftware.com>
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [x86 PATCH] PR tree-optimization/98335: New peephole2
> xorl;movb -> movzbl
> 
> On Mon, Mar 7, 2022 at 12:51 PM Roger Sayle
> <roger@nextmovesoftware.com> wrote:
> >
> >
> > Hi Uros,
> >
> > > Is there a reason that only inserts to DImode registers are implemented?
> > > IMO, these peepholes should also handle inserts to SImode.
> >
> > I wasn't able to construct a test case that produced a byte or word
> > insert into an SImode register.  The front-ends and middle-end end up
> > producing different code sequences, and -m32 changes the ABI so that
> > small structs get passed in memory rather than in registers.
> >
> > Here's the expanded testcase that I investigated:
> >
> > struct DataCL { char a; int b; };
> > struct DataWL { short a; int b; };
> > struct DataIL { int a; int b; };
> >
> > struct DataCI { char a; short b; };
> > struct DataWI { short a; short b; };
> >
> > char c;
> > short w;
> > int i;
> >
> > DataCL foo_cl(int idx) { return { c }; } DataCL bar_cl(int idx) {
> > return { c, 0 }; } DataWL foo_wl(int idx) { return { w }; } DataWL
> > bar_wl(int idx) { return { w, 0 }; } DataIL foo_il(int idx) { return {
> > i }; } DataIL bar_il(int idx) { return { i, 0 }; }
> >
> > DataCI foo_ci(int idx) { return { c }; } DataCI bar_ci(int idx) {
> > return { c, 0 }; } DataWI foo_wi(int idx) { return { w }; } DataWI
> > bar_wi(int idx) { return { w, 0 }; }
> >
> >
> > I agree that for completeness similar peepholes handling inserts into
> > SImode would be a good thing, but these wouldn't be restricted by
> > TARGET_64BIT and would therefore require additional -m32 testing.
> > The DImode peepholes I can justify for stage4 as PR tree-opt/98335 is
> > a regression, SImode peepholes would be more of a "leap of faith".
> > If you’d be willing to accept a patch without a testcase, let me know.
> 
> We have plenty of these, where we assume that if the pattern works in one
> mode, it also works in other modes. So, I think that by changing DI mode to
> SWI48 mode iterator, we are on the safe side. We can also trust bootstrap and
> regression tests here.
> 
> IMO, the patch with SWI48 mode iterator is OK.
> 
> Thanks,
> Uros.
> 
> >
> > It's also a pity that subreg handling in combine doesn't allow merging
> > these inserts into zero registers to be combined to zero_extends in a
> > machine independent way.  My recent patch for PR 95126 (awaiting
> > review) should also allow front-ends and middle-end passes more
> > flexibility in optimizing small struct constructors.
> >
> > Thanks (as always) for reviewing patches so quickly.
> >
> > Roger
> > --
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index d15170e..c8fbf60 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -3180,6 +3180,38 @@
 			     (const_int 8))
 	   (subreg:SWI248 (match_dup 1) 0))])
 
+;; Eliminate redundant insv, e.g. xorl %eax,%eax; movb $0, %ah
+(define_peephole2
+  [(parallel [(set (match_operand:SWI48 0 "general_reg_operand")
+		   (const_int 0))
+	      (clobber (reg:CC FLAGS_REG))])
+   (set (zero_extract:SWI248 (match_operand:SWI248 1 "general_reg_operand")
+			     (const_int 8)
+			     (const_int 8))
+	(const_int 0))]
+  "REGNO (operands[0]) == REGNO (operands[1])"
+  [(parallel [(set (match_operand:SWI48 0 "general_reg_operand")
+		   (const_int 0))
+	      (clobber (reg:CC FLAGS_REG))])])
+
+;; Combine movl followed by movb.
+(define_peephole2
+  [(set (match_operand:SWI48 0 "general_reg_operand")
+	(match_operand:SWI48 1 "const_int_operand"))
+   (set (zero_extract:SWI248 (match_operand:SWI248 2 "general_reg_operand")
+			     (const_int 8)
+			     (const_int 8))
+	(match_operand:SWI248 3 "const_int_operand"))]
+  "REGNO (operands[0]) == REGNO (operands[2])"
+  [(set (match_operand:SWI48 0 "general_reg_operand")
+	(match_dup 4))]
+{
+  HOST_WIDE_INT tmp = INTVAL (operands[1]) & ~(HOST_WIDE_INT)0xff00;
+  tmp |= (INTVAL (operands[3]) & 0xff) << 8;
+  operands[4] = gen_int_mode (tmp, <SWI48:MODE>mode);
+})
+
+
 (define_code_iterator any_extract [sign_extract zero_extract])
 
 (define_insn "*insvqi_2"
@@ -4276,6 +4308,24 @@
   [(set_attr "isa" "*,avx512dq,avx512dq")
    (set_attr "type" "imovx,mskmov,mskmov")
    (set_attr "mode" "SI,QI,QI")])
+
+;; Transform xorl; mov[bw] (set strict_low_part) into movz[bw]l.
+(define_peephole2
+  [(parallel [(set (match_operand:SWI48 0 "general_reg_operand")
+		   (const_int 0))
+	      (clobber (reg:CC FLAGS_REG))])
+   (set (strict_low_part (match_operand:SWI12 1 "general_reg_operand"))
+	(match_operand:SWI12 2 "nonimmediate_operand"))]
+  "REGNO (operands[0]) == REGNO (operands[1])"
+  [(set (match_dup 0) (zero_extend:SWI48 (match_dup 2)))])
+
+;; Likewise, but preserving FLAGS_REG.
+(define_peephole2
+  [(set (match_operand:SWI48 0 "general_reg_operand") (const_int 0))
+   (set (strict_low_part (match_operand:SWI12 1 "general_reg_operand"))
+	(match_operand:SWI12 2 "nonimmediate_operand"))]
+  "REGNO (operands[0]) == REGNO (operands[1])"
+  [(set (match_dup 0) (zero_extend:SWI48 (match_dup 2)))])
 
 ;; Sign extension instructions
 

diff --git a/gcc/testsuite/g++.target/i386/pr98335.C b/gcc/testsuite/g++.target/i386/pr98335.C
new file mode 100644
index 0000000..2581b83
--- /dev/null
+++ b/gcc/testsuite/g++.target/i386/pr98335.C
@@ -0,0 +1,18 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2" } */
+
+struct Data {
+  char a;
+  int b;
+};
+
+char c;
+
+Data val(int idx) {
+  return { c };  // { dg-warning "extended initializer" "c++ 98"  { target { c++98_only } } }
+}
+
+/* { dg-final { scan-assembler "movzbl" } } */
+/* { dg-final { scan-assembler-not "xorl" } } */
+/* { dg-final { scan-assembler-not "movb" } } */
+
diff --git a/gcc/testsuite/gcc.target/i386/pr98335.c b/gcc/testsuite/gcc.target/i386/pr98335.c
new file mode 100644
index 0000000..7fa7ad7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr98335.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+union Data { char a; short b; };
+
+char c;
+
+void val(void) {
+  __asm__ __volatile__ ("" : : "r" ((union Data) { c } )); } 
+
+/* { dg-final { scan-assembler "movzbl" } } */
+/* { dg-final { scan-assembler-not "xorl" } } */
+/* { dg-final { scan-assembler-not "movb" } } */
  
Uros Bizjak March 10, 2022, 7:17 a.m. UTC | #6
On Wed, Mar 9, 2022 at 7:10 PM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
>
> Hi Uros,
> Firstly many thanks for already (pre)approving half of this patch.  Jakub Jelinek's
> suggestion for creating a testcase that exposes the SImode issue was extremely
> helpful, but interestingly exposed another missed optimization opportunity that's
> also addressed with this revision of my patch.
>
> char c;
> union Data { char a; short b; };
> void val(void) { __asm__ __volatile__ ("" : : "r" ((union Data) { c } )); }
>
> currently on x86_64 with -O2 on mainline generates following code:
>
>         xorl    %eax, %eax
>         movb    $0, %ah
>         movb    c(%rip), %al
>         ret
>
> which contains the strict_low_part movb following an SI mode xor that we hope
> to optimize, but infuriatingly we also have a completely redundant movb $0, %ah.
> Hence, with this patch we have a new peephole2 that sees that in the first two
> instructions the movb is redundant, which then allows the SImode/SWI48 xor
> followed by movb peephole2 to optimize the rest to movzbl.  With the revised
> patch below, the above testcase is now compiled as:
>
>         movzbl  c(%rip), %eax
>         ret
>
> Tested on x86_64-pc-linux-gnu (on top of the revised middle-end patch) with
> make bootstrap and make -k check with no new failures.  Is this revised version
> (still) Ok for mainline?
>
>
> 2022-03-09  Roger Sayle  <roger@nextmovesoftware.com>
>
> gcc/ChangeLog
>         PR tree-optimization/98335
>         * config/i386/i386.md (peephole2): Eliminate redundant insv.
>         Combine movl followed by movb.  Transform xorl followed by
>         a suitable movb or movw into the equivalent movz[bw]l.
>
> gcc/testsuite/ChangeLog
>         PR tree-optimization/98335
>         * g++.target/i386/pr98335.C: New test case.
>         * gcc.target/i386/pr98335.c: New test case.

OK.

Thanks,
Uros.

>
>
> Thanks again for your help/suggested revisions.
> Roger
> --
>
> > -----Original Message-----
> > From: Uros Bizjak <ubizjak@gmail.com>
> > Sent: 09 March 2022 07:36
> > To: Roger Sayle <roger@nextmovesoftware.com>
> > Cc: gcc-patches@gcc.gnu.org
> > Subject: Re: [x86 PATCH] PR tree-optimization/98335: New peephole2
> > xorl;movb -> movzbl
> >
> > On Mon, Mar 7, 2022 at 12:51 PM Roger Sayle
> > <roger@nextmovesoftware.com> wrote:
> > >
> > >
> > > Hi Uros,
> > >
> > > > Is there a reason that only inserts to DImode registers are implemented?
> > > > IMO, these peepholes should also handle inserts to SImode.
> > >
> > > I wasn't able to construct a test case that produced a byte or word
> > > insert into an SImode register.  The front-ends and middle-end end up
> > > producing different code sequences, and -m32 changes the ABI so that
> > > small structs get passed in memory rather than in registers.
> > >
> > > Here's the expanded testcase that I investigated:
> > >
> > > struct DataCL { char a; int b; };
> > > struct DataWL { short a; int b; };
> > > struct DataIL { int a; int b; };
> > >
> > > struct DataCI { char a; short b; };
> > > struct DataWI { short a; short b; };
> > >
> > > char c;
> > > short w;
> > > int i;
> > >
> > > DataCL foo_cl(int idx) { return { c }; } DataCL bar_cl(int idx) {
> > > return { c, 0 }; } DataWL foo_wl(int idx) { return { w }; } DataWL
> > > bar_wl(int idx) { return { w, 0 }; } DataIL foo_il(int idx) { return {
> > > i }; } DataIL bar_il(int idx) { return { i, 0 }; }
> > >
> > > DataCI foo_ci(int idx) { return { c }; } DataCI bar_ci(int idx) {
> > > return { c, 0 }; } DataWI foo_wi(int idx) { return { w }; } DataWI
> > > bar_wi(int idx) { return { w, 0 }; }
> > >
> > >
> > > I agree that for completeness similar peepholes handling inserts into
> > > SImode would be a good thing, but these wouldn't be restricted by
> > > TARGET_64BIT and would therefore require additional -m32 testing.
> > > The DImode peepholes I can justify for stage4 as PR tree-opt/98335 is
> > > a regression, SImode peepholes would be more of a "leap of faith".
> > > If you’d be willing to accept a patch without a testcase, let me know.
> >
> > We have plenty of these, where we assume that if the pattern works in one
> > mode, it also works in other modes. So, I think that by changing DI mode to
> > SWI48 mode iterator, we are on the safe side. We can also trust bootstrap and
> > regression tests here.
> >
> > IMO, the patch with SWI48 mode iterator is OK.
> >
> > Thanks,
> > Uros.
> >
> > >
> > > It's also a pity that subreg handling in combine doesn't allow merging
> > > these inserts into zero registers to be combined to zero_extends in a
> > > machine independent way.  My recent patch for PR 95126 (awaiting
> > > review) should also allow front-ends and middle-end passes more
> > > flexibility in optimizing small struct constructors.
> > >
> > > Thanks (as always) for reviewing patches so quickly.
> > >
> > > Roger
> > > --
>
>
  

Patch

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index d15170e..f1eb62a 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -4276,6 +4276,26 @@ 
   [(set_attr "isa" "*,avx512dq,avx512dq")
    (set_attr "type" "imovx,mskmov,mskmov")
    (set_attr "mode" "SI,QI,QI")])
+
+;; Transform xorl; mov[bw] (set strict_low_part) into movz[bw]l.
+(define_peephole2
+  [(parallel [(set (match_operand:DI 0 "general_reg_operand")
+		   (const_int 0))
+	      (clobber (reg:CC FLAGS_REG))])
+   (set (strict_low_part (match_operand:SWI12 1 "general_reg_operand"))
+	(match_operand:SWI12 2 "nonimmediate_operand"))]
+  "TARGET_64BIT
+   && REGNO (operands[0]) == REGNO (operands[1])"
+  [(set (match_dup 0) (zero_extend:DI (match_dup 2)))])
+
+;; Likewise, but preserving FLAGS_REG.
+(define_peephole2
+  [(set (match_operand:DI 0 "general_reg_operand") (const_int 0))
+   (set (strict_low_part (match_operand:SWI12 1 "general_reg_operand"))
+	(match_operand:SWI12 2 "nonimmediate_operand"))]
+  "TARGET_64BIT
+   && REGNO (operands[0]) == REGNO (operands[1])"
+  [(set (match_dup 0) (zero_extend:DI (match_dup 2)))])
 
 ;; Sign extension instructions
 
diff --git a/gcc/testsuite/g++.target/i386/pr98335.C b/gcc/testsuite/g++.target/i386/pr98335.C
new file mode 100644
index 0000000..2581b83
--- /dev/null
+++ b/gcc/testsuite/g++.target/i386/pr98335.C
@@ -0,0 +1,18 @@ 
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2" } */
+
+struct Data {
+  char a;
+  int b;
+};
+
+char c;
+
+Data val(int idx) {
+  return { c };  // { dg-warning "extended initializer" "c++ 98"  { target { c++98_only } } }
+}
+
+/* { dg-final { scan-assembler "movzbl" } } */
+/* { dg-final { scan-assembler-not "xorl" } } */
+/* { dg-final { scan-assembler-not "movb" } } */
+