[take,#3] PR target/103773: Fix wrong-code with -Oz from pop to memory.
Commit Message
Hi Uros,
A huge thanks for the list of suggested improvements to the -Oz related patches.
I've combined them all together in the submission below, which makes sense now
that everything is implemented using peephole2. The implementation of
push/pop via peephole2 is exactly as you've suggested, also checking that the
immediate value isn't zero (the value -1 is still a size win over OR), and extended
to include HImode (where it is a win), but not QImode (where it isn't).
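To make the size argument concrete, here is a rough sketch in C (not part of the patch) that tallies encoding lengths for the register case. It assumes 64-bit mode, a legacy register such as %eax (so no extra REX prefix for the register number), and an immediate in [-128, 127]. The function names are invented for illustration only.

```c
#include <assert.h>
#include <stdbool.h>

/* Operand width in bytes: 1 = QImode, 2 = HImode, 4 = SImode, 8 = DImode. */

/* Length of "mov $imm, %reg" for a small immediate. */
int mov_imm_reg_size(int width, long imm)
{
    assert(width == 1 || width == 2 || width == 4 || width == 8);
    switch (width) {
    case 1:  return 2;                 /* B0+r ib */
    case 2:  return 4;                 /* 66 B8+r iw */
    case 4:  return 5;                 /* B8+r id */
    default: return imm >= 0 ? 5 : 7;  /* movl zero-extends; else REX.W C7 /0 id */
    }
}

/* Length of "push $imm8" (6A ib) followed by "pop %reg" (58+r). */
int push_pop_size(void)
{
    return 2 + 1;
}

/* Length of "or $-1, %reg" (83 /1 ib), plus 66 or REX.W where needed. */
int or_m1_reg_size(int width)
{
    assert(width == 2 || width == 4 || width == 8);
    return width == 4 ? 3 : 4;
}

/* True if push/pop is strictly shorter than the best alternative counted here. */
bool push_pop_wins(int width, long imm)
{
    int best = mov_imm_reg_size(width, imm);
    if (imm == 0)
        best = 2;                      /* xorl %reg, %reg: 31 /r */
    else if (imm == -1 && width != 1 && or_m1_reg_size(width) < best)
        best = or_m1_reg_size(width);
    return push_pop_size() < best;
}
```

By this count push/pop saves two bytes for SImode, one byte for HImode, and loses a byte for QImode, while zero is always better served by xor. For -1 the count gives a strict win in DImode (3 bytes against 4 for orq) and a tie with orl in SImode.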
For writes to memory, I've extended *mov<mode>_or to allow memory destinations
and HImode, but I've introduced a new *mov<mode>_and for writing zero to memory,
rather than complicate/overload *mov<mode>_xor (for example, it doesn't take an
immediate). In this form, only a single peephole2 is needed, which adds a flags clobber to
the instruction when the flags register is dead.
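As a back-of-the-envelope illustration of why the AND/OR forms are smaller for memory destinations (a sketch, not code from the patch): prefixes, opcode, ModRM and displacement bytes are the same length in both forms, so the difference is only in the immediate. The helper names below are hypothetical.

```c
#include <assert.h>

/* Operand width in bytes: 1 = QImode, 2 = HImode, 4 = SImode, 8 = DImode. */

/* Immediate bytes in "mov $imm, mem" (C6 /0 ib, C7 /0 iw/id):
   as wide as the operand, capped at a sign-extended imm32. */
int mov_imm_bytes(int width)
{
    assert(width == 1 || width == 2 || width == 4 || width == 8);
    return width > 4 ? 4 : width;
}

/* Immediate bytes in "and $0, mem" / "or $-1, mem" (80 /r ib, 83 /r ib):
   both 0 and -1 fit in a sign-extended imm8. */
int alu_imm8_bytes(int width)
{
    assert(width == 1 || width == 2 || width == 4 || width == 8);
    return 1;
}

/* Bytes saved by the AND/OR form over the plain MOV form. */
int bytes_saved(int width)
{
    return mov_imm_bytes(width) - alu_imm8_bytes(width);
}
```

On this accounting the saving is three bytes for SImode and DImode, one byte for HImode, and nothing for QImode, which matches the choice of the SWI248 iterator. The price is a flags clobber, hence the requirement that the flags be dead.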
This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check with no new failures, and the new testcase checked
both with and without -m32. Ok for mainline?
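For readers following along, the wrong-code case exercised by the run-time test can be modelled in plain C. This is only my reading of the problem, assuming that a 64-bit pop always stores a full 8-byte word even when the intended store is 4 bytes wide. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The intended operation: a 4-byte store of -1. */
void store_si_m1(unsigned char *p)
{
    memset(p, 0xff, 4);
}

/* Model of "push $-1; pop mem" in 64-bit mode: a full 8-byte store. */
void pop_q_m1(unsigned char *p)
{
    int64_t v = -1;
    memcpy(p, &v, sizeof v);
}

/* Count how many bytes of a zeroed 8-byte object the store set to 0xff. */
int bytes_set(void (*store)(unsigned char *))
{
    unsigned char buf[8] = {0};
    int n = 0;
    store(buf);
    for (int i = 0; i < 8; i++)
        n += buf[i] == 0xff;
    return n;
}
```

In this model the plain store touches four bytes while the pop touches all eight, which is why the push/pop peephole is restricted to register destinations.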
2021-12-23 Roger Sayle <roger@nextmovesoftware.com>
Uroš Bizjak <ubizjak@gmail.com>
gcc/ChangeLog
PR target/103773
* config/i386/i386.md (*mov<mode>_and): New define_insn for
writing a zero to memory using AND.
(*mov<mode>_or): Extend to allow memory destination and HImode.
(*movdi_internal): Remove -Oz push/pop optimization from here.
(*movsi_internal): Likewise.
(peephole2): Perform -Oz push/pop optimization here, only for
register destinations, values other than zero, and in functions
that don't use the red zone.
(peephole2): With -Oz, convert writes of 0 or -1 to memory into
their clobber forms, i.e. *mov<mode>_and and *mov<mode>_or resp.
gcc/testsuite/ChangeLog
PR target/103773
* gcc.target/pr103773-2.c: New test case.
* gcc.target/pr103773.c: New test case.
Many thanks again for your help.
Roger
--
> -----Original Message-----
> From: Uros Bizjak <ubizjak@gmail.com>
> Sent: 22 December 2021 15:24
> To: Roger Sayle <roger@nextmovesoftware.com>
> Subject: Re: [PATCH] PR target/103773: Fix wrong-code with -Oz from pop to
> memory.
>
> On Wed, Dec 22, 2021 at 3:19 PM Uros Bizjak <ubizjak@gmail.com> wrote:
> >
> > On Wed, Dec 22, 2021 at 2:57 PM Uros Bizjak <ubizjak@gmail.com> wrote:
> > >
> > > On Wed, Dec 22, 2021 at 2:12 PM Roger Sayle
> <roger@nextmovesoftware.com> wrote:
> > > >
> > > >
> > > > Hi Uros,
> > > > I'm bootstrapping and regression testing your proposed patch now
> > > > (including the removal/reversion of my pieces in *mov[sd]i2_internal).
> > > > Many thanks for all of your help with this.
> > >
> > > Probably you want to avoid transformation of loads of 0 and -1,
> > > which should still be implemented via xor %reg, %reg and or $-1, %eax.
> >
> > This condition should result in the optimal conversion approach:
> >
> > + "optimize_insn_for_size_p () && optimize_size > 1
> > + && operands[1] != const0_rtx && operands[1] != constm1_rtx
> > + && IN_RANGE (INTVAL (operands[1]), -128, 127)
> > + && !ix86_red_zone_used"
>
> I think we should also convert HImode and QImode initializations, the pattern
> supports it by changing the mode iterator to SWI.
>
> Uros.
Comments
On Thu, Dec 23, 2021 at 10:35 AM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
> Hi Uros,
>
> A huge thanks for the list of suggested improvements to the -Oz related patches.
> I've combined them all together in the submission below, which makes sense now
> that everything is implemented using peephole2. The implementation of
> push/pop via peephole2 is exactly as you've suggested, also checking that the
> immediate value isn't zero (the value -1 is still a size win over OR), and extended
> to include HImode (where it is a win), but not QImode (where it isn't).
>
> For writes to memory, I've extended *mov<mode>_or to allow memory destinations
> and HImode, but I've introduced a new *mov<mode>_and for writing zero to memory,
> rather than complicate/overload *mov<mode>_xor (for example, it doesn't take an
> immediate). In this form, only a single peephole2 is needed, which adds a flags clobber to
> the instruction when the flags register is dead.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check with no new failures, and the new testcase checked
> both with and without -m32. Ok for mainline?
>
>
> 2021-12-23 Roger Sayle <roger@nextmovesoftware.com>
> Uroš Bizjak <ubizjak@gmail.com>
>
> gcc/ChangeLog
> PR target/103773
> * config/i386/i386.md (*mov<mode>_and): New define_insn for
> writing a zero to memory using AND.
> (*mov<mode>_or): Extend to allow memory destination and HImode.
> (*movdi_internal): Remove -Oz push/pop optimization from here.
> (*movsi_internal): Likewise.
> (peephole2): Perform -Oz push/pop optimization here, only for
> register destinations, values other than zero, and in functions
> that don't use the red zone.
> (peephole2): With -Oz, convert writes of 0 or -1 to memory into
> their clobber forms, i.e. *mov<mode>_and and *mov<mode>_or resp.
>
> gcc/testsuite/ChangeLog
> PR target/103773
> * gcc.target/pr103773-2.c: New test case.
> * gcc.target/pr103773.c: New test case.
OK, but please add a small comment above the new peephole2 patterns.
Thanks,
Uros.
@@ -2028,9 +2028,19 @@
(set_attr "mode" "SI")
(set_attr "length_immediate" "0")])
+(define_insn "*mov<mode>_and"
+ [(set (match_operand:SWI248 0 "memory_operand" "=m")
+ (match_operand:SWI248 1 "const0_operand"))
+ (clobber (reg:CC FLAGS_REG))]
+ "reload_completed"
+ "and{<imodesuffix>}\t{%1, %0|%0, %1}"
+ [(set_attr "type" "alu1")
+ (set_attr "mode" "<MODE>")
+ (set_attr "length_immediate" "1")])
+
(define_insn "*mov<mode>_or"
- [(set (match_operand:SWI48 0 "register_operand" "=r")
- (match_operand:SWI48 1 "constm1_operand"))
+ [(set (match_operand:SWI248 0 "nonimmediate_operand" "=rm")
+ (match_operand:SWI248 1 "constm1_operand"))
(clobber (reg:CC FLAGS_REG))]
"reload_completed"
"or{<imodesuffix>}\t{%1, %0|%0, %1}"
@@ -2218,14 +2228,7 @@
case TYPE_IMOV:
gcc_assert (!flag_pic || LEGITIMATE_PIC_OPERAND_P (operands[1]));
if (get_attr_mode (insn) == MODE_SI)
- {
- if (optimize_size > 1
- && TARGET_64BIT
- && CONST_INT_P (operands[1])
- && IN_RANGE (INTVAL (operands[1]), -128, 127))
- return "push{q}\t%1\n\tpop{q}\t%0";
- return "mov{l}\t{%k1, %k0|%k0, %k1}";
- }
+ return "mov{l}\t{%k1, %k0|%k0, %k1}";
else if (which_alternative == 4)
return "movabs{q}\t{%1, %0|%0, %1}";
else if (ix86_use_lea_for_mov (insn, operands))
@@ -2443,14 +2446,6 @@
gcc_assert (!flag_pic || LEGITIMATE_PIC_OPERAND_P (operands[1]));
if (ix86_use_lea_for_mov (insn, operands))
return "lea{l}\t{%E1, %0|%0, %E1}";
- else if (optimize_size > 1
- && CONST_INT_P (operands[1])
- && IN_RANGE (INTVAL (operands[1]), -128, 127))
- {
- if (TARGET_64BIT)
- return "push{q}\t%1\n\tpop{q}\t%q0";
- return "push{l}\t%1\n\tpop{l}\t%0";
- }
else
return "mov{l}\t{%1, %0|%0, %1}";
@@ -2514,6 +2509,34 @@
]
(symbol_ref "true")))])
+(define_peephole2
+ [(set (match_operand:SWI248 0 "general_reg_operand")
+ (match_operand:SWI248 1 "const_int_operand"))]
+ "optimize_insn_for_size_p () && optimize_size > 1
+ && operands[1] != const0_rtx
+ && IN_RANGE (INTVAL (operands[1]), -128, 127)
+ && !ix86_red_zone_used"
+ [(set (match_dup 2) (match_dup 1))
+ (set (match_dup 0) (match_dup 3))]
+{
+ if (GET_MODE (operands[0]) != word_mode)
+ operands[0] = gen_rtx_REG (word_mode, REGNO (operands[0]));
+
+ operands[2] = gen_rtx_MEM (word_mode,
+ gen_rtx_PRE_DEC (Pmode, stack_pointer_rtx));
+ operands[3] = gen_rtx_MEM (word_mode,
+ gen_rtx_POST_INC (Pmode, stack_pointer_rtx));
+})
+
+(define_peephole2
+ [(set (match_operand:SWI248 0 "memory_operand")
+ (match_operand:SWI248 1 "const_int_operand"))]
+ "(operands[1] == const0_rtx || operands[1] == constm1_rtx)
+ && optimize_insn_for_size_p () && optimize_size > 1
+ && peep2_regno_dead_p (0, FLAGS_REG)"
+ [(parallel [(set (match_dup 0) (match_dup 1))
+ (clobber (reg:CC FLAGS_REG))])])
+
(define_insn "*movhi_internal"
[(set (match_operand:HI 0 "nonimmediate_operand"
"=r,r,r,m ,*k,*k ,r ,m ,*k ,?r,?*v,*v,*v,*v,m")
new file mode 100644
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-Oz" } */
+short s;
+int i;
+long long l;
+
+void s0() { s = 0; }
+void sm1() { s = -1; }
+void i0() { i = 0; }
+void im1() { i = -1; }
+void l0() { l = 0; }
+void lm1() { l = -1; }
+
+/* { dg-final { scan-assembler-not "\tmov\[wlq\]\t\\\$0," } } */
+/* { dg-final { scan-assembler-not "\tmov\[wlq\]\t\\\$-1," } } */
+/* { dg-final { scan-assembler "\tandw\t\\\$0," } } */
+/* { dg-final { scan-assembler "\torw\t\\\$-1," } } */
+/* { dg-final { scan-assembler "\torl\t\\\$-1," } } */
+
new file mode 100644
@@ -0,0 +1,12 @@
+/* { dg-do run } */
+/* { dg-options "-Oz" } */
+
+unsigned long long x;
+
+int main (void)
+{
+ __builtin_memset (&x, 0xff, 4);
+ if (x != 0xffffffff)
+ __builtin_abort ();
+ return 0;
+}