x86: Shrink writing 0/-1 to memory using and/or with -Oz.

Message ID 03be01d7f67c$9a638530$cf2a8f90$@nextmovesoftware.com
State New

Commit Message

Roger Sayle Dec. 21, 2021, 3:08 p.m. UTC
  This is the second part of my fix to PR target/103773, where -Oz shouldn't
use push/pop on x86 to shrink writing small integer constants to memory.
Instead, with -Oz, clang uses "andl $0, mem" to write zero and "orl $-1, mem"
to write -1 to memory.  This patch implements this via peephole2, where we
can confirm that it's OK to clobber the flags.
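
As a rough illustration of where the saving comes from (not part of the
patch): "mov $imm, mem" (opcode C7 /0) carries a full 16- or 32-bit
immediate, while "and"/"or" (opcode 83 /4 and 83 /1) accept a sign-extended
8-bit immediate.  The sketch below counts encoded bytes under the assumption
that the destination is a global addressed with ModRM plus a 32-bit
displacement, as in the new testcase.  The function names are hypothetical,
and the 8-byte case applies to 64-bit mode only.

```c
#include <stddef.h>

/* Assumption: the store targets a global, so the address is encoded as
   one ModRM byte plus a 32-bit displacement.  */
enum { MODRM_DISP32 = 1 + 4 };

/* Prefix bytes: 0x66 for 16-bit operands, REX.W for 64-bit operands,
   nothing for 32-bit operands.  */
static size_t prefix_size (size_t op_bytes)
{
  return (op_bytes == 2 || op_bytes == 8) ? 1 : 0;
}

/* "mov $imm, mem" (C7 /0): imm16 for 16-bit operands, imm32 otherwise
   (the 64-bit form sign-extends an imm32).  */
size_t mov_store_size (size_t op_bytes)
{
  size_t imm = (op_bytes == 2) ? 2 : 4;
  return prefix_size (op_bytes) + 1 + MODRM_DISP32 + imm;
}

/* "and $imm8, mem" (83 /4) or "or $imm8, mem" (83 /1): the immediate is
   always a single sign-extended byte.  */
size_t logic_store_size (size_t op_bytes)
{
  return prefix_size (op_bytes) + 1 + MODRM_DISP32 + 1;
}
```

Under these assumptions a 32-bit store shrinks from 10 to 7 bytes, a 64-bit
store from 11 to 8 bytes, and a 16-bit store from 9 to 8 bytes.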

On the CSiBE benchmark, this reduces total code size from 3664172 bytes
to 3663304 bytes, saving 868 bytes.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check with no new failures, and the new testcase checked
both with and without -m32.  Ok for mainline?


2021-12-21  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	* config/i386/i386.md (define_peephole2): With -Oz use
	andl $0,mem instead of movl $0,mem and orl $-1,mem instead of
	movl $-1,mem.

gcc/testsuite/ChangeLog
	* gcc.target/i386/pr103773-2.c: New test case.


Thanks in advance (and my apologies for the breakage).
Roger
--
  

Comments

Uros Bizjak Dec. 22, 2021, 8:02 a.m. UTC | #1
On Tue, Dec 21, 2021 at 4:08 PM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
>
> This is the second part of my fix to PR target/103773 where -Oz shouldn't
> use push/pop on x86 to shrink writing small integer constants to memory.
> Instead clang uses "andl $0, mem" for writing zero, and "orl $-1, mem"
> when writing -1 to memory when using -Oz.  This patch implements this
> via peephole2 where we can confirm that it's ok to clobber the flags.
>
> On the CSiBE benchmark, this reduces total code size from 3664172 bytes
> to 3663304 bytes, saving 868 bytes.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check with no new failures, and the new testcase checked
> both with and without -m32.  Ok for mainline?
>
>
> 2021-12-21  Roger Sayle  <roger@nextmovesoftware.com>
>
> gcc/ChangeLog
>         * config/i386/i386.md (define_peephole2): With -Oz use
>         andl $0,mem instead of movl $0,mem and orl $-1,mem instead of
>         movl $-1,mem.

Your approach reads from possibly uninitialized memory, which may confuse
optimizers.

Please enhance *mov<mode>_xor and *mov<mode>_or instead so that they accept
a memory operand, and convert to these patterns.

Uros.
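
The concern above can be pictured at the C level.  This is only a rough
analogy under stated assumptions, not a description of the RTL: the function
names are hypothetical, and the destination is initialized here so that the
example is well defined.  Both forms leave the same value in memory, but the
and/or form is a read-modify-write, so it formally uses the old contents of
the destination.

```c
#include <stdint.h>

/* Plain stores: the old contents of *p are never read.  */
void store_zero (uint32_t *p) { *p = 0; }
void store_ones (uint32_t *p) { *p = UINT32_MAX; }

/* Read-modify-write forms, analogous to "andl $0, mem" and
   "orl $-1, mem": the result does not depend on the old value,
   but the expression still reads it.  */
void and_zero (uint32_t *p) { *p &= 0; }
void or_ones (uint32_t *p) { *p |= UINT32_MAX; }
```

For any initialized *p, store_zero and and_zero both leave 0 in memory, and
store_ones and or_ones both leave all bits set.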

> gcc/testsuite/ChangeLog
>         * gcc.target/i386/pr103773-2.c: New test case.
>
>
> Thanks in advance (and my apologies for the breakage).
> Roger
> --
>
  

Patch

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index d25453f..d872824 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -20940,6 +20942,19 @@ 
   DONE;
 })
 
+;; Aggressive size optimizations with -Oz
+(define_peephole2
+  [(set (match_operand:SWI248 0 "memory_operand") (const_int 0))]
+  "optimize_size > 1 && peep2_regno_dead_p (0, FLAGS_REG)"
+  [(parallel [(set (match_dup 0) (and:SWI248 (match_dup 0) (const_int 0)))
+	      (clobber (reg:CC FLAGS_REG))])])
+
+(define_peephole2
+  [(set (match_operand:SWI248 0 "memory_operand") (const_int -1))]
+  "optimize_size > 1 && peep2_regno_dead_p (0, FLAGS_REG)"
+  [(parallel [(set (match_dup 0) (ior:SWI248 (match_dup 0) (const_int -1)))
+	      (clobber (reg:CC FLAGS_REG))])])
+
 ;; Reload dislikes loading constants directly into class_likely_spilled
 ;; hard registers.  Try to tidy things up here.
 (define_peephole2
diff --git a/gcc/testsuite/gcc.target/i386/pr103773-2.c b/gcc/testsuite/gcc.target/i386/pr103773-2.c
new file mode 100644
index 0000000..9dafebd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr103773-2.c
@@ -0,0 +1,19 @@ 
+/* { dg-do compile } */
+/* { dg-options "-Oz" } */
+short s;
+int i;
+long long l;
+
+void s0() { s = 0; }
+void sm1() { s = -1; }
+void i0() { i = 0; }
+void im1() { i = -1; }
+void l0() { l = 0; }
+void lm1() { l = -1; }
+
+/* { dg-final { scan-assembler-not "\tmov\[wlq\]\t\\\$0," } } */
+/* { dg-final { scan-assembler-not "\tmov\[wlq\]\t\\\$-1," } } */
+/* { dg-final { scan-assembler "\tandw\t\\\$0," } } */
+/* { dg-final { scan-assembler "\torw\t\\\$-1," } } */
+/* { dg-final { scan-assembler "\torl\t\\\$-1," } } */
+