PR target/103611: Avoid generating orb $0, %ah on x86.

Message ID 006d01d7f01a$39d204c0$ad760e40$@nextmovesoftware.com
State New
Headers
Series PR target/103611: Avoid generating orb $0, %ah on x86. |

Commit Message

Roger Sayle Dec. 13, 2021, 12:09 p.m. UTC
  I'll post my proposed fix for PR target/103611 shortly, but this patch
fixes another missed optimization opportunity revealed by that PR.
Occasionally, reload materializes integer constants during register
allocation sometimes resulting in unnecessary instructions such as:

(insn 23 31 24 2 (parallel [
            (set (reg:SI 0 ax [99])
                (ior:SI (reg:SI 0 ax [99])
                    (const_int 0 [0])))
            (clobber (reg:CC 17 flags))
        ]) "pr103611.c":18:73 550 {*iorsi_1}
     (nil))

These then get "optimized" during the split2 pass, which realizes that
no bits outside of 0xff00 are set, so this operation can be implemented
by operating on just the highpart of a QIreg_operand, i.e. %ah, %bh, %ch
etc., which leads to the useless "orb $0, %ah" seen in the reported PR.

This fix catches the case of const0_rtx in relevant splitter, either
eliminating the instruction or turning it into a simple move.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without "--target_board='unix{-m32}'"
with no new failures.  OK for mainline?


2021-12-13  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	* config/i386/i386.md (define_split any_or:SWI248 -> orb %?h):
	Optimize the case where the integer constant operand is zero.

gcc/testsuite/ChangeLog
	* gcc.target/i386/pr103611-1.c: New test case.


Thanks in advance,
Roger
--
  

Comments

Uros Bizjak Dec. 13, 2021, 2:49 p.m. UTC | #1
On Mon, Dec 13, 2021 at 1:09 PM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
>
> I'll post my proposed fix for PR target/103611 shortly, but this patch
> fixes another missed optimization opportunity revealed by that PR.
> Occasionally, reload materializes integer constants during register
> allocation sometimes resulting in unnecessary instructions such as:
>
> (insn 23 31 24 2 (parallel [
>             (set (reg:SI 0 ax [99])
>                 (ior:SI (reg:SI 0 ax [99])
>                     (const_int 0 [0])))
>             (clobber (reg:CC 17 flags))
>         ]) "pr103611.c":18:73 550 {*iorsi_1}
>      (nil))
>
> These then get "optimized" during the split2 pass, which realizes that
> no bits outside of 0xff00 are set, so this operation can be implemented
> by operating on just the highpart of a QIreg_operand, i.e. %ah, %bh, %ch
> etc., which leads to the useless "orb $0, %ah" seen in the reported PR.
>
> This fix catches the case of const0_rtx in relevant splitter, either
> eliminating the instruction or turning it into a simple move.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without "--target_board='unix{-m32}'"
> with no new failures.  OK for mainline?
>
>
> 2021-12-13  Roger Sayle  <roger@nextmovesoftware.com>
>
> gcc/ChangeLog
>         * config/i386/i386.md (define_split any_or:SWI248 -> orb %?h):
>         Optimize the case where the integer constant operand is zero.
>
> gcc/testsuite/ChangeLog
>         * gcc.target/i386/pr103611-1.c: New test case.

OK with a small testsuite adjustment.

Thanks,
Uros.

+/* { dg-do compile } */
+/* { dg-options "-m32 -O2 -msse4" } */

Please add target selector to dg-do compile directive instead of an
explicit -m32 to dg-options, e.g.:

/* { dg-do compile { target ia32 } } */
  

Patch

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 9d7d116..f6d9c4b 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -10542,6 +10542,15 @@ 
 	       (match_dup 2)) 0))
       (clobber (reg:CC FLAGS_REG))])]
 {
+  /* Handle the case where INTVAL (operands[2]) == 0.  */
+  if (operands[2] == const0_rtx)
+    {
+      if (!rtx_equal_p (operands[0], operands[1]))
+	emit_move_insn (operands[0], operands[1]);
+      else
+	emit_note (NOTE_INSN_DELETED);
+      DONE;
+    }
   operands[0] = gen_lowpart (SImode, operands[0]);
   operands[1] = gen_lowpart (SImode, operands[1]);
   operands[2] = gen_int_mode (INTVAL (operands[2]) >> 8, QImode);
diff --git a/gcc/testsuite/gcc.target/i386/pr103611-1.c b/gcc/testsuite/gcc.target/i386/pr103611-1.c
new file mode 100644
index 0000000..1fae91d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr103611-1.c
@@ -0,0 +1,30 @@ 
+/* { dg-do compile } */
+/* { dg-options "-m32 -O2 -msse4" } */
+typedef int __v4si __attribute__ ((__vector_size__ (16)));
+
+long long ior_1(__v4si v) {
+  unsigned int loVal = (unsigned int)v[0];
+  unsigned int hiVal = (unsigned int)v[1];
+  return (long long)(loVal) | ((long long)(hiVal) << 32);
+}
+
+long long ior_2(__v4si v) {
+  unsigned int loVal = (unsigned int)v[2];
+  unsigned int hiVal = (unsigned int)v[3];
+  return (long long)(loVal) | ((long long)(hiVal) << 32);
+}
+
+long long xor_1(__v4si v) {
+  unsigned int loVal = (unsigned int)v[0];
+  unsigned int hiVal = (unsigned int)v[1];
+  return (long long)(loVal) ^ ((long long)(hiVal) << 32);
+}
+
+long long xor_2(__v4si v) {
+  unsigned int loVal = (unsigned int)v[2];
+  unsigned int hiVal = (unsigned int)v[3];
+  return (long long)(loVal) ^ ((long long)(hiVal) << 32);
+}
+/* { dg-final { scan-assembler-not "\torb\t\\\$0," } } */
+/* { dg-final { scan-assembler-not "\txorb\t\\\$0," } } */
+