[x86] Improved pre-reload split of double word comparison against -1.

Message ID 01ad01d8a663$6e062820$4a127860$@nextmovesoftware.com
State Committed
Commit e6b011bcfd52c245978ccd540e3f929571c59471
Series [x86] Improved pre-reload split of double word comparison against -1.

Commit Message

Roger Sayle Aug. 2, 2022, 11:31 a.m. UTC
This patch adds an extra optimization to *cmp<dwi>_doubleword to improve
the code generated for comparisons against -1.  Hypothetically, if a
comparison against -1 reached this splitter, we'd currently generate code
that looks like:

        notq    %rdx            ; 3 bytes
        notq    %rax            ; 3 bytes
        orq     %rdx, %rax      ; 3 bytes
        setne   %al

With this patch we would instead generate the superior:

        andq    %rdx, %rax      ; 3 bytes
        cmpq    $-1, %rax       ; 4 bytes
        setne   %al

which is both faster and smaller.  It is also what's currently generated,
thanks to the middle-end splitting double word comparisons against zero
and minus one during RTL expansion; should that change, this would become
a missed-optimization regression.  This patch also (potentially) helps
suitable comparisons created by CSE and combine.
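
(For illustration only, not part of the patch.)  The transformation is
valid because, for fixed-width integers, the high and low halves both
equal -1 exactly when their AND equals -1, whereas the generic path
effectively XORs each half with the corresponding constant word, which
degenerates to the NOT/NOT/OR sequence above when the constant is
all-ones.  A minimal example of source code involving a double word
comparison against -1 on x86_64 is an equality test on an __int128
value (the function name below is hypothetical):

        /* Hypothetical example: an __int128 equality test against -1
           is a double word (TImode) comparison on x86_64.  */
        int
        all_ones_p (__int128 x)
        {
          return x == -1;
        }

At -O2 this should compile to essentially the AND/CMP sequence shown
above, with sete rather than setne since it is an equality test.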

This patch has been tested on x86_64-pc-linux-gnu, on its own and in
combination with a middle-end patch tweaking RTL expansion, both with
and without --target-board=unix{-m32}, with no new failures.
Ok for mainline?


2022-08-02  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
        * config/i386/i386.md (*cmp<dwi>_doubleword): Add a special case
        to split comparisons against -1 using AND and CMP -1 instructions.


Thanks again,
Roger
--
  

Comments

Uros Bizjak Aug. 2, 2022, 5:24 p.m. UTC | #1
On Tue, Aug 2, 2022 at 1:31 PM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
> gcc/ChangeLog
>         * config/i386/i386.md (*cmp<dwi>_doubleword): Add a special case
>         to split comparisons against -1 using AND and CMP -1 instructions.

OK.

Thanks,
Uros.
  

Patch

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index f1158e1..e8f3851 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -1526,6 +1526,15 @@ 
       operands[i] = force_reg (<MODE>mode, operands[i]);
 
   operands[4] = gen_reg_rtx (<MODE>mode);
+
+  /* Special case comparisons against -1.  */
+  if (operands[1] == constm1_rtx && operands[3] == constm1_rtx)
+    {
+      emit_insn (gen_and<mode>3 (operands[4], operands[0], operands[2]));
+      emit_insn (gen_cmp_1 (<MODE>mode, operands[4], constm1_rtx));
+      DONE;
+    }
+
   if (operands[1] == const0_rtx)
     emit_move_insn (operands[4], operands[0]);
   else if (operands[0] == const0_rtx)