[x86_64] Introduce insvti_highpart define_insn_and_split.

Message ID 001501d9210f$694bed70$3be3c850$@nextmovesoftware.com
State New
Headers
Series [x86_64] Introduce insvti_highpart define_insn_and_split. |

Commit Message

Roger Sayle Jan. 5, 2023, 2:10 p.m. UTC
  This patch adds a convenient post-reload splitter for setting/updating
the highpart of a TImode variable, using the recently added
split_double_concat infrastructure.

For the new test case below:

__int128 foo(__int128 x, unsigned long long y)
{
  __int128 t = (__int128)y << 64;
  __int128 r = (x & ~0ull) | t;
  return r;
}

mainline GCC with -O2 currently generates:

foo:    movq    %rdi, %rcx
        xorl    %eax, %eax
        xorl    %edi, %edi
        orq     %rcx, %rax
        orq     %rdi, %rdx
        ret

with this patch, GCC instead now generates the much better:

foo:    movq    %rdi, %rcx
        movq    %rcx, %rax
        ret

[which interestingly shows the deeper (ABI) issue that I'm trying
to fix, this new define_insn_and_split being the next piece].

It turns out that the -m32 equivalent of this testcase, already
avoids using explict orl/xor instructions, as it gets optimized
(in combine) by a completely different path.  Given that this idiom
isn't seen in 32-bit code (and therefore this pattern wouldn't match),
and also that the shorter 32-bit AND bitmask is represented as a
CONST_INT rather than a CONST_WIDE_INT, this new define_insn_and_split
is implemented for just TARGET_64BIT rather than contort a "generic"
implementation using DWI mode iterators.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32},
with no new failures.  Ok for mainline?  Please let me know if you'd
prefer a different pattern name [insv seemed better than mov].


2023-01-05  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	* config/i386/i386.md (any_or_plus): Move definition earlier.
	(*insvti_highpart_1): New define_insn_and_split to overwrite
	(insv) the highpart of a TImode register/memory.

gcc/testsuite/ChangeLog
	* gcc.target/i386/insvti_highpart-1.c: New test case.


Thanks in advance,
Roger
--
  

Comments

Uros Bizjak Jan. 5, 2023, 8:26 p.m. UTC | #1
On Thu, Jan 5, 2023 at 3:10 PM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
>
> This patch adds a convenient post-reload splitter for setting/updating
> the highpart of a TImode variable, using the recently added
> split_double_concat infrastructure.
>
> For the new test case below:
>
> __int128 foo(__int128 x, unsigned long long y)
> {
>   __int128 t = (__int128)y << 64;
>   __int128 r = (x & ~0ull) | t;
>   return r;
> }
>
> mainline GCC with -O2 currently generates:
>
> foo:    movq    %rdi, %rcx
>         xorl    %eax, %eax
>         xorl    %edi, %edi
>         orq     %rcx, %rax
>         orq     %rdi, %rdx
>         ret
>
> with this patch, GCC instead now generates the much better:
>
> foo:    movq    %rdi, %rcx
>         movq    %rcx, %rax
>         ret
>
> [which interestingly shows the deeper (ABI) issue that I'm trying
> to fix, this new define_insn_and_split being the next piece].
>
> It turns out that the -m32 equivalent of this testcase, already
> avoids using explict orl/xor instructions, as it gets optimized
> (in combine) by a completely different path.  Given that this idiom
> isn't seen in 32-bit code (and therefore this pattern wouldn't match),
> and also that the shorter 32-bit AND bitmask is represented as a
> CONST_INT rather than a CONST_WIDE_INT, this new define_insn_and_split
> is implemented for just TARGET_64BIT rather than contort a "generic"
> implementation using DWI mode iterators.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32},
> with no new failures.  Ok for mainline?  Please let me know if you'd
> prefer a different pattern name [insv seemed better than mov].
>
>
> 2023-01-05  Roger Sayle  <roger@nextmovesoftware.com>
>
> gcc/ChangeLog
>         * config/i386/i386.md (any_or_plus): Move definition earlier.
>         (*insvti_highpart_1): New define_insn_and_split to overwrite
>         (insv) the highpart of a TImode register/memory.
>
> gcc/testsuite/ChangeLog
>         * gcc.target/i386/insvti_highpart-1.c: New test case.

LGTM, but the new pattern looks complex enough to make me a bit
nervous at this development stage. If the patch just extended the
existing SI/DI mode pattern to also handle DI/TI case for 64-bit
targets, I would approve it without hesitation, but since 128 bit
types are not that common ATM, I'd rather postpone introduction of
complex new pattern to the next stage 1.

Target specific part of the compiler does have some more freedom to
introduce new functionality during "general bugfixing" development
stage, so simple patches that extend existing patterns to handle new
modes or constraints should be OK, but let's not stretch this rule too
much.

To follow with the example, your recent extendditi2 splitter patch was
OK even at this development stage, because it just extended existing
patterns and existing approach to also handle 64bit instructions (that
are actually same as 32bit ones).

Uros.
  

Patch

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 890c4c8..9f3c62b 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -3401,6 +3401,31 @@ 
   "mov{b}\t{%h1, %h0|%h0, %h1}"
   [(set_attr "type" "imov")
    (set_attr "mode" "QI")])
+
+(define_code_iterator any_or_plus [plus ior xor])
+(define_insn_and_split "*insvti_highpart_1"
+  [(set (match_operand:TI 0 "nonimmediate_operand" "=ro,r,r,&r")
+	(any_or_plus:TI
+	  (and:TI
+	    (match_operand:TI 1 "nonimmediate_operand" "r,m,r,m")
+	    (match_operand:TI 3 "const_scalar_int_operand" "n,n,n,n"))
+	  (ashift:TI
+	    (zero_extend:TI
+	      (match_operand:DI 2 "nonimmediate_operand" "r,r,m,m"))
+	    (const_int 64))))]
+  "TARGET_64BIT
+   && CONST_WIDE_INT_P (operands[3])
+   && CONST_WIDE_INT_NUNITS (operands[3]) == 2
+   && CONST_WIDE_INT_ELT (operands[3], 0) == -1
+   && CONST_WIDE_INT_ELT (operands[3], 1) == 0"
+  "#"
+  "&& reload_completed"
+  [(clobber (const_int 0))]
+{
+  operands[4] = gen_lowpart (DImode, operands[1]);
+  split_double_concat (TImode, operands[0], operands[4], operands[2]);
+  DONE;
+})
 
 ;; Floating point push instructions.
 
@@ -11418,7 +11443,6 @@ 
    (set_attr "mode" "QI")])
 
 ;; Split DST = (HI<<32)|LO early to minimize register usage.
-(define_code_iterator any_or_plus [plus ior xor])
 (define_insn_and_split "*concat<mode><dwi>3_1"
   [(set (match_operand:<DWI> 0 "nonimmediate_operand" "=ro,r")
 	(any_or_plus:<DWI>
diff --git a/gcc/testsuite/gcc.target/i386/insvti_highpart-1.c b/gcc/testsuite/gcc.target/i386/insvti_highpart-1.c
new file mode 100644
index 0000000..4ae9ccf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/insvti_highpart-1.c
@@ -0,0 +1,12 @@ 
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-O2" } */
+
+__int128 foo(__int128 x, unsigned long long y)
+{
+  __int128 t = (__int128)y << 64;
+  __int128 r = (x & ~0ull) | t;
+  return r;
+}
+
+/* { dg-final { scan-assembler-not "xorl" } } */
+/* { dg-final { scan-assembler-not "orq" } } */