[x86_64] Improved Scalar-To-Vector (STV) support for TImode to V1TImode.

Message ID 000901d8938d$ead4dc40$c07e94c0$@nextmovesoftware.com
State New
Headers
Series [x86_64] Improved Scalar-To-Vector (STV) support for TImode to V1TImode. |

Commit Message

Roger Sayle July 9, 2022, 12:17 p.m. UTC
  This patch upgrades x86_64's scalar-to-vector (STV) pass to more
aggressively transform 128-bit scalar TImode operations into vector
V1TImode operations performed on SSE registers.  TImode functionality
already exists in STV, but only for move operations; this change
brings support for logical operations (AND, IOR, XOR, NOT and ANDN)
and comparisons.

The effect of these changes is conveniently demonstrated by the new
sse4_1-stv-5.c test case:

__int128 a[16];
__int128 b[16];
__int128 c[16];

void foo()
{
  for (unsigned int i=0; i<16; i++)
    a[i] = b[i] & ~c[i];
}

which when currently compiled on mainline with -O2 -msse4 produces:

foo:    xorl    %eax, %eax
.L2:    movq    c(%rax), %rsi
        movq    c+8(%rax), %rdi
        addq    $16, %rax
        notq    %rsi
        notq    %rdi
        andq    b-16(%rax), %rsi
        andq    b-8(%rax), %rdi
        movq    %rsi, a-16(%rax)
        movq    %rdi, a-8(%rax)
        cmpq    $256, %rax
        jne     .L2
        ret

but with this patch now produces:

foo:    xorl    %eax, %eax
.L2:    movdqa  c(%rax), %xmm0
        pandn   b(%rax), %xmm0
        addq    $16, %rax
        movaps  %xmm0, a-16(%rax)
        cmpq    $256, %rax
        jne     .L2
        ret

Technically, the STV pass is implemented by three C++ classes, a common
abstract base class "scalar_chain" that contains common functionality,
and two derived classes: general_scalar_chain (which handles SI and
DI modes) and timode_scalar_chain (which handles TI modes).  As
mentioned previously, because only TImode moves were handled, the
two worker classes behaved significantly differently.  These changes
bring the functionality of these two classes closer together, which
is reflected by refactoring more shared code from general_scalar_chain
to the parent scalar_chain and reusing it from timode.  There still
remain significant differences (and simplifications) so the existing
division of classes (as specializations) continues to make sense.

Obviously, there are more changes to come (shifts and rotates),
and compute_convert_gain doesn't yet have its final (tuned) form,
but is already an improvement over the "return 1;" used previously.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?


2022-07-09  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
        * config/i386/i386-features.h (scalar_chain): Add fields
        insns_conv, n_sse_to_integer and n_integer_to_sse to this
        parent class, moved from general_scalar_chain.
        (scalar_chain::convert_compare): Protected method moved
        from general_scalar_chain.
        (mark_dual_mode_def): Make protected, not private virtual.
        (scalar_chain::convert_op): New private virtual method.

        (general_scalar_chain::general_scalar_chain): Simplify constructor.
        (general_scalar_chain::~general_scalar_chain): Delete destructor.
        (general_scalar_chain): Move insns_conv, n_sse_to_integer and
        n_integer_to_sse fields to parent class, scalar_chain.
        (general_scalar_chain::mark_dual_mode_def): Delete prototype.
        (general_scalar_chain::convert_compare): Delete prototype.

        (timode_scalar_chain::compute_convert_gain): Remove simplistic
        implementation, convert to a method prototype.
        (timode_scalar_chain::mark_dual_mode_def): Delete prototype.
        (timode_scalar_chain::convert_op): Prototype new virtual method.

        * config/i386/i386-features.cc (scalar_chain::scalar_chain):
        Allocate insns_conv and initialize n_sse_to_integer and
        n_integer_to_sse fields in constructor.
        (scalar_chain::~scalar_chain): Free insns_conv in destructor.

        (general_scalar_chain::general_scalar_chain): Delete
        constructor, now defined in the class declaration.
        (general_scalar_chain::~general_scalar_chain): Delete destructor.

        (scalar_chain::mark_dual_mode_def): Renamed from
        general_scalar_chain::mark_dual_mode_def.
        (timode_scalar_chain::mark_dual_mode_def): Delete.
        (scalar_chain::convert_compare): Renamed from
        general_scalar_chain::convert_compare.

        (timode_scalar_chain::compute_convert_gain): New method to
        determine the gain from converting a TImode chain to V1TImode.
        (timode_scalar_chain::convert_op): New method to convert an
        operand from TImode to V1TImode.

        (timode_scalar_chain::convert_insn) <case REG>: Only PUT_MODE
        on REG_EQUAL notes that were originally TImode (not CONST_INT).
        Handle AND, ANDN, XOR, IOR, NOT and COMPARE.
        (timode_mem_p): Helper predicate to check whether an operand is
        a memory reference with sufficient alignment for TImode STV.
        (timode_scalar_to_vector_candidate_p): Use convertible_comparison_p
        to check whether COMPARE is convertible.  Handle SET_DESTs that
        are REG_P or MEM_P and SET_SRCs that are REG, CONST_INT,
        CONST_WIDE_INT, MEM, AND, ANDN, IOR, XOR or NOT.

gcc/testsuite/ChangeLog
        * gcc.target/i386/sse4_1-stv-2.c: New test case, pand.
        * gcc.target/i386/sse4_1-stv-3.c: New test case, por.
        * gcc.target/i386/sse4_1-stv-4.c: New test case, pxor.
        * gcc.target/i386/sse4_1-stv-5.c: New test case, pandn.
        * gcc.target/i386/sse4_1-stv-6.c: New test case, ptest.

Roger
--
  

Comments

Uros Bizjak July 10, 2022, 6:05 p.m. UTC | #1

Please note that there are in fact two STV passes, one before combine
and the other after combine.  The TImode pass that previously handled
only loads and stores is positioned before combine (there was a reason
for this decision, but I don't remember the details - let's ask
HJ...).  However, the DImode STV pass transforms many more
instructions, and the reason it was positioned after the combine pass
is that it then operates on an optimized insn stream in which forward
propagation has already been performed.

What is not clear to me from the above explanation is: is the new
TImode STV pass positioned after the combine pass, and if so, how does
the change affect the current load/store TImode STV pass?  I must
admit, I don't like two separate STV passes, so if TImode is now
similar to DImode, I suggest we abandon the STV1 pass and do
everything concerning TImode after the combine pass.  HJ, what is your
opinion on this?

Other than the above, the patch LGTM.

Uros.

Roger Sayle July 10, 2022, 6:36 p.m. UTC | #2
Hi Uros,
Yes, I agree.  I think it makes sense to have a single STV pass (after
combine and before reload).  Let's hear what HJ thinks, but I'm
happy to investigate a follow-up patch that unifies the STV passes.
But it'll be easier to confirm there are no "code generation" changes
if those modifications are pushed independently of these ones.
Time to look into the (git) history of multiple STV passes...

Thanks for the review.  I'll wait for HJ's thoughts.
Cheers,
Roger
--

H.J. Lu July 10, 2022, 7:15 p.m. UTC | #3

The TImode STV pass is run before the CSE pass so that
instructions changed or generated by the STV pass can be CSEed.

Roger Sayle July 10, 2022, 9:38 p.m. UTC | #4
Hi HJ,

I believe this should now be handled by the post-reload (CSE) pass.
Consider the simple test case:

__int128 a, b, c;
void foo()
{
  a = 0;
  b = 0;
  c = 0;
}

Without any STV, i.e. -O2 -msse4 -mno-stv, GCC lowers the TImode writes
to pairs of DImode stores:
        movq    $0, a(%rip)
        movq    $0, a+8(%rip)
        movq    $0, b(%rip)
        movq    $0, b+8(%rip)
        movq    $0, c(%rip)
        movq    $0, c+8(%rip)
        ret

But with STV, i.e. -O2 -msse4, things get converted to V1TImode:
        pxor    %xmm0, %xmm0
        movaps  %xmm0, a(%rip)
        movaps  %xmm0, b(%rip)
        movaps  %xmm0, c(%rip)
        ret

You're quite right that internally the STV pass actually generates the equivalent of:
        pxor    %xmm0, %xmm0
        movaps  %xmm0, a(%rip)
        pxor    %xmm0, %xmm0
        movaps  %xmm0, b(%rip)
        pxor    %xmm0, %xmm0
        movaps  %xmm0, c(%rip)
        ret

And currently, because STV runs before cse2 and combine, the const0_rtx
gets CSE'd by the cse2 pass to produce the code we see.  However, if you
specify -fno-rerun-cse-after-loop (to disable the cse2 pass), you'll see we
continue to generate the same optimized code, as the same const0_rtx
gets CSE'd in postreload.

I can't be certain until I try the experiment, but I believe that the postreload
CSE will clean up all of the same common subexpressions.  Hence, it should
be safe to perform all STV at the same point (after combine), which allows
for a few additional optimizations.

Does this make sense?  Do you have a test case where -fno-rerun-cse-after-loop
produces different/inferior code for TImode STV chains?

My guess is that the RTL passes have changed so much in the last six or
seven years that some of the original motivation no longer applies.
Certainly we now try to keep TImode operations visible longer, and
then allow STV to behave like a pre-reload pass to decide which set of
registers to use (vector V1TI or scalar doubleword DI).  Any CSE
opportunities that cse2 finds with V1TImode could/should equally well
be found for TImode (mostly).

Cheers,
Roger
--

> > > >         cmpq    $256, %rax
> > > >         jne     .L2
> > > >         ret
> > > >
> > > > Technically, the STV pass is implemented by three C++ classes, a
> > > > common abstract base class "scalar_chain" that contains common
> > > > functionality, and two derived classes: general_scalar_chain
> > > > (which handles SI and DI modes) and timode_scalar_chain (which
> > > > handles TI modes).  As mentioned previously, because only TI mode
> > > > moves were handled the two worker classes behaved significantly
> differently.
> > > > These changes bring the functionality of these two classes closer
> > > > together, which is reflected by refactoring more shared code from
> > > > general_scalar_chain to the parent scalar_chain and reusing it
> > > > from timode.  There still remain significant differences (and
> > > > simplifications) so the existing division of classes (as
> > > > specializations) continues
> > > to make sense.
> > >
> > > Please note that there are in fact two STV passes, one before
> > > combine and the other after combine. The TImode pass that previously
> > > handled only loads and stores is positioned before combine (there
> > > was a reason for this decision, but I don't remember the details -
> > > let's ask HJ...). However, DImode STV pass transforms much more
> > > instructions and the reason it was positioned after the combine pass
> > > was that STV pass transforms optimized insn stream where forward
> propagation was already performed.
> > >
> > > What is not clear to me from the above explanation is: is the new
> > > TImode STV pass positioned after the combine pass, and if this is
> > > the case, how the change affects current load/store TImode STV pass.
> > > I must admit, I don't like two separate STV passess, so if TImode is
> > > now similar to DImode, I suggest we abandon STV1 pass and do
> > > everything concerning TImode after the combine pass. HJ, what is your
> opinion on this?
> > >
> > > Other than the above, the patch LGTM to me.
> > >
> > > Uros.
> > >
> > > > Obviously, there are more changes to come (shifts and rotates),
> > > > and compute_convert_gain doesn't yet have its final (tuned) form,
> > > > but is already an improvement over the "return 1;" used previously.
> > > >
> > > > This patch has been tested on x86_64-pc-linux-gnu with make
> > > > boostrap and make -k check, both with and without
> > > > --target_board=unix{-m32} with no new failures.  Ok for mainline?
> > > >
> > > >
> > > > 2022-07-09  Roger Sayle  <roger@nextmovesoftware.com>
> > > >
> > > > gcc/ChangeLog
> > > >         * config/i386/i386-features.h (scalar_chain): Add fields
> > > >         insns_conv, n_sse_to_integer and n_integer_to_sse to this
> > > >         parent class, moved from general_scalar_chain.
> > > >         (scalar_chain::convert_compare): Protected method moved
> > > >         from general_scalar_chain.
> > > >         (mark_dual_mode_def): Make protected, not private virtual.
> > > >         (scalar_chain:convert_op): New private virtual method.
> > > >
> > > >         (general_scalar_chain::general_scalar_chain): Simplify constructor.
> > > >         (general_scalar_chain::~general_scalar_chain): Delete destructor.
> > > >         (general_scalar_chain): Move insns_conv, n_sse_to_integer and
> > > >         n_integer_to_sse fields to parent class, scalar_chain.
> > > >         (general_scalar_chain::mark_dual_mode_def): Delete prototype.
> > > >         (general_scalar_chain::convert_compare): Delete prototype.
> > > >
> > > >         (timode_scalar_chain::compute_convert_gain): Remove simplistic
> > > >         implementation, convert to a method prototype.
> > > >         (timode_scalar_chain::mark_dual_mode_def): Delete prototype.
> > > >         (timode_scalar_chain::convert_op): Prototype new virtual method.
> > > >
> > > >         * config/i386/i386-features.cc (scalar_chain::scalar_chain):
> > > >         Allocate insns_conv and initialize n_sse_to_integer and
> > > >         n_integer_to_sse fields in constructor.
> > > >         (scalar_chain::scalar_chain): Free insns_conv in destructor.
> > > >
> > > >         (general_scalar_chain::general_scalar_chain): Delete
> > > >         constructor, now defined in the class declaration.
> > > >         (general_scalar_chain::~general_scalar_chain): Delete destructor.
> > > >
> > > >         (scalar_chain::mark_dual_mode_def): Renamed from
> > > >         general_scalar_chain::mark_dual_mode_def.
> > > >         (timode_scalar_chain::mark_dual_mode_def): Delete.
> > > >         (scalar_chain::convert_compare): Renamed from
> > > >         general_scalar_chain::convert_compare.
> > > >
> > > >         (timode_scalar_chain::compute_convert_gain): New method to
> > > >         determine the gain from converting a TImode chain to V1TImode.
> > > >         (timode_scalar_chain::convert_op): New method to convert an
> > > >         operand from TImode to V1TImode.
> > > >
> > > >         (timode_scalar_chain::convert_insn) <case REG>: Only PUT_MODE
> > > >         on REG_EQUAL notes that were originally TImode (not CONST_INT).
> > > >         Handle AND, ANDN, XOR, IOR, NOT and COMPARE.
> > > >         (timode_mem_p): Helper predicate to check where operand is
> > > >         memory reference with sufficient alignment for TImode STV.
> > > >         (timode_scalar_to_vector_candidate_p): Use
> convertible_comparison_p
> > > >         to check whether COMPARE is convertible.  Handle SET_DESTs that
> > > >         that are REG_P or MEM_P and SET_SRCs that are REG, CONST_INT,
> > > >         CONST_WIDE_INT, MEM, AND, ANDN, IOR, XOR or NOT.
> > > >
> > > > gcc/testsuite/ChangeLog
> > > >         * gcc.target/i386/sse4_1-stv-2.c: New test case, pand.
> > > >         * gcc.target/i386/sse4_1-stv-3.c: New test case, por.
> > > >         * gcc.target/i386/sse4_1-stv-4.c: New test case, pxor.
> > > >         * gcc.target/i386/sse4_1-stv-5.c: New test case, pandn.
> > > >         * gcc.target/i386/sse4_1-stv-6.c: New test case, ptest.
> > > >
> > > > Roger
> > > > --
> > > >
> >
> 
> 
> --
> H.J.
  
H.J. Lu July 10, 2022, 11:56 p.m. UTC | #5
On Sun, Jul 10, 2022 at 2:38 PM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
>
> Hi HJ,
>
> I believe this should now be handled by the post-reload (CSE) pass.
> Consider the simple test case:
>
> __int128 a, b, c;
> void foo()
> {
>   a = 0;
>   b = 0;
>   c = 0;
> }
>
> Without any STV, i.e. -O2 -msse4 -mno-stv, GCC get TI mode writes:
>         movq    $0, a(%rip)
>         movq    $0, a+8(%rip)
>         movq    $0, b(%rip)
>         movq    $0, b+8(%rip)
>         movq    $0, c(%rip)
>         movq    $0, c+8(%rip)
>         ret
>
> But with STV, i.e. -O2 -msse4, things get converted to V1TI mode:
>         pxor    %xmm0, %xmm0
>         movaps  %xmm0, a(%rip)
>         movaps  %xmm0, b(%rip)
>         movaps  %xmm0, c(%rip)
>         ret
>
> You're quite right internally the STV actually generates the equivalent of:
>         pxor    %xmm0, %xmm0
>         movaps  %xmm0, a(%rip)
>         pxor    %xmm0, %xmm0
>         movaps  %xmm0, b(%rip)
>         pxor    %xmm0, %xmm0
>         movaps  %xmm0, c(%rip)
>         ret
>
> And currently because STV run before cse2 and combine, the const0_rtx
> gets CSE'd be the cse2 pass to produce the code we see.  However, if you
> specify -fno-rerun-cse-after-loop (to disable the cse2 pass), you'll see we
> continue to generate the same optimized code, as the same const0_rtx
> gets CSE'd in postreload.
>
> I can't be certain until I try the experiment, but I believe that the postreload
> CSE will clean-up, all of the same common subexpressions.  Hence, it should
> be safe to perform all STV at the same point (after combine), which for a few
> additional optimizations.
>
> Does this make sense?  Do you have a test case, -fno-rerun-cse-after-loop
> produces different/inferior code for TImode STV chains?
>
> My guess is that the RTL passes have changed so much in the last six or
> seven years, that some of the original motivation no longer applies.
> Certainly we now try to keep TI mode operations visible longer, and
> then allow STV to behave like a pre-reload pass to decide which set of
> registers to use (vector V1TI or scalar doubleword DI).  Any CSE opportunities
> that cse2 finds with V1TI mode, could/should equally well be found for
> TI mode (mostly).

You are probably right.  If there are no regressions in GCC testsuite,
my original motivation is no longer valid.

Thanks.

> Cheers,
> Roger
> --
>
> > -----Original Message-----
> > From: H.J. Lu <hjl.tools@gmail.com>
> > Sent: 10 July 2022 20:15
> > To: Roger Sayle <roger@nextmovesoftware.com>
> > Cc: Uros Bizjak <ubizjak@gmail.com>; GCC Patches <gcc-patches@gcc.gnu.org>
> > Subject: Re: [x86_64 PATCH] Improved Scalar-To-Vector (STV) support for
> > TImode to V1TImode.
> >
> > On Sun, Jul 10, 2022 at 11:36 AM Roger Sayle <roger@nextmovesoftware.com>
> > wrote:
> > >
> > >
> > > Hi Uros,
> > > Yes, I agree.  I think it makes sense to have a single STV pass (after
> > > combine and before reload).  Let's hear what HJ thinks, but I'm happy
> > > to investigate a follow-up patch that unifies the STV passes.
> > > But it'll be easier to confirm there are no "code generation" changes
> > > if those modifications are pushed independently of these ones.
> > > Time to look into the (git) history of multiple STV passes...
> > >
> > > Thanks for the review.  I'll wait for HJ's thoughts.
> >
> > The TImode STV pass is run before the CSE pass so that instructions changed or
> > generated by the STV pass can be CSEed.
> >
> > > Cheers,
> > > Roger
> > > --
> > >
> > > > -----Original Message-----
> > > > From: Uros Bizjak <ubizjak@gmail.com>
> > > > Sent: 10 July 2022 19:06
> > > > To: Roger Sayle <roger@nextmovesoftware.com>
> > > > Cc: gcc-patches@gcc.gnu.org; H. J. Lu <hjl.tools@gmail.com>
> > > > Subject: Re: [x86_64 PATCH] Improved Scalar-To-Vector (STV) support
> > > > for TImode to V1TImode.
> > > >
> > > > On Sat, Jul 9, 2022 at 2:17 PM Roger Sayle
> > > > <roger@nextmovesoftware.com>
> > > > wrote:
> > > > >
> > > > >
> > > > > This patch upgrades x86_64's scalar-to-vector (STV) pass to more
> > > > > aggressively transform 128-bit scalar TImode operations into
> > > > > vector V1TImode operations performed on SSE registers.  TImode
> > > > > functionality already exists in STV, but only for move operations,
> > > > > this changes brings support for logical operations (AND, IOR, XOR,
> > > > > NOT and ANDN) and comparisons.
> > > > >
> > > > > The effect of these changes are conveniently demonstrated by the
> > > > > new sse4_1-stv-5.c test case:
> > > > >
> > > > > __int128 a[16];
> > > > > __int128 b[16];
> > > > > __int128 c[16];
> > > > >
> > > > > void foo()
> > > > > {
> > > > >   for (unsigned int i=0; i<16; i++)
> > > > >     a[i] = b[i] & ~c[i];
> > > > > }
> > > > >
> > > > > which when currently compiled on mainline wtih -O2 -msse4 produces:
> > > > >
> > > > > foo:    xorl    %eax, %eax
> > > > > .L2:    movq    c(%rax), %rsi
> > > > >         movq    c+8(%rax), %rdi
> > > > >         addq    $16, %rax
> > > > >         notq    %rsi
> > > > >         notq    %rdi
> > > > >         andq    b-16(%rax), %rsi
> > > > >         andq    b-8(%rax), %rdi
> > > > >         movq    %rsi, a-16(%rax)
> > > > >         movq    %rdi, a-8(%rax)
> > > > >         cmpq    $256, %rax
> > > > >         jne     .L2
> > > > >         ret
> > > > >
> > > > > but with this patch now produces:
> > > > >
> > > > > foo:    xorl    %eax, %eax
> > > > > .L2:    movdqa  c(%rax), %xmm0
> > > > >         pandn   b(%rax), %xmm0
> > > > >         addq    $16, %rax
> > > > >         movaps  %xmm0, a-16(%rax)
> > > > >         cmpq    $256, %rax
> > > > >         jne     .L2
> > > > >         ret
> > > > >
> > > > > Technically, the STV pass is implemented by three C++ classes, a
> > > > > common abstract base class "scalar_chain" that contains common
> > > > > functionality, and two derived classes: general_scalar_chain
> > > > > (which handles SI and DI modes) and timode_scalar_chain (which
> > > > > handles TI modes).  As mentioned previously, because only TI mode
> > > > > moves were handled the two worker classes behaved significantly
> > differently.
> > > > > These changes bring the functionality of these two classes closer
> > > > > together, which is reflected by refactoring more shared code from
> > > > > general_scalar_chain to the parent scalar_chain and reusing it
> > > > > from timode.  There still remain significant differences (and
> > > > > simplifications) so the existing division of classes (as
> > > > > specializations) continues
> > > > to make sense.
> > > >
> > > > Please note that there are in fact two STV passes, one before
> > > > combine and the other after combine. The TImode pass that previously
> > > > handled only loads and stores is positioned before combine (there
> > > > was a reason for this decision, but I don't remember the details -
> > > > let's ask HJ...). However, DImode STV pass transforms much more
> > > > instructions and the reason it was positioned after the combine pass
> > > > was that STV pass transforms optimized insn stream where forward
> > propagation was already performed.
> > > >
> > > > What is not clear to me from the above explanation is: is the new
> > > > TImode STV pass positioned after the combine pass, and if this is
> > > > the case, how the change affects current load/store TImode STV pass.
> > > > I must admit, I don't like two separate STV passess, so if TImode is
> > > > now similar to DImode, I suggest we abandon STV1 pass and do
> > > > everything concerning TImode after the combine pass. HJ, what is your
> > opinion on this?
> > > >
> > > > Other than the above, the patch LGTM to me.
> > > >
> > > > Uros.
> > > >
> > > > > Obviously, there are more changes to come (shifts and rotates),
> > > > > and compute_convert_gain doesn't yet have its final (tuned) form,
> > > > > but is already an improvement over the "return 1;" used previously.
> > > > >
> > > > > This patch has been tested on x86_64-pc-linux-gnu with make
> > > > > boostrap and make -k check, both with and without
> > > > > --target_board=unix{-m32} with no new failures.  Ok for mainline?
> > > > >
> > > > >
> > > > > 2022-07-09  Roger Sayle  <roger@nextmovesoftware.com>
> > > > >
> > > > > gcc/ChangeLog
> > > > >         * config/i386/i386-features.h (scalar_chain): Add fields
> > > > >         insns_conv, n_sse_to_integer and n_integer_to_sse to this
> > > > >         parent class, moved from general_scalar_chain.
> > > > >         (scalar_chain::convert_compare): Protected method moved
> > > > >         from general_scalar_chain.
> > > > >         (mark_dual_mode_def): Make protected, not private virtual.
> > > > >         (scalar_chain:convert_op): New private virtual method.
> > > > >
> > > > >         (general_scalar_chain::general_scalar_chain): Simplify constructor.
> > > > >         (general_scalar_chain::~general_scalar_chain): Delete destructor.
> > > > >         (general_scalar_chain): Move insns_conv, n_sse_to_integer and
> > > > >         n_integer_to_sse fields to parent class, scalar_chain.
> > > > >         (general_scalar_chain::mark_dual_mode_def): Delete prototype.
> > > > >         (general_scalar_chain::convert_compare): Delete prototype.
> > > > >
> > > > >         (timode_scalar_chain::compute_convert_gain): Remove simplistic
> > > > >         implementation, convert to a method prototype.
> > > > >         (timode_scalar_chain::mark_dual_mode_def): Delete prototype.
> > > > >         (timode_scalar_chain::convert_op): Prototype new virtual method.
> > > > >
> > > > >         * config/i386/i386-features.cc (scalar_chain::scalar_chain):
> > > > >         Allocate insns_conv and initialize n_sse_to_integer and
> > > > >         n_integer_to_sse fields in constructor.
> > > > >         (scalar_chain::scalar_chain): Free insns_conv in destructor.
> > > > >
> > > > >         (general_scalar_chain::general_scalar_chain): Delete
> > > > >         constructor, now defined in the class declaration.
> > > > >         (general_scalar_chain::~general_scalar_chain): Delete destructor.
> > > > >
> > > > >         (scalar_chain::mark_dual_mode_def): Renamed from
> > > > >         general_scalar_chain::mark_dual_mode_def.
> > > > >         (timode_scalar_chain::mark_dual_mode_def): Delete.
> > > > >         (scalar_chain::convert_compare): Renamed from
> > > > >         general_scalar_chain::convert_compare.
> > > > >
> > > > >         (timode_scalar_chain::compute_convert_gain): New method to
> > > > >         determine the gain from converting a TImode chain to V1TImode.
> > > > >         (timode_scalar_chain::convert_op): New method to convert an
> > > > >         operand from TImode to V1TImode.
> > > > >
> > > > >         (timode_scalar_chain::convert_insn) <case REG>: Only PUT_MODE
> > > > >         on REG_EQUAL notes that were originally TImode (not CONST_INT).
> > > > >         Handle AND, ANDN, XOR, IOR, NOT and COMPARE.
> > > > >         (timode_mem_p): Helper predicate to check where operand is
> > > > >         memory reference with sufficient alignment for TImode STV.
> > > > >         (timode_scalar_to_vector_candidate_p): Use
> > convertible_comparison_p
> > > > >         to check whether COMPARE is convertible.  Handle SET_DESTs that
> > > > >         that are REG_P or MEM_P and SET_SRCs that are REG, CONST_INT,
> > > > >         CONST_WIDE_INT, MEM, AND, ANDN, IOR, XOR or NOT.
> > > > >
> > > > > gcc/testsuite/ChangeLog
> > > > >         * gcc.target/i386/sse4_1-stv-2.c: New test case, pand.
> > > > >         * gcc.target/i386/sse4_1-stv-3.c: New test case, por.
> > > > >         * gcc.target/i386/sse4_1-stv-4.c: New test case, pxor.
> > > > >         * gcc.target/i386/sse4_1-stv-5.c: New test case, pandn.
> > > > >         * gcc.target/i386/sse4_1-stv-6.c: New test case, ptest.
> > > > >
> > > > > Roger
> > > > > --
> > > > >
> > >
> >
> >
> > --
> > H.J.
>
  
Uros Bizjak July 11, 2022, 6:41 a.m. UTC | #6
On Sun, Jul 10, 2022 at 8:36 PM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
>
> Hi Uros,
> Yes, I agree.  I think it makes sense to have a single STV pass (after
> combine and before reload).  Let's hear what HJ thinks, but I'm
> happy to investigate a follow-up patch that unifies the STV passes.
> But it'll be easier to confirm there are no "code generation" changes
> if those modifications are pushed independently of these ones.
> Time to look into the (git) history of multiple STV passes...

OK, let's go forward with the proposed patch.

Thanks,
Uros.
  
Roger Sayle July 14, 2022, 5:31 a.m. UTC | #7
On Mon, Jul 11, 2022, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Sun, Jul 10, 2022 at 2:38 PM Roger Sayle <roger@nextmovesoftware.com>
> wrote:
> > Hi HJ,
> >
> > I believe this should now be handled by the post-reload (CSE) pass.
> > Consider the simple test case:
> >
> > __int128 a, b, c;
> > void foo()
> > {
> >   a = 0;
> >   b = 0;
> >   c = 0;
> > }
> >
> > Without any STV, i.e. -O2 -msse4 -mno-stv, GCC get TI mode writes:
> >         movq    $0, a(%rip)
> >         movq    $0, a+8(%rip)
> >         movq    $0, b(%rip)
> >         movq    $0, b+8(%rip)
> >         movq    $0, c(%rip)
> >         movq    $0, c+8(%rip)
> >         ret
> >
> > But with STV, i.e. -O2 -msse4, things get converted to V1TI mode:
> >         pxor    %xmm0, %xmm0
> >         movaps  %xmm0, a(%rip)
> >         movaps  %xmm0, b(%rip)
> >         movaps  %xmm0, c(%rip)
> >         ret
> >
> > You're quite right internally the STV actually generates the equivalent of:
> >         pxor    %xmm0, %xmm0
> >         movaps  %xmm0, a(%rip)
> >         pxor    %xmm0, %xmm0
> >         movaps  %xmm0, b(%rip)
> >         pxor    %xmm0, %xmm0
> >         movaps  %xmm0, c(%rip)
> >         ret
> >
> > And currently because STV run before cse2 and combine, the const0_rtx
> > gets CSE'd be the cse2 pass to produce the code we see.  However, if
> > you specify -fno-rerun-cse-after-loop (to disable the cse2 pass),
> > you'll see we continue to generate the same optimized code, as the
> > same const0_rtx gets CSE'd in postreload.
> >
> > I can't be certain until I try the experiment, but I believe that the
> > postreload CSE will clean-up, all of the same common subexpressions.
> > Hence, it should be safe to perform all STV at the same point (after
> > combine), which for a few additional optimizations.
> >
> > Does this make sense?  Do you have a test case,
> > -fno-rerun-cse-after-loop produces different/inferior code for TImode STV
> chains?
> >
> > My guess is that the RTL passes have changed so much in the last six
> > or seven years, that some of the original motivation no longer applies.
> > Certainly we now try to keep TI mode operations visible longer, and
> > then allow STV to behave like a pre-reload pass to decide which set of
> > registers to use (vector V1TI or scalar doubleword DI).  Any CSE
> > opportunities that cse2 finds with V1TI mode, could/should equally
> > well be found for TI mode (mostly).
> 
> You are probably right.  If there are no regressions in GCC testsuite, my original
> motivation is no longer valid.

It was good to try the experiment, but H.J. is right: there are still some benefits
(as well as some disadvantages) to running STV lowering before CSE2/combine.
A clean-up patch to perform all STV conversion as a single pass (removing a
pass from the compiler) results in just a single regression in the test suite:
FAIL: gcc.target/i386/pr70155-17.c scan-assembler-times movv1ti_internal 8
which looks like:

__int128 a, b, c, d, e, f;
void foo (void)
{
  a = 0;
  b = -1;
  c = 0;
  d = -1;
  e = 0;
  f = -1;
}

By performing STV after combine (without CSE), reload prefers to implement
this function using a single register, which then requires 12 instructions rather
than 8 (if using two registers).  Alas, there's nothing that postreload CSE/GCSE
can do.  Doh!

        pxor    %xmm0, %xmm0
        movaps  %xmm0, a(%rip)
        pcmpeqd %xmm0, %xmm0
        movaps  %xmm0, b(%rip)
        pxor    %xmm0, %xmm0
        movaps  %xmm0, c(%rip)
        pcmpeqd %xmm0, %xmm0
        movaps  %xmm0, d(%rip)
        pxor    %xmm0, %xmm0
        movaps  %xmm0, e(%rip)
        pcmpeqd %xmm0, %xmm0
        movaps  %xmm0, f(%rip)
        ret

I also note that even without STV, the scalar implementation of this function,
when compiled with -Os, is also larger than it needs to be due to poor CSE (notice in the
following we only need a single zero register, and an all-ones register would be helpful).

        xorl    %eax, %eax
        xorl    %edx, %edx
        xorl    %ecx, %ecx
        movq    $-1, b(%rip)
        movq    %rax, a(%rip)
        movq    %rax, a+8(%rip)
        movq    $-1, b+8(%rip)
        movq    %rdx, c(%rip)
        movq    %rdx, c+8(%rip)
        movq    $-1, d(%rip)
        movq    $-1, d+8(%rip)
        movq    %rcx, e(%rip)
        movq    %rcx, e+8(%rip)
        movq    $-1, f(%rip)
        movq    $-1, f+8(%rip)
        ret
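
At the source level, the missed opportunity looks like the following sketch (hypothetical function and variable names, not part of the patch): materialise one zero and one all-ones __int128 and reuse them for every store, so only two registers' worth of constants are needed, which is the form the single-zero-register / all-ones-register remark above is asking CSE to discover:

```c
/* Hypothetical sketch (not from the patch): one zero and one all-ones
   __int128 value are created once and reused for all six stores, the
   CSE opportunity missed in both listings in the surrounding message.  */
__int128 ga, gb, gc, gd, ge, gf;

void
bar (void)
{
  __int128 zero = 0;
  __int128 ones = ~zero;   /* all-ones, i.e. (__int128) -1 */
  ga = zero;
  gb = ones;
  gc = zero;
  gd = ones;
  ge = zero;
  gf = ones;
}
```

With ideal CSE this would need just one pxor and one pcmpeqd (vector), or two GPRs (scalar -Os), instead of rematerialising each constant per store.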

I need to give the problem some more thought.  It would be good to clean up/unify
the STV passes, but I/we need to solve/CSE H.J.'s last test case before we do.  Perhaps
forbidding "(set (mem:ti) (const_int 0))" in movti_internal would force the zero
register to become visible, and be CSE'd, benefiting both vector code and scalar -Os code;
postreload/peephole2 could then fix up the remaining scalar cases.  It's tricky.

Cheers,
Roger
--
  
Richard Biener July 14, 2022, 7:10 a.m. UTC | #8
On Thu, Jul 14, 2022 at 7:32 AM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
>
> On Mon, Jul 11, 2022, H.J. Lu <hjl.tools@gmail.com> wrote:
> > On Sun, Jul 10, 2022 at 2:38 PM Roger Sayle <roger@nextmovesoftware.com>
> > wrote:
> > > Hi HJ,
> > >
> > > I believe this should now be handled by the post-reload (CSE) pass.
> > > Consider the simple test case:
> > >
> > > __int128 a, b, c;
> > > void foo()
> > > {
> > >   a = 0;
> > >   b = 0;
> > >   c = 0;
> > > }
> > >
> > > Without any STV, i.e. -O2 -msse4 -mno-stv, GCC get TI mode writes:
> > >         movq    $0, a(%rip)
> > >         movq    $0, a+8(%rip)
> > >         movq    $0, b(%rip)
> > >         movq    $0, b+8(%rip)
> > >         movq    $0, c(%rip)
> > >         movq    $0, c+8(%rip)
> > >         ret
> > >
> > > But with STV, i.e. -O2 -msse4, things get converted to V1TI mode:
> > >         pxor    %xmm0, %xmm0
> > >         movaps  %xmm0, a(%rip)
> > >         movaps  %xmm0, b(%rip)
> > >         movaps  %xmm0, c(%rip)
> > >         ret
> > >
> > > You're quite right internally the STV actually generates the equivalent of:
> > >         pxor    %xmm0, %xmm0
> > >         movaps  %xmm0, a(%rip)
> > >         pxor    %xmm0, %xmm0
> > >         movaps  %xmm0, b(%rip)
> > >         pxor    %xmm0, %xmm0
> > >         movaps  %xmm0, c(%rip)
> > >         ret
> > >
> > > And currently because STV run before cse2 and combine, the const0_rtx
> > > gets CSE'd be the cse2 pass to produce the code we see.  However, if
> > > you specify -fno-rerun-cse-after-loop (to disable the cse2 pass),
> > > you'll see we continue to generate the same optimized code, as the
> > > same const0_rtx gets CSE'd in postreload.
> > >
> > > I can't be certain until I try the experiment, but I believe that the
> > > postreload CSE will clean-up, all of the same common subexpressions.
> > > Hence, it should be safe to perform all STV at the same point (after
> > > combine), which for a few additional optimizations.
> > >
> > > Does this make sense?  Do you have a test case,
> > > -fno-rerun-cse-after-loop produces different/inferior code for TImode STV
> > chains?
> > >
> > > My guess is that the RTL passes have changed so much in the last six
> > > or seven years, that some of the original motivation no longer applies.
> > > Certainly we now try to keep TI mode operations visible longer, and
> > > then allow STV to behave like a pre-reload pass to decide which set of
> > > registers to use (vector V1TI or scalar doubleword DI).  Any CSE
> > > opportunities that cse2 finds with V1TI mode, could/should equally
> > > well be found for TI mode (mostly).
> >
> > You are probably right.  If there are no regressions in GCC testsuite, my original
> > motivation is no longer valid.
>
> It was good to try the experiment, but H.J. is right, there is still some benefit
> (as well as some disadvantages)  to running STV lowering before CSE2/combine.
> A clean-up patch to perform all STV conversion as a single pass (removing a
> pass from the compiler) results in just a single regression in the test suite:
> FAIL: gcc.target/i386/pr70155-17.c scan-assembler-times movv1ti_internal 8
> which looks like:
>
> __int128 a, b, c, d, e, f;
> void foo (void)
> {
>   a = 0;
>   b = -1;
>   c = 0;
>   d = -1;
>   e = 0;
>   f = -1;
> }
>
> By performing STV after combine (without CSE), reload prefers to implement
> this function using a single register, that then requires 12 instructions rather
> than 8 (if using two registers).  Alas there's nothing that postreload CSE/GCSE
> can do.  Doh!

Hmm, the RA could be taught to make use of more of the register file, I suppose
(shouldn't regrename do this job? But it runs after postreload-cse).

>         pxor    %xmm0, %xmm0
>         movaps  %xmm0, a(%rip)
>         pcmpeqd %xmm0, %xmm0
>         movaps  %xmm0, b(%rip)
>         pxor    %xmm0, %xmm0
>         movaps  %xmm0, c(%rip)
>         pcmpeqd %xmm0, %xmm0
>         movaps  %xmm0, d(%rip)
>         pxor    %xmm0, %xmm0
>         movaps  %xmm0, e(%rip)
>         pcmpeqd %xmm0, %xmm0
>         movaps  %xmm0, f(%rip)
>         ret
>
> I also note that even without STV, the scalar implementation of this function when
> compiled with -Os is also larger than it needs to be due to poor CSE (notice in the
> following we only need a single zero register, and  an all_ones reg would be helpful).
>
>         xorl    %eax, %eax
>         xorl    %edx, %edx
>         xorl    %ecx, %ecx
>         movq    $-1, b(%rip)
>         movq    %rax, a(%rip)
>         movq    %rax, a+8(%rip)
>         movq    $-1, b+8(%rip)
>         movq    %rdx, c(%rip)
>         movq    %rdx, c+8(%rip)
>         movq    $-1, d(%rip)
>         movq    $-1, d+8(%rip)
>         movq    %rcx, e(%rip)
>         movq    %rcx, e+8(%rip)
>         movq    $-1, f(%rip)
>         movq    $-1, f+8(%rip)
>         ret
>
> I need to give the problem some more thought.  It would be good to clean-up/unify
> the STV passes, but I/we need to solve/CSE HJ's last test case before we do.  Perhaps
> by forbidding "(set (mem:ti) (const_int 0))" in movti_internal, would force the zero
> register to become visible, and CSE'd, benefiting both vector code and scalar -Os code,
> then use postreload/peephole2 to fix up the remaining scalar cases.  It's tricky.

Not sure if related, but ppc(?) folks recently tried to massage CSE to avoid
propagating constants by making sure that rtx_cost handles
(set (...) (const_int ...)) "properly".  But IIRC CSE never does the reverse
transform - splitting a constant out to a pseudo shared by multiple uses of
that constant - that's probably the job of reload + postreload-CSE right now,
but reload probably does not know that there are multiple uses of the
constant, so it cannot tell that the splitting is worthwhile.

> Cheers,
> Roger
> --
>
>
  

Patch

diff --git a/gcc/config/i386/i386-features.cc b/gcc/config/i386/i386-features.cc
index be38586..a17327e 100644
--- a/gcc/config/i386/i386-features.cc
+++ b/gcc/config/i386/i386-features.cc
@@ -291,7 +291,11 @@  scalar_chain::scalar_chain (enum machine_mode smode_, enum machine_mode vmode_)
   insns = BITMAP_ALLOC (NULL);
   defs = BITMAP_ALLOC (NULL);
   defs_conv = BITMAP_ALLOC (NULL);
+  insns_conv = BITMAP_ALLOC (NULL);
   queue = NULL;
+
+  n_sse_to_integer = 0;
+  n_integer_to_sse = 0;
 }
 
 /* Free chain's data.  */
@@ -301,6 +305,7 @@  scalar_chain::~scalar_chain ()
   BITMAP_FREE (insns);
   BITMAP_FREE (defs);
   BITMAP_FREE (defs_conv);
+  BITMAP_FREE (insns_conv);
   bitmap_obstack_release (NULL);
 }
 
@@ -319,25 +324,11 @@  scalar_chain::add_to_queue (unsigned insn_uid)
   bitmap_set_bit (queue, insn_uid);
 }
 
-general_scalar_chain::general_scalar_chain (enum machine_mode smode_,
-					    enum machine_mode vmode_)
-     : scalar_chain (smode_, vmode_)
-{
-  insns_conv = BITMAP_ALLOC (NULL);
-  n_sse_to_integer = 0;
-  n_integer_to_sse = 0;
-}
-
-general_scalar_chain::~general_scalar_chain ()
-{
-  BITMAP_FREE (insns_conv);
-}
-
 /* For DImode conversion, mark register defined by DEF as requiring
    conversion.  */
 
 void
-general_scalar_chain::mark_dual_mode_def (df_ref def)
+scalar_chain::mark_dual_mode_def (df_ref def)
 {
   gcc_assert (DF_REF_REG_DEF_P (def));
 
@@ -364,14 +355,6 @@  general_scalar_chain::mark_dual_mode_def (df_ref def)
 	     DF_REF_REGNO (def), DF_REF_INSN_UID (def), chain_id);
 }
 
-/* For TImode conversion, it is unused.  */
-
-void
-timode_scalar_chain::mark_dual_mode_def (df_ref)
-{
-  gcc_unreachable ();
-}
-
 /* Check REF's chain to add new insns into a queue
    and find registers requiring conversion.  */
 
@@ -934,7 +917,7 @@  general_scalar_chain::convert_op (rtx *op, rtx_insn *insn)
 /* Convert COMPARE to vector mode.  */
 
 rtx
-general_scalar_chain::convert_compare (rtx op1, rtx op2, rtx_insn *insn)
+scalar_chain::convert_compare (rtx op1, rtx op2, rtx_insn *insn)
 {
   rtx tmp = gen_reg_rtx (vmode);
   rtx src;
@@ -1156,6 +1139,83 @@  general_scalar_chain::convert_insn (rtx_insn *insn)
   df_insn_rescan (insn);
 }
 
+/* Compute a gain for chain conversion.  */
+
+int
+timode_scalar_chain::compute_convert_gain ()
+{
+  /* Assume that if we have to move TImode values between units,
+     then transforming this chain isn't worth it.  */
+  if (n_sse_to_integer || n_integer_to_sse)
+    return -1;
+
+  bitmap_iterator bi;
+  unsigned insn_uid;
+
+  /* Split ties to prefer V1TImode when not optimizing for size.  */
+  int gain = optimize_size ? 0 : 1;
+
+  if (dump_file)
+    fprintf (dump_file, "Computing gain for chain #%d...\n", chain_id);
+
+  EXECUTE_IF_SET_IN_BITMAP (insns, 0, insn_uid, bi)
+    {
+      rtx_insn *insn = DF_INSN_UID_GET (insn_uid)->insn;
+      rtx def_set = single_set (insn);
+      rtx src = SET_SRC (def_set);
+      rtx dst = SET_DEST (def_set);
+      int igain = 0;
+
+      switch (GET_CODE (src))
+	{
+	case REG:
+	  if (optimize_insn_for_size_p ())
+	    igain = MEM_P (dst) ? COSTS_N_BYTES (6) : COSTS_N_BYTES (3);
+	  else
+	    igain = COSTS_N_INSNS (1);
+	  break;
+
+	case MEM:
+	  igain = optimize_insn_for_size_p () ? COSTS_N_BYTES (7)
+					      : COSTS_N_INSNS (1);
+	  break;
+
+	case CONST_INT:
+	  if (MEM_P (dst)
+	      && standard_sse_constant_p (src, V1TImode))
+	    igain = optimize_insn_for_size_p() ? COSTS_N_BYTES (11) : 1;
+	  break;
+
+	case NOT:
+	  if (MEM_P (dst))
+	    igain = -COSTS_N_INSNS (1);
+	  break;
+
+	case AND:
+	case XOR:
+	case IOR:
+	  if (!MEM_P (dst))
+	    igain = COSTS_N_INSNS (1);
+	  break;
+
+	default:
+	  break;
+	}
+
+      if (igain != 0 && dump_file)
+	{
+	  fprintf (dump_file, "  Instruction gain %d for ", igain);
+	  dump_insn_slim (dump_file, insn);
+	}
+      gain += igain;
+    }
+
+  if (dump_file)
+    fprintf (dump_file, "  Total gain: %d\n", gain);
+
+  return gain;
+}
+
 /* Fix uses of converted REG in debug insns.  */
 
 void
@@ -1194,6 +1254,61 @@  timode_scalar_chain::fix_debug_reg_uses (rtx reg)
     }
 }
 
+/* Convert operand OP in INSN from TImode to V1TImode.  */
+
+void
+timode_scalar_chain::convert_op (rtx *op, rtx_insn *insn)
+{
+  *op = copy_rtx_if_shared (*op);
+
+  if (REG_P (*op))
+    *op = gen_rtx_SUBREG (V1TImode, *op, 0);
+  else if (MEM_P (*op))
+    {
+      rtx tmp = gen_reg_rtx (V1TImode);
+      emit_insn_before (gen_rtx_SET (gen_rtx_SUBREG (V1TImode, tmp, 0),
+				     gen_gpr_to_xmm_move_src (V1TImode, *op)),
+			insn);
+      *op = gen_rtx_SUBREG (V1TImode, tmp, 0);
+
+      if (dump_file)
+	fprintf (dump_file, "  Preloading operand for insn %d into r%d\n",
+		 INSN_UID (insn), REGNO (tmp));
+    }
+  else if (CONST_INT_P (*op))
+    {
+      rtx vec_cst;
+      rtx tmp = gen_rtx_SUBREG (V1TImode, gen_reg_rtx (TImode), 0);
+
+      /* Prefer all ones vector in case of -1.  */
+      if (constm1_operand (*op, TImode))
+	vec_cst = CONSTM1_RTX (V1TImode);
+      else
+	{
+	  rtx *v = XALLOCAVEC (rtx, 1);
+	  v[0] = *op;
+	  vec_cst = gen_rtx_CONST_VECTOR (V1TImode, gen_rtvec_v (1, v));
+	}
+
+      if (!standard_sse_constant_p (vec_cst, V1TImode))
+	{
+	  start_sequence ();
+	  vec_cst = validize_mem (force_const_mem (V1TImode, vec_cst));
+	  rtx_insn *seq = get_insns ();
+	  end_sequence ();
+	  emit_insn_before (seq, insn);
+	}
+
+      emit_insn_before (gen_move_insn (copy_rtx (tmp), vec_cst), insn);
+      *op = tmp;
+    }
+  else
+    {
+      gcc_assert (SUBREG_P (*op));
+      gcc_assert (GET_MODE (*op) == vmode);
+    }
+}
+
 /* Convert INSN from TImode to V1T1mode.  */
 
 void
@@ -1202,13 +1317,14 @@  timode_scalar_chain::convert_insn (rtx_insn *insn)
   rtx def_set = single_set (insn);
   rtx src = SET_SRC (def_set);
   rtx dst = SET_DEST (def_set);
+  rtx tmp;
 
   switch (GET_CODE (dst))
     {
     case REG:
       {
-	rtx tmp = find_reg_equal_equiv_note (insn);
-	if (tmp)
+	tmp = find_reg_equal_equiv_note (insn);
+	if (tmp && GET_MODE (XEXP (tmp, 0)) == TImode)
 	  PUT_MODE (XEXP (tmp, 0), V1TImode);
 	PUT_MODE (dst, V1TImode);
 	fix_debug_reg_uses (dst);
@@ -1275,6 +1391,49 @@  timode_scalar_chain::convert_insn (rtx_insn *insn)
 	}
       break;
 
+    case AND:
+      if (GET_CODE (XEXP (src, 0)) == NOT)
+	{
+	  convert_op (&XEXP (XEXP (src, 0), 0), insn);
+	  convert_op (&XEXP (src, 1), insn);
+	  PUT_MODE (XEXP (src, 0), V1TImode);
+	  PUT_MODE (src, V1TImode);
+	  break;
+	}
+      /* FALLTHRU */
+
+    case XOR:
+    case IOR:
+      convert_op (&XEXP (src, 0), insn);
+      convert_op (&XEXP (src, 1), insn);
+      PUT_MODE (src, V1TImode);
+      if (MEM_P (dst))
+	{
+	  tmp = gen_reg_rtx (V1TImode);
+          emit_insn_before (gen_rtx_SET (tmp, src), insn);
+          src = tmp;
+	}
+      break;
+
+    case NOT:
+      src = XEXP (src, 0);
+      convert_op (&src, insn);
+      tmp = gen_reg_rtx (V1TImode);
+      emit_insn_before (gen_move_insn (tmp, CONSTM1_RTX (V1TImode)), insn);
+      src = gen_rtx_XOR (V1TImode, src, tmp);
+      if (MEM_P (dst))
+	{
+	  tmp = gen_reg_rtx (V1TImode);
+          emit_insn_before (gen_rtx_SET (tmp, src), insn);
+          src = tmp;
+	}
+      break;
+
+    case COMPARE:
+      dst = gen_rtx_REG (CCmode, FLAGS_REG);
+      src = convert_compare (XEXP (src, 0), XEXP (src, 1), insn);
+      break;
+
     default:
       gcc_unreachable ();
     }
@@ -1522,6 +1681,16 @@  general_scalar_to_vector_candidate_p (rtx_insn *insn, enum machine_mode mode)
   return true;
 }
 
+/* Check for a suitable TImode memory operand.  */
+
+static bool
+timode_mem_p (rtx x)
+{
+  return MEM_P (x)
+	 && (TARGET_SSE_UNALIGNED_LOAD_OPTIMAL
+	     || !misaligned_operand (x, TImode));
+}
+
 /* The TImode version of scalar_to_vector_candidate_p.  */
 
 static bool
@@ -1535,45 +1704,64 @@  timode_scalar_to_vector_candidate_p (rtx_insn *insn)
   rtx src = SET_SRC (def_set);
   rtx dst = SET_DEST (def_set);
 
-  /* Only TImode load and store are allowed.  */
-  if (GET_MODE (dst) != TImode)
+  if (GET_CODE (src) == COMPARE)
+    return convertible_comparison_p (insn, TImode);
+
+  if (GET_MODE (dst) != TImode
+      || (GET_MODE (src) != TImode
+          && !CONST_SCALAR_INT_P (src)))
     return false;
 
-  if (MEM_P (dst))
-    {
-      /* Check for store.  Memory must be aligned or unaligned store
-	 is optimal.  Only support store from register, standard SSE
-	 constant or CONST_WIDE_INT generated from piecewise store.
+  if (!REG_P (dst) && !MEM_P (dst))
+    return false;
 
-	 ??? Verify performance impact before enabling CONST_INT for
-	 __int128 store.  */
-      if (misaligned_operand (dst, TImode)
-	  && !TARGET_SSE_UNALIGNED_STORE_OPTIMAL)
-	return false;
+  if (MEM_P (dst)
+      && misaligned_operand (dst, TImode)
+      && !TARGET_SSE_UNALIGNED_STORE_OPTIMAL)
+    return false;
 
-      switch (GET_CODE (src))
-	{
-	default:
-	  return false;
+  switch (GET_CODE (src))
+    {
+    case REG:
+    case CONST_WIDE_INT:
+      return true;
 
-	case REG:
-	case CONST_WIDE_INT:
-	  return true;
+    case CONST_INT:
+      /* ??? Verify performance impact before enabling CONST_INT for
+	 __int128 store.  */
+      return standard_sse_constant_p (src, TImode);
 
-	case CONST_INT:
-	  return standard_sse_constant_p (src, TImode);
-	}
-    }
-  else if (MEM_P (src))
-    {
-      /* Check for load.  Memory must be aligned or unaligned load is
-	 optimal.  */
+    case MEM:
+      /* Memory must be aligned or unaligned load is optimal.  */
       return (REG_P (dst)
 	      && (!misaligned_operand (src, TImode)
 		  || TARGET_SSE_UNALIGNED_LOAD_OPTIMAL));
-    }
 
-  return false;
+    case AND:
+      if (MEM_P (dst))
+	return false;
+      if (GET_CODE (XEXP (src, 0)) == NOT
+	  && REG_P (XEXP (XEXP (src, 0), 0))
+	  && (REG_P (XEXP (src, 1)) || timode_mem_p (XEXP (src, 1))))
+	return true;
+      return REG_P (XEXP (src, 0))
+	     && (REG_P (XEXP (src, 1)) || timode_mem_p (XEXP (src, 1)));
+
+    case IOR:
+    case XOR:
+      if (MEM_P (dst))
+	return false;
+      return REG_P (XEXP (src, 0))
+	     && (REG_P (XEXP (src, 1)) || timode_mem_p (XEXP (src, 1)));
+
+    case NOT:
+      if (MEM_P (dst))
+	return false;
+      return REG_P (XEXP (src, 0)) || timode_mem_p (XEXP (src, 0));
+
+    default:
+      return false;
+    }
 }
 
 /* For a register REGNO, scan instructions for its defs and uses.
diff --git a/gcc/config/i386/i386-features.h b/gcc/config/i386/i386-features.h
index 839b63c..88b222d 100644
--- a/gcc/config/i386/i386-features.h
+++ b/gcc/config/i386/i386-features.h
@@ -148,6 +148,10 @@  class scalar_chain
   /* Registers used in both vector and sclar modes.  */
   bitmap defs_conv;
 
+  bitmap insns_conv;
+  unsigned n_sse_to_integer;
+  unsigned n_integer_to_sse;
+
   void build (bitmap candidates, unsigned insn_uid);
   virtual int compute_convert_gain () = 0;
   int convert ();
@@ -155,33 +159,31 @@  class scalar_chain
  protected:
   void add_to_queue (unsigned insn_uid);
   void emit_conversion_insns (rtx insns, rtx_insn *pos);
+  rtx convert_compare (rtx op1, rtx op2, rtx_insn *insn);
+  void mark_dual_mode_def (df_ref def);
 
  private:
   void add_insn (bitmap candidates, unsigned insn_uid);
   void analyze_register_chain (bitmap candidates, df_ref ref);
-  virtual void mark_dual_mode_def (df_ref def) = 0;
   virtual void convert_insn (rtx_insn *insn) = 0;
   virtual void convert_registers () = 0;
+  virtual void convert_op (rtx *op, rtx_insn *insn) = 0;
 };
 
 class general_scalar_chain : public scalar_chain
 {
  public:
-  general_scalar_chain (enum machine_mode smode_, enum machine_mode vmode_);
-  ~general_scalar_chain ();
+  general_scalar_chain (enum machine_mode smode_, enum machine_mode vmode_)
+    : scalar_chain (smode_, vmode_) {}
   int compute_convert_gain () final override;
+
  private:
   hash_map<rtx, rtx> defs_map;
-  bitmap insns_conv;
-  unsigned n_sse_to_integer;
-  unsigned n_integer_to_sse;
-  void mark_dual_mode_def (df_ref def) final override;
   void convert_insn (rtx_insn *insn) final override;
-  void convert_op (rtx *op, rtx_insn *insn);
   void convert_reg (rtx_insn *insn, rtx dst, rtx src);
+  void convert_op (rtx *op, rtx_insn *insn);
   void make_vector_copies (rtx_insn *, rtx);
   void convert_registers () final override;
-  rtx convert_compare (rtx op1, rtx op2, rtx_insn *insn);
   int vector_const_cost (rtx exp);
 };
 
@@ -189,21 +191,20 @@  class timode_scalar_chain : public scalar_chain
 {
  public:
   timode_scalar_chain () : scalar_chain (TImode, V1TImode) {}
-
-  /* Convert from TImode to V1TImode is always faster.  */
-  int compute_convert_gain () final override { return 1; }
+  int compute_convert_gain () final override;
 
  private:
-  void mark_dual_mode_def (df_ref def) final override;
   void fix_debug_reg_uses (rtx reg);
   void convert_insn (rtx_insn *insn) final override;
-  /* We don't convert registers to difference size.  */
+  void convert_op (rtx *op, rtx_insn *insn);
+  /* We don't convert registers to different size.  */
   void convert_registers () final override {}
 };
 
 } // anon namespace
 
-bool ix86_save_reg (unsigned int regno, bool maybe_eh_return, bool ignore_outlined);
+bool ix86_save_reg (unsigned int regno, bool maybe_eh_return,
+		    bool ignore_outlined);
 int ix86_compare_version_priority (tree decl1, tree decl2);
 tree ix86_generate_version_dispatcher_body (void *node_p);
 tree ix86_get_function_versions_dispatcher (void *decl);
diff --git a/gcc/testsuite/gcc.target/i386/sse4_1-stv-2.c b/gcc/testsuite/gcc.target/i386/sse4_1-stv-2.c
new file mode 100644
index 0000000..e637927
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/sse4_1-stv-2.c
@@ -0,0 +1,14 @@ 
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-O2 -msse4.1 -mstv -mno-stackrealign" } */
+
+__int128 a[16];
+__int128 b[16];
+__int128 c[16];
+
+void foo()
+{
+  for (unsigned int i=0; i<16; i++)
+    a[i] = b[i] & c[i];
+}
+
+/* { dg-final { scan-assembler "pand" } } */
diff --git a/gcc/testsuite/gcc.target/i386/sse4_1-stv-3.c b/gcc/testsuite/gcc.target/i386/sse4_1-stv-3.c
new file mode 100644
index 0000000..bdc0bac
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/sse4_1-stv-3.c
@@ -0,0 +1,14 @@ 
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-O2 -msse4.1 -mstv -mno-stackrealign" } */
+
+__int128 a[16];
+__int128 b[16];
+__int128 c[16];
+
+void foo()
+{
+  for (unsigned int i=0; i<16; i++)
+    a[i] = b[i] | c[i];
+}
+
+/* { dg-final { scan-assembler "por" } } */
diff --git a/gcc/testsuite/gcc.target/i386/sse4_1-stv-4.c b/gcc/testsuite/gcc.target/i386/sse4_1-stv-4.c
new file mode 100644
index 0000000..a9ef619
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/sse4_1-stv-4.c
@@ -0,0 +1,14 @@ 
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-O2 -msse4.1 -mstv -mno-stackrealign" } */
+
+__int128 a[16];
+__int128 b[16];
+__int128 c[16];
+
+void foo()
+{
+  for (unsigned int i=0; i<16; i++)
+    a[i] = b[i] ^ c[i];
+}
+
+/* { dg-final { scan-assembler "pxor" } } */
diff --git a/gcc/testsuite/gcc.target/i386/sse4_1-stv-5.c b/gcc/testsuite/gcc.target/i386/sse4_1-stv-5.c
new file mode 100644
index 0000000..7eff3c1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/sse4_1-stv-5.c
@@ -0,0 +1,14 @@ 
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-O2 -msse4.1 -mstv -mno-stackrealign" } */
+
+__int128 a[16];
+__int128 b[16];
+__int128 c[16];
+
+void foo()
+{
+  for (unsigned int i=0; i<16; i++)
+    a[i] = b[i] & ~c[i];
+}
+
+/* { dg-final { scan-assembler "pandn" } } */
diff --git a/gcc/testsuite/gcc.target/i386/sse4_1-stv-6.c b/gcc/testsuite/gcc.target/i386/sse4_1-stv-6.c
new file mode 100644
index 0000000..407bc55
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/sse4_1-stv-6.c
@@ -0,0 +1,15 @@ 
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-O2 -msse4.1 -mstv -mno-stackrealign" } */
+
+__int128 a[16];
+__int128 b[16];
+
+int foo()
+{
+  for (unsigned int i=0; i<16; i++)
+    if (a[i] == b[i])
+      return i;
+  return -1;
+}
+
+/* { dg-final { scan-assembler "ptest" } } */
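For reference, the scalar semantics that the new AND-with-NOT handling maps onto a single `pandn` per iteration (the sse4_1-stv-5.c case) can be spelled out in plain C; nothing below is specific to the patch, it just states the `b[i] & ~c[i]` operation on `__int128` values that the pass now keeps in SSE registers:

```c
/* Plain-C statement of the operation STV now vectorizes to pandn:
   a[i] = b[i] & ~c[i] over 128-bit elements.  With the patch, each
   iteration becomes a movdqa load, a pandn, and a movaps store,
   instead of two notq/andq/movq sequences on 64-bit halves.  */
#define N 16

__int128 a[N], b[N], c[N];

void
andn_loop (void)
{
  for (unsigned int i = 0; i < N; i++)
    a[i] = b[i] & ~c[i];
}
```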