Ping: [PATCH] Add a late-combine pass [PR106594]

Message ID: mpt34wne3hn.fsf@arm.com
State: New
Series: Ping: [PATCH] Add a late-combine pass [PR106594]

Checks

Context                                         Check    Description
linaro-tcwg-bot/tcwg_gcc_build--master-arm      success  Testing passed
linaro-tcwg-bot/tcwg_gcc_build--master-aarch64  success  Testing passed
linaro-tcwg-bot/tcwg_gcc_check--master-arm      success  Testing passed
linaro-tcwg-bot/tcwg_gcc_check--master-aarch64  success  Testing passed

Commit Message

Richard Sandiford Nov. 30, 2023, 2:10 p.m. UTC
  Ping

Richard Sandiford <richard.sandiford@arm.com> writes:
> This patch adds a combine pass that runs late in the pipeline.
> There are two instances: one between combine and split1, and one
> after postreload.
>
> The pass currently has a single objective: remove definitions by
> substituting into all uses.  The pre-RA version tries to restrict
> itself to cases that are likely to have a neutral or beneficial
> effect on register pressure.
>
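> As a purely illustrative sketch of the kind of transformation
> involved (register numbers invented for the example): given
>
> (set (reg:SI 100) (plus:SI (reg:SI 101) (const_int 1)))
> (set (reg:SI 102) (mult:SI (reg:SI 100) (reg:SI 103)))
>
> the pass tries to rewrite the use as:
>
> (set (reg:SI 102) (mult:SI (plus:SI (reg:SI 101) (const_int 1))
>                            (reg:SI 103)))
>
> and then to delete the definition of reg 100, provided that the
> combined pattern is recognised and the change seems worthwhile.
>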
> The patch fixes PR106594.  It also fixes a few FAILs and XFAILs
> in the aarch64 test results, mostly due to making proper use of
> MOVPRFX in cases where we didn't previously.  I hope it would
> also help with Robin's vec_duplicate testcase, although the
> pressure heuristic might need tweaking for that case.
>
> This is just a first step.  I'm hoping that the pass could be
> used for other combine-related optimisations in future.  In particular,
> the post-RA version doesn't need to restrict itself to cases where all
> uses are substitutable, since it doesn't have to worry about register
> pressure.  If we did that, and if we extended it to handle multi-register
> REGs, the pass might be a viable replacement for regcprop, which in
> turn might reduce the cost of having a post-RA instance of the new pass.
>
> I've run an assembly comparison with one target per CPU directory,
> and it seems to be a win for all targets except nvptx (which is hard
> to measure, being a higher-level asm).  The biggest winner seemed
> to be AVR.
>
> I'd originally hoped to enable the pass by default at -O2 and above
> on all targets.  But in the end, I don't think that's possible,
> because it interacts badly with x86's STV and partial register
> dependency passes.
>
> For example, gcc.target/i386/minmax-6.c tests whether the code
> compiles without any spilling.  The RTL created by STV contains:
>
> (insn 33 31 3 2 (set (subreg:V4SI (reg:SI 120) 0)
>         (vec_merge:V4SI (vec_duplicate:V4SI (reg:SI 116))
>             (const_vector:V4SI [
>                     (const_int 0 [0]) repeated x4
>                 ])
>             (const_int 1 [0x1]))) -1
>      (nil))
> (insn 3 33 34 2 (set (subreg:V4SI (reg:SI 118) 0)
>         (subreg:V4SI (reg:SI 120) 0)) {movv4si_internal}
>      (expr_list:REG_DEAD (reg:SI 120)
>         (nil)))
> (insn 34 3 32 2 (set (reg/v:SI 108 [ y ])
>         (reg:SI 118)) -1
>      (nil))
>
> and it's crucial for the test that reg 108 is kept, rather than
> propagated into uses.  As things stand, 118 can be allocated
> a vector register and 108 a scalar register.  If 108 is propagated,
> there will be scalar and vector uses of 118, and so it will be
> spilled to memory.
>
> That one could be solved by running STV2 later.  But RPAD is
> a bigger problem.  In gcc.target/i386/pr87007-5.c, RPAD converts:
>
> (insn 27 26 28 6 (set (reg:DF 100 [ _15 ])
>         (sqrt:DF (mem/c:DF (symbol_ref:DI ("d2"))))) {*sqrtdf2_sse}
>      (nil))
>
> into:
>
> (insn 45 26 44 6 (set (reg:V4SF 108)
>         (const_vector:V4SF [
>                 (const_double:SF 0.0 [0x0.0p+0]) repeated x4
>             ])) -1
>      (nil))
> (insn 44 45 27 6 (set (reg:V2DF 109)
>         (vec_merge:V2DF (vec_duplicate:V2DF (sqrt:DF (mem/c:DF (symbol_ref:DI ("d2")))))
>             (subreg:V2DF (reg:V4SF 108) 0)
>             (const_int 1 [0x1]))) -1
>      (nil))
> (insn 27 44 28 6 (set (reg:DF 100 [ _15 ])
>         (subreg:DF (reg:V2DF 109) 0)) {*movdf_internal}
>      (nil))
>
> But both the pre-RA and post-RA passes are able to combine these
> instructions back to the original form.
>
> The patch therefore enables the pass by default only on AArch64.
> However, I did test the patch with it enabled on x86_64-linux-gnu
> as well, which was useful for debugging.
>
> Bootstrapped & regression-tested on aarch64-linux-gnu and
> x86_64-linux-gnu, both as posted (with no regressions) and with the
> pass enabled by default (with some gcc.target/i386 regressions).
> OK to install?
>
> Richard

gcc/
	PR rtl-optimization/106594
	* Makefile.in (OBJS): Add late-combine.o.
	* common.opt (flate-combine-instructions): New option.
	* doc/invoke.texi: Document it.
	* common/config/aarch64/aarch64-common.cc
	(aarch_option_optimization_table): Enable it by default at -O2
	and above.
	* tree-pass.h (make_pass_late_combine): Declare.
	* late-combine.cc: New file.
	* passes.def: Add two instances of late_combine.

gcc/testsuite/
	PR rtl-optimization/106594
	* gcc.dg/ira-shrinkwrap-prep-1.c: Restrict XFAIL to non-aarch64
	targets.
	* gcc.dg/ira-shrinkwrap-prep-2.c: Likewise.
	* gcc.dg/stack-check-4.c: Add -fno-shrink-wrap.
	* gcc.target/aarch64/sve/cond_asrd_3.c: Remove XFAILs.
	* gcc.target/aarch64/sve/cond_fabd_5.c: Likewise.
	* gcc.target/aarch64/sve/cond_convert_3.c: Expect the MOVPRFX /Zs
	described in the comment.
	* gcc.target/aarch64/sve/cond_convert_6.c: Likewise.
	* gcc.target/aarch64/sve/cond_unary_4.c: Likewise.
	* gcc.target/aarch64/pr106594_1.c: New test.
---
 gcc/Makefile.in                               |   1 +
 gcc/common.opt                                |   5 +
 gcc/common/config/aarch64/aarch64-common.cc   |   1 +
 gcc/doc/invoke.texi                           |  11 +-
 gcc/late-combine.cc                           | 718 ++++++++++++++++++
 gcc/passes.def                                |   2 +
 gcc/testsuite/gcc.dg/ira-shrinkwrap-prep-1.c  |   2 +-
 gcc/testsuite/gcc.dg/ira-shrinkwrap-prep-2.c  |   2 +-
 gcc/testsuite/gcc.dg/stack-check-4.c          |   2 +-
 gcc/testsuite/gcc.target/aarch64/pr106594_1.c |  20 +
 .../gcc.target/aarch64/sve/cond_asrd_3.c      |  10 +-
 .../gcc.target/aarch64/sve/cond_convert_3.c   |   8 +-
 .../gcc.target/aarch64/sve/cond_convert_6.c   |   8 +-
 .../gcc.target/aarch64/sve/cond_fabd_5.c      |  11 +-
 .../gcc.target/aarch64/sve/cond_unary_4.c     |  13 +-
 gcc/tree-pass.h                               |   1 +
 16 files changed, 780 insertions(+), 35 deletions(-)
 create mode 100644 gcc/late-combine.cc
 create mode 100644 gcc/testsuite/gcc.target/aarch64/pr106594_1.c
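
As a usage sketch for reviewers (the dump names assume GCC's usual
-fdump-rtl-<pass> numbering for the two cloned pass instances),
targets that do not enable the pass by default can exercise it with
something like:

    gcc -O2 -flate-combine-instructions \
        -fdump-rtl-late_combine1-details \
        -fdump-rtl-late_combine2-details test.c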
  

Patch

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 91d6bfbea4d..b43fd6e8df1 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1554,6 +1554,7 @@  OBJS = \
 	ira-lives.o \
 	jump.o \
 	langhooks.o \
+	late-combine.o \
 	lcm.o \
 	lists.o \
 	loop-doloop.o \
diff --git a/gcc/common.opt b/gcc/common.opt
index 1cf3bdd3b51..306b11f91c7 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1775,6 +1775,11 @@  Common Var(flag_large_source_files) Init(0)
 Improve GCC's ability to track column numbers in large source files,
 at the expense of slower compilation.
 
+flate-combine-instructions
+Common Var(flag_late_combine_instructions) Optimization Init(0)
+Run two instruction combination passes late in the pass pipeline:
+one before register allocation and one after.
+
 floop-parallelize-all
 Common Var(flag_loop_parallelize_all) Optimization
 Mark all loops as parallel.
diff --git a/gcc/common/config/aarch64/aarch64-common.cc b/gcc/common/config/aarch64/aarch64-common.cc
index 20bc4e1291b..05647e0c93a 100644
--- a/gcc/common/config/aarch64/aarch64-common.cc
+++ b/gcc/common/config/aarch64/aarch64-common.cc
@@ -55,6 +55,7 @@  static const struct default_options aarch_option_optimization_table[] =
     { OPT_LEVELS_1_PLUS, OPT_fsched_pressure, NULL, 1 },
     /* Enable redundant extension instructions removal at -O2 and higher.  */
     { OPT_LEVELS_2_PLUS, OPT_free, NULL, 1 },
+    { OPT_LEVELS_2_PLUS, OPT_flate_combine_instructions, NULL, 1 },
 #if (TARGET_DEFAULT_ASYNC_UNWIND_TABLES == 1)
     { OPT_LEVELS_ALL, OPT_fasynchronous_unwind_tables, NULL, 1 },
     { OPT_LEVELS_ALL, OPT_funwind_tables, NULL, 1},
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 5a9284d635c..d0576ac97cf 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -562,7 +562,7 @@  Objective-C and Objective-C++ Dialects}.
 -fipa-bit-cp  -fipa-vrp  -fipa-pta  -fipa-profile  -fipa-pure-const
 -fipa-reference  -fipa-reference-addressable
 -fipa-stack-alignment  -fipa-icf  -fira-algorithm=@var{algorithm}
--flive-patching=@var{level}
+-flate-combine-instructions  -flive-patching=@var{level}
 -fira-region=@var{region}  -fira-hoist-pressure
 -fira-loop-pressure  -fno-ira-share-save-slots
 -fno-ira-share-spill-slots
@@ -13201,6 +13201,15 @@  equivalences that are found only by GCC and equivalences found only by Gold.
 
 This flag is enabled by default at @option{-O2} and @option{-Os}.
 
+@opindex flate-combine-instructions
+@item -flate-combine-instructions
+Enable two instruction combination passes that run relatively late in the
+compilation process.  One pass runs before register allocation and the
+other after it.  The main aim of the passes is to substitute definitions
+into all uses, so that the definitions themselves can be removed.
+
+Some targets enable this flag by default at @option{-O2} and @option{-Os}.
+
 @opindex flive-patching
 @item -flive-patching=@var{level}
 Control GCC's optimizations to produce output suitable for live-patching.
diff --git a/gcc/late-combine.cc b/gcc/late-combine.cc
new file mode 100644
index 00000000000..b1845875c4b
--- /dev/null
+++ b/gcc/late-combine.cc
@@ -0,0 +1,718 @@ 
+// Late-stage instruction combination pass.
+// Copyright (C) 2023 Free Software Foundation, Inc.
+//
+// This file is part of GCC.
+//
+// GCC is free software; you can redistribute it and/or modify it under
+// the terms of the GNU General Public License as published by the Free
+// Software Foundation; either version 3, or (at your option) any later
+// version.
+//
+// GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+// WARRANTY; without even the implied warranty of MERCHANTABILITY or
+// FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+// for more details.
+//
+// You should have received a copy of the GNU General Public License
+// along with GCC; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+// The current purpose of this pass is to substitute definitions into
+// all uses, so that the definition can be removed.  However, it could
+// be extended to handle other combination-related optimizations in future.
+//
+// The pass can run before or after register allocation.  When running
+// before register allocation, it tries to avoid cases that are likely
+// to increase register pressure.  For the same reason, it avoids moving
+// instructions around, even if doing so would allow an optimization to
+// succeed.  These limitations are removed when running after register
+// allocation.
+
+#define INCLUDE_ALGORITHM
+#define INCLUDE_FUNCTIONAL
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "rtl.h"
+#include "df.h"
+#include "rtl-ssa.h"
+#include "print-rtl.h"
+#include "tree-pass.h"
+#include "cfgcleanup.h"
+#include "target.h"
+
+using namespace rtl_ssa;
+
+namespace {
+const pass_data pass_data_late_combine =
+{
+  RTL_PASS, // type
+  "late_combine", // name
+  OPTGROUP_NONE, // optinfo_flags
+  TV_NONE, // tv_id
+  0, // properties_required
+  0, // properties_provided
+  0, // properties_destroyed
+  0, // todo_flags_start
+  TODO_df_finish, // todo_flags_finish
+};
+
+// Class that represents one run of the pass.
+class late_combine
+{
+public:
+  unsigned int execute (function *);
+
+private:
+  rtx optimizable_set (insn_info *);
+  bool check_register_pressure (insn_info *, rtx);
+  bool check_uses (set_info *, rtx);
+  bool combine_into_uses (insn_info *, insn_info *);
+
+  auto_vec<insn_info *> m_worklist;
+};
+
+// Represents an attempt to substitute a single-set definition into all
+// uses of the definition.
+class insn_combination
+{
+public:
+  insn_combination (set_info *, rtx, rtx);
+  bool run ();
+  const vec<insn_change *> &use_changes () const { return m_use_changes; }
+
+private:
+  use_array get_new_uses (use_info *);
+  bool substitute_nondebug_use (use_info *);
+  bool substitute_nondebug_uses (set_info *);
+  bool move_and_recog (insn_change &);
+  bool try_to_preserve_debug_info (insn_change &, use_info *);
+  void substitute_debug_use (use_info *);
+  bool substitute_note (insn_info *, rtx, bool);
+  void substitute_notes (insn_info *, bool);
+  void substitute_note_uses (use_info *);
+  void substitute_optional_uses (set_info *);
+
+  // Represents the state of the function's RTL at the start of this
+  // combination attempt.
+  insn_change_watermark m_rtl_watermark;
+
+  // Represents the rtl-ssa state at the start of this combination attempt.
+  obstack_watermark m_attempt;
+
+  // The instruction that contains the definition, and that we're trying
+  // to delete.
+  insn_info *m_def_insn;
+
+  // The definition itself.
+  set_info *m_def;
+
+  // The destination and source of the single set that defines m_def.
+  // The destination is known to be a plain REG.
+  rtx m_dest;
+  rtx m_src;
+
+  // Contains all in-progress changes to uses of m_def.
+  auto_vec<insn_change *> m_use_changes;
+
+  // Contains the full list of changes that we want to make, in reverse
+  // postorder.
+  auto_vec<insn_change *> m_nondebug_changes;
+};
+
+insn_combination::insn_combination (set_info *def, rtx dest, rtx src)
+  : m_rtl_watermark (),
+    m_attempt (crtl->ssa->new_change_attempt ()),
+    m_def_insn (def->insn ()),
+    m_def (def),
+    m_dest (dest),
+    m_src (src),
+    m_use_changes (),
+    m_nondebug_changes ()
+{
+}
+
+// USE is a direct or indirect use of m_def.  Return the list of uses
+// that would be needed after substituting m_def into the instruction.
+// The returned list is marked as invalid if USE's insn and m_def_insn
+// use different definitions for the same resource (register or memory).
+use_array
+insn_combination::get_new_uses (use_info *use)
+{
+  auto *def = use->def ();
+  auto *use_insn = use->insn ();
+
+  use_array new_uses = use_insn->uses ();
+  new_uses = remove_uses_of_def (m_attempt, new_uses, def);
+  new_uses = merge_access_arrays (m_attempt, m_def_insn->uses (), new_uses);
+  if (new_uses.is_valid () && use->ebb () != m_def->ebb ())
+    new_uses = crtl->ssa->make_uses_available (m_attempt, new_uses, use->bb (),
+					       use_insn->is_debug_insn ());
+  return new_uses;
+}
+
+// Start the process of trying to replace USE by substitution, given that
+// USE occurs in a non-debug instruction.  Check that the substitution can
+// be represented in RTL and that each use of a resource (register or memory)
+// has a consistent definition.  If so, start an insn_change record for the
+// substitution and return true.
+bool
+insn_combination::substitute_nondebug_use (use_info *use)
+{
+  insn_info *use_insn = use->insn ();
+  rtx_insn *use_rtl = use_insn->rtl ();
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    dump_insn_slim (dump_file, use->insn ()->rtl ());
+
+  // Check that we can change the instruction pattern.  Leave recognition
+  // of the result till later.
+  insn_propagation prop (use_rtl, m_dest, m_src);
+  if (!prop.apply_to_pattern (&PATTERN (use_rtl))
+      || prop.num_replacements == 0)
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file, "-- RTL substitution failed\n");
+      return false;
+    }
+
+  use_array new_uses = get_new_uses (use);
+  if (!new_uses.is_valid ())
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file, "-- could not prove that all sources"
+		 " are available\n");
+      return false;
+    }
+
+  auto *where = XOBNEW (m_attempt, insn_change);
+  auto *use_change = new (where) insn_change (use_insn);
+  m_use_changes.safe_push (use_change);
+  use_change->new_uses = new_uses;
+
+  return true;
+}
+
+// Apply substitute_nondebug_use to all direct and indirect uses of DEF.
+// There will be at most one level of indirection.
+bool
+insn_combination::substitute_nondebug_uses (set_info *def)
+{
+  for (use_info *use : def->nondebug_insn_uses ())
+    if (!use->is_live_out_use ()
+	&& !use->only_occurs_in_notes ()
+	&& !substitute_nondebug_use (use))
+      return false;
+
+  for (use_info *use : def->phi_uses ())
+    if (!substitute_nondebug_uses (use->phi ()))
+      return false;
+
+  return true;
+}
+
+// Complete the verification of USE_CHANGE, given that m_nondebug_changes
+// now contains an insn_change record for all proposed non-debug changes.
+// Check that the new instruction is a recognized pattern.  Also check that
+// the instruction can be placed somewhere that makes all definitions and
+// uses valid, and that permits any new hard-register clobbers added
+// during the recognition process.  Return true on success.
+bool
+insn_combination::move_and_recog (insn_change &use_change)
+{
+  insn_info *use_insn = use_change.insn ();
+
+  if (reload_completed && can_move_insn_p (use_insn))
+    use_change.move_range = { use_insn->bb ()->head_insn (),
+			      use_insn->ebb ()->last_bb ()->end_insn () };
+  if (!restrict_movement_ignoring (use_change,
+				   insn_is_changing (m_nondebug_changes)))
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file, "-- cannot satisfy all definitions and uses"
+		 " in insn %d\n", INSN_UID (use_insn->rtl ()));
+      return false;
+    }
+
+  if (!recog_ignoring (m_attempt, use_change,
+		       insn_is_changing (m_nondebug_changes)))
+    return false;
+
+  return true;
+}
+
+// USE_CHANGE.insn () is a debug instruction that uses m_def.  Try to
+// substitute the definition into the instruction and try to describe
+// the result in USE_CHANGE.  Return true on success.  Failure means that
+// the instruction must be reset instead.
+bool
+insn_combination::try_to_preserve_debug_info (insn_change &use_change,
+					      use_info *use)
+{
+  insn_info *use_insn = use_change.insn ();
+  rtx_insn *use_rtl = use_insn->rtl ();
+
+  use_change.new_uses = get_new_uses (use);
+  if (!use_change.new_uses.is_valid ()
+      || !restrict_movement (use_change))
+    return false;
+
+  insn_propagation prop (use_rtl, m_dest, m_src);
+  return prop.apply_to_pattern (&INSN_VAR_LOCATION_LOC (use_rtl));
+}
+
+// USE occurs in a debug instruction.  Update the instruction to reflect
+// the fact that m_def is going to disappear.  Try to preserve the source
+// value if possible, but reset the instruction if not.
+void
+insn_combination::substitute_debug_use (use_info *use)
+{
+  auto *use_insn = use->insn ();
+  rtx_insn *use_rtl = use_insn->rtl ();
+
+  auto use_change = insn_change (use_insn);
+  if (!try_to_preserve_debug_info (use_change, use))
+    {
+      use_change.new_uses = {};
+      use_change.move_range = use_change.insn ();
+      INSN_VAR_LOCATION_LOC (use_rtl) = gen_rtx_UNKNOWN_VAR_LOC ();
+    }
+  insn_change *changes[] = { &use_change };
+  crtl->ssa->change_insns (changes);
+}
+
+// NOTE is a reg note of USE_INSN, which previously used m_def.  Update
+// the note to reflect the fact that m_def is going to disappear.  Return
+// true on success, or false if the note must be deleted.
+//
+// CAN_PROPAGATE is true if m_dest can be replaced with m_src.
+bool
+insn_combination::substitute_note (insn_info *use_insn, rtx note,
+				   bool can_propagate)
+{
+  if (REG_NOTE_KIND (note) == REG_EQUAL
+      || REG_NOTE_KIND (note) == REG_EQUIV)
+    {
+      insn_propagation prop (use_insn->rtl (), m_dest, m_src);
+      return (prop.apply_to_rvalue (&XEXP (note, 0))
+	      && (can_propagate || prop.num_replacements == 0));
+    }
+  return true;
+}
+
+// Update USE_INSN's notes after deciding to go ahead with the optimization.
+// CAN_PROPAGATE is true if m_dest can be replaced with m_src.
+void
+insn_combination::substitute_notes (insn_info *use_insn, bool can_propagate)
+{
+  rtx_insn *use_rtl = use_insn->rtl ();
+  rtx *ptr = &REG_NOTES (use_rtl);
+  while (rtx note = *ptr)
+    {
+      if (substitute_note (use_insn, note, can_propagate))
+	ptr = &XEXP (note, 1);
+      else
+	*ptr = XEXP (note, 1);
+    }
+}
+
+// We've decided to go ahead with the substitution.  Update all REG_NOTES
+// involving USE.
+void
+insn_combination::substitute_note_uses (use_info *use)
+{
+  insn_info *use_insn = use->insn ();
+
+  bool can_propagate = true;
+  if (use->only_occurs_in_notes ())
+    {
+      // The only uses are in notes.  Try to keep the note if we can,
+      // but removing it is better than aborting the optimization.
+      insn_change use_change (use_insn);
+      use_change.new_uses = get_new_uses (use);
+      if (!use_change.new_uses.is_valid ()
+	  || !restrict_movement (use_change))
+	{
+	  use_change.move_range = use_insn;
+	  use_change.new_uses = remove_uses_of_def (m_attempt,
+						    use_insn->uses (),
+						    use->def ());
+	  can_propagate = false;
+	}
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  fprintf (dump_file, "%s notes in:\n",
+		   can_propagate ? "updating" : "removing");
+	  dump_insn_slim (dump_file, use_insn->rtl ());
+	}
+      substitute_notes (use_insn, can_propagate);
+      insn_change *changes[] = { &use_change };
+      crtl->ssa->change_insns (changes);
+    }
+  else
+    substitute_notes (use_insn, can_propagate);
+}
+
+// We've decided to go ahead with the substitution and we've dealt with
+// all uses that occur in the patterns of non-debug insns.  Update all
+// other uses for the fact that m_def is about to disappear.
+void
+insn_combination::substitute_optional_uses (set_info *def)
+{
+  if (auto insn_uses = def->all_insn_uses ())
+    {
+      use_info *use = *insn_uses.begin ();
+      while (use)
+	{
+	  use_info *next_use = use->next_any_insn_use ();
+	  if (use->is_in_debug_insn ())
+	    substitute_debug_use (use);
+	  else if (!use->is_live_out_use ())
+	    substitute_note_uses (use);
+	  use = next_use;
+	}
+    }
+  for (use_info *use : def->phi_uses ())
+    substitute_optional_uses (use->phi ());
+}
+
+// Try to perform the substitution.  Return true on success.
+bool
+insn_combination::run ()
+{
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "\ntrying to combine definition of r%d in:\n",
+	       m_def->regno ());
+      dump_insn_slim (dump_file, m_def_insn->rtl ());
+      fprintf (dump_file, "into:\n");
+    }
+
+  if (!substitute_nondebug_uses (m_def))
+    return false;
+
+  auto def_change = insn_change::delete_insn (m_def_insn);
+
+  m_nondebug_changes.reserve (m_use_changes.length () + 1);
+  m_nondebug_changes.quick_push (&def_change);
+  m_nondebug_changes.splice (m_use_changes);
+
+  for (auto *use_change : m_use_changes)
+    if (!move_and_recog (*use_change))
+      return false;
+
+  if (!changes_are_worthwhile (m_nondebug_changes)
+      || !crtl->ssa->verify_insn_changes (m_nondebug_changes))
+    return false;
+
+  substitute_optional_uses (m_def);
+
+  confirm_change_group ();
+  crtl->ssa->change_insns (m_nondebug_changes);
+  return true;
+}
+
+// See whether INSN is a single_set that we can optimize.  Return the
+// set if so, otherwise return null.
+rtx
+late_combine::optimizable_set (insn_info *insn)
+{
+  if (!insn->can_be_optimized ()
+      || insn->is_asm ()
+      || insn->is_call ()
+      || insn->has_volatile_refs ()
+      || insn->has_pre_post_modify ()
+      || !can_move_insn_p (insn))
+    return NULL_RTX;
+
+  return single_set (insn->rtl ());
+}
+
+// Suppose that we can replace all uses of SET_DEST (SET) with SET_SRC (SET),
+// where SET occurs in INSN.  Return true if doing so is not likely to
+// increase register pressure.
+bool
+late_combine::check_register_pressure (insn_info *insn, rtx set)
+{
+  // Plain register-to-register moves do not establish a register class
+  // preference and have no well-defined effect on the register allocator.
+  // If changes in register class are needed, the register allocator is
+  // in the best position to place those changes.  If no change in
+  // register class is needed, then the optimization reduces register
+  // pressure if SET_SRC (set) was already live at uses, otherwise the
+  // optimization is pressure-neutral.
+  rtx src = SET_SRC (set);
+  if (REG_P (src))
+    return true;
+
+  // On the same basis, substituting a SET_SRC that contains a single
+  // pseudo register either reduces pressure or is pressure-neutral,
+  // subject to the constraints below.  We would need to do more
+  // analysis for SET_SRCs that use more than one pseudo register.
+  unsigned int nregs = 0;
+  for (auto *use : insn->uses ())
+    if (use->is_reg ()
+	&& !HARD_REGISTER_NUM_P (use->regno ())
+	&& !use->only_occurs_in_notes ())
+      if (++nregs > 1)
+	return false;
+
+  // If there are no pseudo registers in SET_SRC then the optimization
+  // should improve register pressure.
+  if (nregs == 0)
+    return true;
+
+  // We'd be substituting (set (reg R1) SRC) where SRC is known to
+  // contain a single pseudo register R2.  Assume for simplicity that
+  // each new use of R2 would need to be in the same class C as the
+  // current use of R2.  If, for a realistic allocation, C is a
+  // non-strict superset of R1's register class, the effect on
+  // register pressure should be positive or neutral.  If instead
+  // R1 occupies a different register class from R2, or if R1 has
+  // more allocation freedom than R2, then there's a higher risk that
+  // the effect on register pressure could be negative.
+  //
+  // First use constrain_operands to get the most likely choice of
+  // alternative.  For simplicity, just handle the case where the
+  // output operand is operand 0.
+  extract_insn (insn->rtl ());
+  rtx dest = SET_DEST (set);
+  if (recog_data.n_operands == 0
+      || recog_data.operand[0] != dest)
+    return false;
+
+  if (!constrain_operands (0, get_enabled_alternatives (insn->rtl ())))
+    return false;
+
+  preprocess_constraints (insn->rtl ());
+  auto *alt = which_op_alt ();
+  auto dest_class = alt[0].cl;
+
+  // Check operands 1 and above.
+  auto check_src = [&](unsigned int i)
+    {
+      if (recog_data.is_operator[i])
+	return true;
+
+      rtx op = recog_data.operand[i];
+      if (CONSTANT_P (op))
+	return true;
+
+      if (SUBREG_P (op))
+	op = SUBREG_REG (op);
+      if (REG_P (op))
+	{
+	  if (HARD_REGISTER_P (op))
+	    {
+	      // We've already rejected uses of non-fixed hard registers.
+	      gcc_checking_assert (fixed_regs[REGNO (op)]);
+	      return true;
+	    }
+
+	  // Make sure that the source operand's class is at least as
+	  // permissive as the destination operand's class.
+	  if (!reg_class_subset_p (dest_class, alt[i].cl))
+	    return false;
+
+	  // Make sure that the source operand occupies no more hard
+	  // registers than the destination operand.  This mostly matters
+	  // for subregs.
+	  if (targetm.class_max_nregs (dest_class, GET_MODE (dest))
+	      < targetm.class_max_nregs (alt[i].cl, GET_MODE (op)))
+	    return false;
+
+	  return true;
+	}
+      return false;
+    };
+  for (int i = 1; i < recog_data.n_operands; ++i)
+    if (!check_src (i))
+      return false;
+
+  return true;
+}
+
+// Check uses of DEF to see whether there is anything obvious that
+// prevents the substitution of SET into uses of DEF.
+bool
+late_combine::check_uses (set_info *def, rtx set)
+{
+  use_info *last_use = nullptr;
+  for (use_info *use : def->nondebug_insn_uses ())
+    {
+      insn_info *use_insn = use->insn ();
+
+      if (use->is_live_out_use ())
+	continue;
+      if (use->only_occurs_in_notes ())
+	continue;
+
+      // We cannot replace all uses if the value is live on exit.
+      if (use->is_artificial ())
+	return false;
+
+      // Avoid increasing the complexity of instructions that
+      // reference allocatable hard registers.
+      if (!REG_P (SET_SRC (set))
+	  && !reload_completed
+	  && (accesses_include_nonfixed_hard_registers (use_insn->uses ())
+	      || accesses_include_nonfixed_hard_registers (use_insn->defs ())))
+	return false;
+
+      // Don't substitute into a non-local goto, since it can then be
+      // treated as a jump to a local label, e.g. in shorten_branches.
+      // ??? But this shouldn't be necessary.
+      if (use_insn->is_jump ()
+	  && find_reg_note (use_insn->rtl (), REG_NON_LOCAL_GOTO, NULL_RTX))
+	return false;
+
+      // We'll keep the uses in their original order, even if we move
+      // them relative to other instructions.  Make sure that non-final
+      // uses do not change any values that occur in the SET_SRC.
+      if (last_use && last_use->ebb () == use->ebb ())
+	{
+	  def_info *ultimate_def = look_through_degenerate_phi (def);
+	  if (insn_clobbers_resources (last_use->insn (),
+				       ultimate_def->insn ()->uses ()))
+	    return false;
+	}
+
+      last_use = use;
+    }
+
+  for (use_info *use : def->phi_uses ())
+    if (!use->phi ()->is_degenerate ()
+	|| !check_uses (use->phi (), set))
+      return false;
+
+  return true;
+}
+
+// Try to remove INSN by substituting a definition into all uses.
+// If the optimization moves any instructions before CURSOR, add those
+// instructions to the end of m_worklist.
+bool
+late_combine::combine_into_uses (insn_info *insn, insn_info *cursor)
+{
+  // For simplicity, don't try to handle sets of multiple hard registers.
+  // And for correctness, don't remove any assignments to the stack or
+  // frame pointers, since that would implicitly change the set of valid
+  // memory locations between this assignment and the next.
+  //
+  // Removing assignments to the hard frame pointer would invalidate
+  // backtraces.
+  set_info *def = single_set_info (insn);
+  if (!def
+      || !def->is_reg ()
+      || def->regno () == STACK_POINTER_REGNUM
+      || def->regno () == FRAME_POINTER_REGNUM
+      || def->regno () == HARD_FRAME_POINTER_REGNUM)
+    return false;
+
+  rtx set = optimizable_set (insn);
+  if (!set)
+    return false;
+
+  // For simplicity, don't try to handle subreg destinations.
+  rtx dest = SET_DEST (set);
+  if (!REG_P (dest) || def->regno () != REGNO (dest))
+    return false;
+
+  // Don't prolong the live ranges of allocatable hard registers, or put
+  // them into more complicated instructions.  Failing to prevent this
+  // could lead to spill failures, or at least to worse register allocation.
+  if (!reload_completed
+      && accesses_include_nonfixed_hard_registers (insn->uses ()))
+    return false;
+
+  if (!reload_completed && !check_register_pressure (insn, set))
+    return false;
+
+  if (!check_uses (def, set))
+    return false;
+
+  insn_combination combination (def, SET_DEST (set), SET_SRC (set));
+  if (!combination.run ())
+    return false;
+
+  for (auto *use_change : combination.use_changes ())
+    if (*use_change->insn () < *cursor)
+      m_worklist.safe_push (use_change->insn ());
+    else
+      break;
+  return true;
+}
+
+// Run the pass on function FN.
+unsigned int
+late_combine::execute (function *fn)
+{
+  // Initialization.
+  calculate_dominance_info (CDI_DOMINATORS);
+  df_analyze ();
+  crtl->ssa = new rtl_ssa::function_info (fn);
+  // Don't allow memory_operand to match volatile MEMs.
+  init_recog_no_volatile ();
+
+  insn_info *insn = *crtl->ssa->nondebug_insns ().begin ();
+  while (insn)
+    {
+      if (!insn->is_artificial ())
+	{
+	  insn_info *prev = insn->prev_nondebug_insn ();
+	  if (combine_into_uses (insn, prev))
+	    {
+	      // Any instructions that get added to the worklist were
+	      // previously after PREV.  Thus if we were able to move
+	      // an instruction X before PREV during one combination,
+	      // X cannot depend on any instructions that we move before
+	      // PREV during subsequent combinations.  This means that
+	      // the worklist should be free of backwards dependencies,
+	      // even if it isn't necessarily in RPO.
+	      for (unsigned int i = 0; i < m_worklist.length (); ++i)
+		combine_into_uses (m_worklist[i], prev);
+	      m_worklist.truncate (0);
+	      insn = prev;
+	    }
+	}
+      insn = insn->next_nondebug_insn ();
+    }
+
+  // Finalization.
+  if (crtl->ssa->perform_pending_updates ())
+    cleanup_cfg (0);
+  // Make recognizer allow volatile MEMs again.
+  init_recog ();
+  free_dominance_info (CDI_DOMINATORS);
+  return 0;
+}
+
+class pass_late_combine : public rtl_opt_pass
+{
+public:
+  pass_late_combine (gcc::context *ctxt)
+    : rtl_opt_pass (pass_data_late_combine, ctxt)
+  {}
+
+  // opt_pass methods:
+  opt_pass *clone () override { return new pass_late_combine (m_ctxt); }
+  bool gate (function *) override { return flag_late_combine_instructions; }
+  unsigned int execute (function *) override;
+};
+
+unsigned int
+pass_late_combine::execute (function *fn)
+{
+  return late_combine ().execute (fn);
+}
+
+} // end namespace
+
+// Create a new late-combine pass instance.
+
+rtl_opt_pass *
+make_pass_late_combine (gcc::context *ctxt)
+{
+  return new pass_late_combine (ctxt);
+}
diff --git a/gcc/passes.def b/gcc/passes.def
index 1e1950bdb39..56ab5204b08 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -488,6 +488,7 @@  along with GCC; see the file COPYING3.  If not see
       NEXT_PASS (pass_initialize_regs);
       NEXT_PASS (pass_ud_rtl_dce);
       NEXT_PASS (pass_combine);
+      NEXT_PASS (pass_late_combine);
       NEXT_PASS (pass_if_after_combine);
       NEXT_PASS (pass_jump_after_combine);
       NEXT_PASS (pass_partition_blocks);
@@ -507,6 +508,7 @@  along with GCC; see the file COPYING3.  If not see
       NEXT_PASS (pass_postreload);
       PUSH_INSERT_PASSES_WITHIN (pass_postreload)
 	  NEXT_PASS (pass_postreload_cse);
+	  NEXT_PASS (pass_late_combine);
 	  NEXT_PASS (pass_gcse2);
 	  NEXT_PASS (pass_split_after_reload);
 	  NEXT_PASS (pass_ree);
diff --git a/gcc/testsuite/gcc.dg/ira-shrinkwrap-prep-1.c b/gcc/testsuite/gcc.dg/ira-shrinkwrap-prep-1.c
index f290b9ccbdc..a95637abbe5 100644
--- a/gcc/testsuite/gcc.dg/ira-shrinkwrap-prep-1.c
+++ b/gcc/testsuite/gcc.dg/ira-shrinkwrap-prep-1.c
@@ -25,5 +25,5 @@  bar (long a)
 }
 
 /* { dg-final { scan-rtl-dump "Will split live ranges of parameters" "ira" } } */
-/* { dg-final { scan-rtl-dump "Split live-range of register" "ira" { xfail *-*-* } } } */
+/* { dg-final { scan-rtl-dump "Split live-range of register" "ira" { xfail { ! aarch64*-*-* } } } } */
 /* { dg-final { scan-rtl-dump "Performing shrink-wrapping" "pro_and_epilogue" { xfail powerpc*-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/ira-shrinkwrap-prep-2.c b/gcc/testsuite/gcc.dg/ira-shrinkwrap-prep-2.c
index 6212c95585d..0690e036eaa 100644
--- a/gcc/testsuite/gcc.dg/ira-shrinkwrap-prep-2.c
+++ b/gcc/testsuite/gcc.dg/ira-shrinkwrap-prep-2.c
@@ -30,6 +30,6 @@  bar (long a)
 }
 
 /* { dg-final { scan-rtl-dump "Will split live ranges of parameters" "ira" } } */
-/* { dg-final { scan-rtl-dump "Split live-range of register" "ira" { xfail *-*-* } } } */
+/* { dg-final { scan-rtl-dump "Split live-range of register" "ira" { xfail { ! aarch64*-*-* } } } } */
 /* XFAIL due to PR70681.  */ 
 /* { dg-final { scan-rtl-dump "Performing shrink-wrapping" "pro_and_epilogue" { xfail arm*-*-* powerpc*-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/stack-check-4.c b/gcc/testsuite/gcc.dg/stack-check-4.c
index b0c5c61972f..052d2abc2f1 100644
--- a/gcc/testsuite/gcc.dg/stack-check-4.c
+++ b/gcc/testsuite/gcc.dg/stack-check-4.c
@@ -20,7 +20,7 @@ 
    scan for.   We scan for both the positive and negative cases.  */
 
 /* { dg-do compile } */
-/* { dg-options "-O2 -fstack-clash-protection -fdump-rtl-pro_and_epilogue -fno-optimize-sibling-calls" } */
+/* { dg-options "-O2 -fstack-clash-protection -fdump-rtl-pro_and_epilogue -fno-optimize-sibling-calls -fno-shrink-wrap" } */
 /* { dg-require-effective-target supports_stack_clash_protection } */
 
 extern void arf (char *);
diff --git a/gcc/testsuite/gcc.target/aarch64/pr106594_1.c b/gcc/testsuite/gcc.target/aarch64/pr106594_1.c
new file mode 100644
index 00000000000..71bcafcb44f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr106594_1.c
@@ -0,0 +1,20 @@ 
+/* { dg-options "-O2" } */
+
+extern const int constellation_64qam[64];
+
+void foo(int nbits,
+         const char *p_src,
+         int *p_dst) {
+
+  while (nbits > 0U) {
+    char first = *p_src++;
+
+    char index1 = ((first & 0x3) << 4) | (first >> 4);
+
+    *p_dst++ = constellation_64qam[index1];
+
+    nbits--;
+  }
+}
+
+/* { dg-final { scan-assembler {(?n)\tldr\t.*\[x[0-9]+, w[0-9]+, sxtw #?2\]} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_asrd_3.c b/gcc/testsuite/gcc.target/aarch64/sve/cond_asrd_3.c
index 0d620a30d5d..b537c6154a3 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/cond_asrd_3.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_asrd_3.c
@@ -27,9 +27,9 @@  TEST_ALL (DEF_LOOP)
 /* { dg-final { scan-assembler-times {\tasrd\tz[0-9]+\.h, p[0-7]/m, z[0-9]+\.h, #4\n} 2 } } */
 /* { dg-final { scan-assembler-times {\tasrd\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s, #4\n} 1 } } */
 
-/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.b, p[0-7]/z, z[0-9]+\.b\n} 3 { xfail *-*-* } } } */
-/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.h, p[0-7]/z, z[0-9]+\.h\n} 2 { xfail *-*-* } } } */
-/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.s, p[0-7]/z, z[0-9]+\.s\n} 1 { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.b, p[0-7]/z, z[0-9]+\.b\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.h, p[0-7]/z, z[0-9]+\.h\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.s, p[0-7]/z, z[0-9]+\.s\n} 1 } } */
 
-/* { dg-final { scan-assembler-not {\tmov\tz} { xfail *-*-* } } } */
-/* { dg-final { scan-assembler-not {\tsel\t} { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-not {\tmov\tz} } } */
+/* { dg-final { scan-assembler-not {\tsel\t} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_convert_3.c b/gcc/testsuite/gcc.target/aarch64/sve/cond_convert_3.c
index a294effd4a9..cff806c278d 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/cond_convert_3.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_convert_3.c
@@ -30,11 +30,9 @@  TEST_ALL (DEF_LOOP)
 /* { dg-final { scan-assembler-times {\tscvtf\tz[0-9]+\.d, p[0-7]/m,} 1 } } */
 /* { dg-final { scan-assembler-times {\tucvtf\tz[0-9]+\.d, p[0-7]/m,} 1 } } */
 
-/* Really we should be able to use MOVPRFX /z here, but at the moment
-   we're relying on combine to merge a SEL and an arithmetic operation,
-   and the SEL doesn't allow the "false" value to be zero when the "true"
-   value is a register.  */
-/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+, z[0-9]+\n} 6 } } */
+/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.h, p[0-7]/z,} 2 } } */
+/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.s, p[0-7]/z,} 2 } } */
+/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.d, p[0-7]/z,} 2 } } */
 
 /* { dg-final { scan-assembler-not {\tmov\tz[^\n]*z} } } */
 /* { dg-final { scan-assembler-not {\tsel\t} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_convert_6.c b/gcc/testsuite/gcc.target/aarch64/sve/cond_convert_6.c
index 6541a2ea49d..abf0a2e832f 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/cond_convert_6.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_convert_6.c
@@ -30,11 +30,9 @@  TEST_ALL (DEF_LOOP)
 /* { dg-final { scan-assembler-times {\tfcvtzs\tz[0-9]+\.d, p[0-7]/m,} 1 } } */
 /* { dg-final { scan-assembler-times {\tfcvtzu\tz[0-9]+\.d, p[0-7]/m,} 1 } } */
 
-/* Really we should be able to use MOVPRFX /z here, but at the moment
-   we're relying on combine to merge a SEL and an arithmetic operation,
-   and the SEL doesn't allow the "false" value to be zero when the "true"
-   value is a register.  */
-/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+, z[0-9]+\n} 6 } } */
+/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.h, p[0-7]/z,} 2 } } */
+/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.s, p[0-7]/z,} 2 } } */
+/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.d, p[0-7]/z,} 2 } } */
 
 /* { dg-final { scan-assembler-not {\tmov\tz[^\n]*z} } } */
 /* { dg-final { scan-assembler-not {\tsel\t} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_fabd_5.c b/gcc/testsuite/gcc.target/aarch64/sve/cond_fabd_5.c
index e66477b3bce..401201b315a 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/cond_fabd_5.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_fabd_5.c
@@ -24,12 +24,9 @@  TEST_ALL (DEF_LOOP)
 /* { dg-final { scan-assembler-times {\tfabd\tz[0-9]+\.s, p[0-7]/m,} 1 } } */
 /* { dg-final { scan-assembler-times {\tfabd\tz[0-9]+\.d, p[0-7]/m,} 1 } } */
 
-/* Really we should be able to use MOVPRFX /Z here, but at the moment
-   we're relying on combine to merge a SEL and an arithmetic operation,
-   and the SEL doesn't allow zero operands.  */
-/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.h, p[0-7]/z, z[0-9]+\.h\n} 1 { xfail *-*-* } } } */
-/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.s, p[0-7]/z, z[0-9]+\.s\n} 1 { xfail *-*-* } } } */
-/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.d, p[0-7]/z, z[0-9]+\.d\n} 1 { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.h, p[0-7]/z, z[0-9]+\.h\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.s, p[0-7]/z, z[0-9]+\.s\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.d, p[0-7]/z, z[0-9]+\.d\n} 1 } } */
 
 /* { dg-final { scan-assembler-not {\tmov\tz[^,]*z} } } */
-/* { dg-final { scan-assembler-not {\tsel\t} { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-not {\tsel\t} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_unary_4.c b/gcc/testsuite/gcc.target/aarch64/sve/cond_unary_4.c
index a491f899088..cbb957bffa4 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/cond_unary_4.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_unary_4.c
@@ -52,15 +52,10 @@  TEST_ALL (DEF_LOOP)
 /* { dg-final { scan-assembler-times {\tfneg\tz[0-9]+\.s, p[0-7]/m,} 1 } } */
 /* { dg-final { scan-assembler-times {\tfneg\tz[0-9]+\.d, p[0-7]/m,} 1 } } */
 
-/* Really we should be able to use MOVPRFX /z here, but at the moment
-   we're relying on combine to merge a SEL and an arithmetic operation,
-   and the SEL doesn't allow the "false" value to be zero when the "true"
-   value is a register.  */
-/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+, z[0-9]+\n} 7 } } */
-/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.b, p[0-7]/z, z[0-9]+\.b} 1 } } */
-/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.h, p[0-7]/z, z[0-9]+\.h} 2 } } */
-/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.s, p[0-7]/z, z[0-9]+\.s} 2 } } */
-/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.d, p[0-7]/z, z[0-9]+\.d} 2 } } */
+/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.b, p[0-7]/z, z[0-9]+\.b} 2 } } */
+/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.h, p[0-7]/z, z[0-9]+\.h} 4 } } */
+/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.s, p[0-7]/z, z[0-9]+\.s} 4 } } */
+/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.d, p[0-7]/z, z[0-9]+\.d} 4 } } */
 
 /* { dg-final { scan-assembler-not {\tmov\tz[^\n]*z} } } */
 /* { dg-final { scan-assembler-not {\tsel\t} } } */
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 09e6ada5b2f..75376316e40 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -612,6 +612,7 @@  extern rtl_opt_pass *make_pass_branch_prob (gcc::context *ctxt);
 extern rtl_opt_pass *make_pass_value_profile_transformations (gcc::context
 							      *ctxt);
 extern rtl_opt_pass *make_pass_postreload_cse (gcc::context *ctxt);
+extern rtl_opt_pass *make_pass_late_combine (gcc::context *ctxt);
 extern rtl_opt_pass *make_pass_gcse2 (gcc::context *ctxt);
 extern rtl_opt_pass *make_pass_split_after_reload (gcc::context *ctxt);
 extern rtl_opt_pass *make_pass_thread_prologue_and_epilogue (gcc::context