Ping: [PATCH] Add a late-combine pass [PR106594]

  Ping

---

This patch adds a combine pass that runs late in the pipeline.
There are two instances: one between combine and split1, and one
after postreload.

The pass currently has a single objective: remove definitions by
substituting into all uses.  The pre-RA version tries to restrict
itself to cases that are likely to have a neutral or beneficial
effect on register pressure.
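
To make the "substitute into all uses" objective concrete, here is a minimal
toy model in C++.  It is not taken from late-combine.cc: ToyInsn, toy_recog,
substitute_def, try_remove_def and the max_operands limit are all hypothetical
names, and the model assumes each register has a single definition.  It only
shows the all-or-nothing shape of the transformation: a definition is deleted
only if every use still passes the (stand-in) validity check after
substitution.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for an instruction of the form "dest = op(operands...)".
struct ToyInsn {
  std::string dest;
  std::vector<std::string> operands;
  bool deleted = false;
};

// Hypothetical validity check standing in for target instruction
// recognition: an instruction is valid if it has at most MAX_OPERANDS.
bool toy_recog(const ToyInsn &insn, std::size_t max_operands) {
  return insn.operands.size() <= max_operands;
}

// Return a copy of USE in which each occurrence of DEF's destination
// is replaced by DEF's operands.
ToyInsn substitute_def(const ToyInsn &use, const ToyInsn &def) {
  ToyInsn result = use;
  result.operands.clear();
  for (const std::string &op : use.operands) {
    if (op == def.dest)
      result.operands.insert(result.operands.end(),
                             def.operands.begin(), def.operands.end());
    else
      result.operands.push_back(op);
  }
  return result;
}

// Try to remove insns[def_index] by substituting it into all of its uses.
// Either every use is rewritten and the definition is marked deleted,
// or nothing is changed and false is returned.
bool try_remove_def(std::vector<ToyInsn> &insns, std::size_t def_index,
                    std::size_t max_operands) {
  assert(def_index < insns.size());
  const ToyInsn def = insns[def_index];
  std::vector<std::pair<std::size_t, ToyInsn>> pending;
  for (std::size_t i = 0; i < insns.size(); ++i) {
    if (i == def_index || insns[i].deleted)
      continue;
    const std::vector<std::string> &ops = insns[i].operands;
    if (std::find(ops.begin(), ops.end(), def.dest) == ops.end())
      continue;
    ToyInsn candidate = substitute_def(insns[i], def);
    if (!toy_recog(candidate, max_operands))
      return false;  // One failing use blocks the whole transformation.
    pending.emplace_back(i, std::move(candidate));
  }
  // Commit only after every use has been checked.
  for (std::pair<std::size_t, ToyInsn> &p : pending)
    insns[p.first] = std::move(p.second);
  insns[def_index].deleted = true;
  return true;
}
```

Collecting the rewritten uses first and committing them at the end is what
keeps a failed attempt from leaving the instruction stream half-changed.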

The patch fixes PR106594.  It also fixes a few FAILs and XFAILs
in the aarch64 test results, mostly due to making proper use of
MOVPRFX in cases where we didn't previously.  I hope it would
also help with Robin's vec_duplicate testcase, although the
pressure heuristic might need tweaking for that case.

This is just a first step.  I'm hoping that the pass could be
used for other combine-related optimisations in future.  In particular,
the post-RA version doesn't need to restrict itself to cases where all
uses are substitutable, since it doesn't have to worry about register
pressure.  If we did that, and if we extended it to handle multi-register
REGs, the pass might be a viable replacement for regcprop, which in
turn might reduce the cost of having a post-RA instance of the new pass.

I've run an assembly comparison with one target per CPU directory,
and it seems to be a win for all targets except nvptx (which is hard
to measure, being a higher-level asm).  The biggest winner seemed
to be AVR.

I'd originally hoped to enable the pass by default at -O2 and above
on all targets.  But in the end, I don't think that's possible,
because it interacts badly with x86's STV and partial register
dependency passes.

For example, gcc.target/i386/minmax-6.c tests whether the code
compiles without any spilling.  The RTL created by STV contains:

(insn 33 31 3 2 (set (subreg:V4SI (reg:SI 120) 0)
        (vec_merge:V4SI (vec_duplicate:V4SI (reg:SI 116))
            (const_vector:V4SI [
                    (const_int 0 [0]) repeated x4
                ])
            (const_int 1 [0x1]))) -1
     (nil))
(insn 3 33 34 2 (set (subreg:V4SI (reg:SI 118) 0)
        (subreg:V4SI (reg:SI 120) 0)) {movv4si_internal}
     (expr_list:REG_DEAD (reg:SI 120)
        (nil)))
(insn 34 3 32 2 (set (reg/v:SI 108 [ y ])
        (reg:SI 118)) -1
     (nil))

and it's crucial for the test that reg 108 is kept, rather than
propagated into uses.  As things stand, 118 can be allocated
a vector register and 108 a scalar register.  If 108 is propagated,
there will be scalar and vector uses of 118, and so it will be
spilled to memory.
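
The effect can be sketched with a deliberately simplified C++ model.  This is
not how the register allocator is implemented: ToyClass, ToyRefs,
has_mixed_classes and propagate_copy are hypothetical names, and "mixed
scalar and vector references" is used only as a rough proxy for the spill
described above.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Constraint that a single reference places on a pseudo register in this
// toy model.  Any models a plain move, which could use either class.
enum class ToyClass { Any, Scalar, Vector };

// Map from pseudo register number to the constraints of its references.
using ToyRefs = std::map<int, std::vector<ToyClass>>;

// True if REGNO has at least one scalar-only and one vector-only reference.
bool has_mixed_classes(const ToyRefs &refs, int regno) {
  ToyRefs::const_iterator it = refs.find(regno);
  if (it == refs.end())
    return false;
  bool scalar = false;
  bool vector = false;
  for (ToyClass c : it->second) {
    scalar |= (c == ToyClass::Scalar);
    vector |= (c == ToyClass::Vector);
  }
  return scalar && vector;
}

// Model propagating the copy "dest = src" into all uses of DEST:
// SRC inherits DEST's reference constraints and DEST disappears.
void propagate_copy(ToyRefs &refs, int dest, int src) {
  assert(dest != src);
  ToyRefs::iterator it = refs.find(dest);
  if (it == refs.end())
    return;
  std::vector<ToyClass> moved = it->second;
  refs.erase(it);
  std::vector<ToyClass> &src_refs = refs[src];
  src_refs.insert(src_refs.end(), moved.begin(), moved.end());
}
```

In this model, giving 118 one Vector reference (insn 3) and one Any reference
(the copy in insn 34), and giving 108 only Scalar references, leaves both
registers unmixed.  Calling propagate_copy (refs, 108, 118) then makes 118
mixed, which corresponds to the situation the test is trying to avoid.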

That one could be solved by running STV2 later.  But RPAD is
a bigger problem.  In gcc.target/i386/pr87007-5.c, RPAD converts:

(insn 27 26 28 6 (set (reg:DF 100 [ _15 ])
        (sqrt:DF (mem/c:DF (symbol_ref:DI ("d2"))))) {*sqrtdf2_sse}
     (nil))

into:

(insn 45 26 44 6 (set (reg:V4SF 108)
        (const_vector:V4SF [
                (const_double:SF 0.0 [0x0.0p+0]) repeated x4
            ])) -1
     (nil))
(insn 44 45 27 6 (set (reg:V2DF 109)
        (vec_merge:V2DF (vec_duplicate:V2DF (sqrt:DF (mem/c:DF (symbol_ref:DI ("d2")))))
            (subreg:V2DF (reg:V4SF 108) 0)
            (const_int 1 [0x1]))) -1
     (nil))
(insn 27 44 28 6 (set (reg:DF 100 [ _15 ])
        (subreg:DF (reg:V2DF 109) 0)) {*movdf_internal}
     (nil))

But both the pre-RA and post-RA passes are able to combine these
instructions back to the original form.

The patch therefore enables the pass by default only on AArch64.
However, I did test the patch with it enabled on x86_64-linux-gnu
as well, which was useful for debugging.

Bootstrapped & regression-tested on aarch64-linux-gnu and
x86_64-linux-gnu, both as posted (no regressions) and with the
pass enabled by default (some gcc.target/i386 regressions).
OK to install?

Richard

gcc/
	PR rtl-optimization/106594
	* Makefile.in (OBJS): Add late-combine.o.
	* common.opt (flate-combine-instructions): New option.
	* doc/invoke.texi: Document it.
	* common/config/aarch64/aarch64-common.cc: Enable it by default
	at -O2 and above.
	* tree-pass.h (make_pass_late_combine): Declare.
	* late-combine.cc: New file.
	* passes.def: Add two instances of late_combine.

gcc/testsuite/
	PR rtl-optimization/106594
	* gcc.dg/ira-shrinkwrap-prep-1.c: Restrict XFAIL to non-aarch64
	targets.
	* gcc.dg/ira-shrinkwrap-prep-2.c: Likewise.
	* gcc.dg/stack-check-4.c: Add -fno-shrink-wrap.
	* gcc.target/aarch64/sve/cond_asrd_3.c: Remove XFAILs.
	* gcc.target/aarch64/sve/cond_convert_3.c: Likewise.
	* gcc.target/aarch64/sve/cond_fabd_5.c: Likewise.
	* gcc.target/aarch64/sve/cond_convert_6.c: Expect the MOVPRFX /Zs
	described in the comment.
	* gcc.target/aarch64/sve/cond_unary_4.c: Likewise.
	* gcc.target/aarch64/pr106594_1.c: New test.
---
 gcc/Makefile.in                               |   1 +
 gcc/common.opt                                |   5 +
 gcc/common/config/aarch64/aarch64-common.cc   |   1 +
 gcc/doc/invoke.texi                           |  11 +-
 gcc/late-combine.cc                           | 718 ++++++++++++++++++
 gcc/passes.def                                |   2 +
 gcc/testsuite/gcc.dg/ira-shrinkwrap-prep-1.c  |   2 +-
 gcc/testsuite/gcc.dg/ira-shrinkwrap-prep-2.c  |   2 +-
 gcc/testsuite/gcc.dg/stack-check-4.c          |   2 +-
 gcc/testsuite/gcc.target/aarch64/pr106594_1.c |  20 +
 .../gcc.target/aarch64/sve/cond_asrd_3.c      |  10 +-
 .../gcc.target/aarch64/sve/cond_convert_3.c   |   8 +-
 .../gcc.target/aarch64/sve/cond_convert_6.c   |   8 +-
 .../gcc.target/aarch64/sve/cond_fabd_5.c      |  11 +-
 .../gcc.target/aarch64/sve/cond_unary_4.c     |  13 +-
 gcc/tree-pass.h                               |   1 +
 16 files changed, 780 insertions(+), 35 deletions(-)
 create mode 100644 gcc/late-combine.cc
 create mode 100644 gcc/testsuite/gcc.target/aarch64/pr106594_1.c

Message ID	mptttooepa1.fsf@arm.com
State	New
From	Richard Sandiford <richard.sandiford@arm.com>
Date	Mon, 11 Dec 2023 15:23:50 +0000
In-Reply-To	<mptr0ljn9eh.fsf@arm.com> (Richard Sandiford's message of "Tue, 24 Oct 2023 19:49:10 +0100")
Series	Ping: [PATCH] Add a late-combine pass [PR106594]

Context	Check	Description
linaro-tcwg-bot/tcwg_gcc_build--master-aarch64	fail	Patch failed to apply
linaro-tcwg-bot/tcwg_gcc_build--master-arm	fail	Patch failed to apply
