vect+aarch64: Fix ldp_stp_* regressions

Message ID mpt1r05xxzs.fsf@arm.com
State New
Series vect+aarch64: Fix ldp_stp_* regressions

Commit Message

Richard Sandiford Feb. 14, 2022, 3:35 p.m. UTC
  ldp_stp_1.c, ldp_stp_4.c and ldp_stp_5.c have been failing since
vectorisation was enabled at -O2.  In all three cases SLP is
generating vector code when scalar code would be better.

The problem is that the target costs do not model whether STP could
be used for the scalar or vector code, so the normal latency-based
costs for store-heavy code can be way off.  It would be good to fix
that “properly” at some point, but it isn't easy; see the existing
discussion in aarch64_sve_adjust_stmt_cost for more details.

This patch therefore adds an on-the-side check for whether the
code is doing nothing more than set-up+stores.  It then applies
STP-based costs to those cases only, in addition to the normal
latency-based costs.  (That is, the vector code has to win on
both counts rather than on one count individually.)
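As a concrete illustration, consider the two-statement example that the
patch's new aarch64.cc comment also uses (the surrounding function and
the instruction sequences mentioned below are only a sketch of typical
codegen, not taken from the failing tests):

  /* Store-heavy code: the scalar version can issue both word stores
     as a single STP, but without modelling STP it is costed as two
     separate stores, which makes the vector splat + store look
     artificially competitive.  */
  void
  set_pair (int *a, int x)
  {
    a[0] = x;
    a[1] = x;
  }

With STP taken into account, the scalar version is roughly a single
"stp w1, w1, [x0]", while the vector version still needs a duplicate
into a vector register plus a store, so the scalar code should win.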

However, at the moment, SLP costs one vector set-up instruction
for every vector in an SLP node, even if the contents are the
same as a previous vector in the same node.  Fixing the STP costs
without fixing that would regress other cases, tested in the patch.

The patch therefore makes the SLP costing code check for duplicates
within a node.  Ideally we'd check for duplicates more globally,
but that would require a more global approach to costs: the cost
of an initialisation should be amortised across all trees that
use the initialisation, rather than fully counted against one
arbitrarily-chosen subtree.
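
A hypothetical example of the duplicate problem (not one of the tests
added by the patch):

  /* With 128-bit vectors (four ints per vector), the invariant SLP
     node for the stored values is { x, y, x, y, x, y, x, y }: two
     vector statements whose contents are identical, each previously
     costed as a separate vec_construct.  */
  void
  set_pairs (int *a, int x, int y)
  {
    a[0] = x;  a[1] = y;
    a[2] = x;  a[3] = y;
    a[4] = x;  a[5] = y;
    a[6] = x;  a[7] = y;
  }

With the change, vect_prologue_cost_for_slp hashes each vector-sized
slice of the node's scalar ops and records a construction cost only
once per distinct slice.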

Back on aarch64: an earlier version of the patch tried to apply
the new heuristic to constant stores.  However, that didn't work
too well in practice; see the comments for details.  The patch
therefore just tests the status quo for constant cases, leaving out
a match if the current choice is dubious.
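
For reference, the constant cases in question look something like the
following sketch (illustrative only; see the comment in the aarch64.cc
change for the actual reasoning):

  /* A group of constant stores.  The vector code can load
     { 1, 2, 3, 4 } from the constant pool with one LDR and store it
     with one STR, whereas the scalar code needs several MOVs before
     its STPs, so the trade-off is less clear-cut than for variable
     operands.  */
  void
  set_consts (int *a)
  {
    a[0] = 1;
    a[1] = 2;
    a[2] = 3;
    a[3] = 4;
  }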

ldp_stp_5.c was affected by the same thing.  The test would be
worth vectorising if we generated better vector code, but:

(1) We do a bad job of moving the { -1, 1 } constant, given that
    we have { -1, -1 } and { 1, 1 } to hand.

(2) The vector code has 6 pairable stores to misaligned offsets.
    We have peephole patterns to handle such misalignment for
    4 pairable stores, but not 6.

So the SLP decision isn't wrong as such.  It's just being let
down by later codegen.

The patch therefore adds -mstrict-align to preserve the original
intention of the test while adding ldp_stp_19.c to check for the
preferred vector code (XFAILed for now).

Tested on aarch64-linux-gnu, aarch64_be-elf and x86_64-linux-gnu.
OK for the vectoriser bits?

Thanks,
Richard


gcc/
	* tree-vectorizer.h (vect_scalar_ops_slice): New struct.
	(vect_scalar_ops_slice_hash): Likewise.
	(vect_scalar_ops_slice::op): New function.
	* tree-vect-slp.cc (vect_scalar_ops_slice::all_same_p): New function.
	(vect_scalar_ops_slice_hash::hash): Likewise.
	(vect_scalar_ops_slice_hash::equal): Likewise.
	(vect_prologue_cost_for_slp): Check for duplicate vectors.
	* config/aarch64/aarch64.cc
	(aarch64_vector_costs::m_stp_sequence_cost): New member variable.
	(aarch64_aligned_constant_offset_p): New function.
	(aarch64_stp_sequence_cost): Likewise.
	(aarch64_vector_costs::add_stmt_cost): Handle new STP heuristic.
	(aarch64_vector_costs::finish_cost): Likewise.

gcc/testsuite/
	* gcc.target/aarch64/ldp_stp_5.c: Require -mstrict-align.
	* gcc.target/aarch64/ldp_stp_14.h,
	* gcc.target/aarch64/ldp_stp_14.c: New test.
	* gcc.target/aarch64/ldp_stp_15.c: Likewise.
	* gcc.target/aarch64/ldp_stp_16.c: Likewise.
	* gcc.target/aarch64/ldp_stp_17.c: Likewise.
	* gcc.target/aarch64/ldp_stp_18.c: Likewise.
	* gcc.target/aarch64/ldp_stp_19.c: Likewise.
---
 gcc/config/aarch64/aarch64.cc                 | 140 ++++++++++++++++++
 gcc/testsuite/gcc.target/aarch64/ldp_stp_14.c |  89 +++++++++++
 gcc/testsuite/gcc.target/aarch64/ldp_stp_14.h |  50 +++++++
 gcc/testsuite/gcc.target/aarch64/ldp_stp_15.c | 137 +++++++++++++++++
 gcc/testsuite/gcc.target/aarch64/ldp_stp_16.c | 133 +++++++++++++++++
 gcc/testsuite/gcc.target/aarch64/ldp_stp_17.c | 120 +++++++++++++++
 gcc/testsuite/gcc.target/aarch64/ldp_stp_18.c | 123 +++++++++++++++
 gcc/testsuite/gcc.target/aarch64/ldp_stp_19.c |   6 +
 gcc/testsuite/gcc.target/aarch64/ldp_stp_5.c  |   2 +-
 gcc/tree-vect-slp.cc                          |  75 ++++++----
 gcc/tree-vectorizer.h                         |  35 +++++
 11 files changed, 884 insertions(+), 26 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_stp_14.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_stp_14.h
 create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_stp_15.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_stp_16.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_stp_17.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_stp_18.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_stp_19.c
  

Comments

Richard Biener Feb. 14, 2022, 3:43 p.m. UTC | #1
On Mon, 14 Feb 2022, Richard Sandiford wrote:

> ldp_stp_1.c, ldp_stp_4.c and ldp_stp_5.c have been failing since
> vectorisation was enabled at -O2.  In all three cases SLP is
> generating vector code when scalar code would be better.
> 
> The problem is that the target costs do not model whether STP could
> be used for the scalar or vector code, so the normal latency-based
> costs for store-heavy code can be way off.  It would be good to fix
> that “properly” at some point, but it isn't easy; see the existing
> discussion in aarch64_sve_adjust_stmt_cost for more details.
> 
> This patch therefore adds an on-the-side check for whether the
> code is doing nothing more than set-up+stores.  It then applies
> STP-based costs to those cases only, in addition to the normal
> latency-based costs.  (That is, the vector code has to win on
> both counts rather than on one count individually.)
> 
> However, at the moment, SLP costs one vector set-up instruction
> for every vector in an SLP node, even if the contents are the
> same as a previous vector in the same node.  Fixing the STP costs
> without fixing that would regress other cases, tested in the patch.
> 
> The patch therefore makes the SLP costing code check for duplicates
> within a node.  Ideally we'd check for duplicates more globally,
> but that would require a more global approach to costs: the cost
> of an initialisation should be amortised across all trees that
> use the initialisation, rather than fully counted against one
> arbitrarily-chosen subtree.
> 
> Back on aarch64: an earlier version of the patch tried to apply
> the new heuristic to constant stores.  However, that didn't work
> too well in practice; see the comments for details.  The patch
> therefore just tests the status quo for constant cases, leaving out
> a match if the current choice is dubious.
> 
> ldp_stp_5.c was affected by the same thing.  The test would be
> worth vectorising if we generated better vector code, but:
> 
> (1) We do a bad job of moving the { -1, 1 } constant, given that
>     we have { -1, -1 } and { 1, 1 } to hand.
> 
> (2) The vector code has 6 pairable stores to misaligned offsets.
>     We have peephole patterns to handle such misalignment for
>     4 pairable stores, but not 6.
> 
> So the SLP decision isn't wrong as such.  It's just being let
> down by later codegen.
> 
> The patch therefore adds -mstrict-align to preserve the original
> intention of the test while adding ldp_stp_19.c to check for the
> preferred vector code (XFAILed for now).
> 
> Tested on aarch64-linux-gnu, aarch64_be-elf and x86_64-linux-gnu.
> OK for the vectoriser bits?

I'll look at the patch tomorrow but it reminded me of an old
patch I'm still sitting on which reworked the SLP discovery
cache to be based on defs rather than stmts which allows us to
cache and re-use SLP nodes for invariants during SLP discovery.

From 8df9c7003611e690bd08fd5cff0b624527c99bf4 Mon Sep 17 00:00:00 2001
From: Richard Biener <rguenther@suse.de>
Date: Fri, 20 Mar 2020 11:42:47 +0100
Subject: [PATCH] rework SLP caching based on ops and CSE constants
To: gcc-patches@gcc.gnu.org

This reworks SLP caching so that it keys on the defs and not
their defining stmts so we can use it to CSE SLP nodes for
constants and invariants.

2020-03-19  Richard Biener  <rguenther@suse.de>

	* tree-vect-slp.c (): ...
---
 gcc/tree-vect-slp.c | 222 +++++++++++++++++++++++++++++---------------
 1 file changed, 149 insertions(+), 73 deletions(-)

diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index 1ffbf6f6af9..e545e34e353 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -129,36 +129,38 @@ vect_free_slp_instance (slp_instance instance, bool final_p)
   free (instance);
 }
 
-
-/* Create an SLP node for SCALAR_STMTS.  */
-
-static slp_tree
-vect_create_new_slp_node (vec<stmt_vec_info> scalar_stmts, unsigned nops)
-{
-  slp_tree node = new _slp_tree;
-  SLP_TREE_SCALAR_STMTS (node) = scalar_stmts;
-  SLP_TREE_CHILDREN (node).create (nops);
-  SLP_TREE_DEF_TYPE (node) = vect_internal_def;
-  SLP_TREE_REPRESENTATIVE (node) = scalar_stmts[0];
-  SLP_TREE_LANES (node) = scalar_stmts.length ();
-
-  unsigned i;
-  stmt_vec_info stmt_info;
-  FOR_EACH_VEC_ELT (scalar_stmts, i, stmt_info)
-    STMT_VINFO_NUM_SLP_USES (stmt_info)++;
-
-  return node;
-}
-
 /* Create an SLP node for OPS.  */
 
 static slp_tree
-vect_create_new_slp_node (vec<tree> ops)
+vect_create_new_slp_node (vec_info *vinfo,
+			  vec<tree> ops, unsigned nops = 0,
+			  vect_def_type def_type = vect_external_def)
 {
   slp_tree node = new _slp_tree;
   SLP_TREE_SCALAR_OPS (node) = ops;
-  SLP_TREE_DEF_TYPE (node) = vect_external_def;
   SLP_TREE_LANES (node) = ops.length ();
+  if (nops != 0
+      || (def_type != vect_external_def && def_type != vect_constant_def))
+    {
+      if (nops != 0)
+	SLP_TREE_CHILDREN (node).create (nops);
+      SLP_TREE_DEF_TYPE (node) = vect_internal_def;
+
+      SLP_TREE_SCALAR_STMTS (node).create (ops.length ());
+      unsigned i;
+      tree op;
+      FOR_EACH_VEC_ELT (ops, i, op)
+	{
+	  stmt_vec_info stmt_info = vinfo->lookup_def (op);
+	  STMT_VINFO_NUM_SLP_USES (stmt_info)++;
+	  SLP_TREE_SCALAR_STMTS (node).quick_push (stmt_info);
+	  if (i == 0)
+	    SLP_TREE_REPRESENTATIVE (node) = stmt_info;
+	}
+    }
+  else
+    SLP_TREE_DEF_TYPE (node) = vect_external_def;
+
   return node;
 }
 
@@ -168,8 +170,6 @@ vect_create_new_slp_node (vec<tree> ops)
    node.  */
 typedef struct _slp_oprnd_info
 {
-  /* Def-stmts for the operands.  */
-  vec<stmt_vec_info> def_stmts;
   /* Operands.  */
   vec<tree> ops;
   /* Information about the first statement, its vector def-type, type, the
@@ -194,7 +194,6 @@ vect_create_oprnd_info (int nops, int group_size)
   for (i = 0; i < nops; i++)
     {
       oprnd_info = XNEW (struct _slp_oprnd_info);
-      oprnd_info->def_stmts.create (group_size);
       oprnd_info->ops.create (group_size);
       oprnd_info->first_dt = vect_uninitialized_def;
       oprnd_info->first_op_type = NULL_TREE;
@@ -216,7 +215,6 @@ vect_free_oprnd_info (vec<slp_oprnd_info> &oprnds_info)
 
   FOR_EACH_VEC_ELT (oprnds_info, i, oprnd_info)
     {
-      oprnd_info->def_stmts.release ();
       oprnd_info->ops.release ();
       XDELETE (oprnd_info);
     }
@@ -459,7 +457,10 @@ again:
 	}
 
       if (def_stmt_info && is_pattern_stmt_p (def_stmt_info))
-	oprnd_info->any_pattern = true;
+	{
+	  oprnd_info->any_pattern = true;
+	  oprnd = gimple_get_lhs (def_stmt_info->stmt);
+	}
 
       if (first)
 	{
@@ -541,7 +542,6 @@ again:
 	  oprnd_info->first_dt = vect_external_def;
 	  /* Fallthru.  */
 	case vect_constant_def:
-	  oprnd_info->def_stmts.quick_push (NULL);
 	  oprnd_info->ops.quick_push (oprnd);
 	  break;
 
@@ -559,13 +559,11 @@ again:
 		 us a sane SLP graph (still the stmts are not 100%
 		 correct wrt the initial values).  */
 	      gcc_assert (!first);
-	      oprnd_info->def_stmts.quick_push (oprnd_info->def_stmts[0]);
 	      oprnd_info->ops.quick_push (oprnd_info->ops[0]);
 	      break;
 	    }
 	  /* Fallthru.  */
 	case vect_induction_def:
-	  oprnd_info->def_stmts.quick_push (def_stmt_info);
 	  oprnd_info->ops.quick_push (oprnd);
 	  break;
 
@@ -1096,8 +1094,8 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
    need a special value for deleted that differs from empty.  */
 struct bst_traits
 {
-  typedef vec <stmt_vec_info> value_type;
-  typedef vec <stmt_vec_info> compare_type;
+  typedef vec <tree> value_type;
+  typedef vec <tree> compare_type;
   static inline hashval_t hash (value_type);
   static inline bool equal (value_type existing, value_type candidate);
   static inline bool is_empty (value_type x) { return !x.exists (); }
@@ -1112,7 +1110,10 @@ bst_traits::hash (value_type x)
 {
   inchash::hash h;
   for (unsigned i = 0; i < x.length (); ++i)
-    h.add_int (gimple_uid (x[i]->stmt));
+    /* ???  FP constants are not shared so we can't use simple
+       pointer hashing and equivalence which would work if we'd
+       just care for SSA names here.  */
+    inchash::add_expr (x[i], h, 0);
   return h.end ();
 }
 inline bool
@@ -1121,30 +1122,33 @@ bst_traits::equal (value_type existing, value_type candidate)
   if (existing.length () != candidate.length ())
     return false;
   for (unsigned i = 0; i < existing.length (); ++i)
-    if (existing[i] != candidate[i])
+    if (existing[i] != candidate[i]
+	&& (!types_compatible_p (TREE_TYPE (existing[i]),
+				 TREE_TYPE (candidate[i]))
+	    || !operand_equal_p (existing[i], candidate[i], 0)))
       return false;
   return true;
 }
 
-typedef hash_map <vec <gimple *>, slp_tree,
+typedef hash_map <vec <tree>, slp_tree,
 		  simple_hashmap_traits <bst_traits, slp_tree> >
   scalar_stmts_to_slp_tree_map_t;
 
 static slp_tree
 vect_build_slp_tree_2 (vec_info *vinfo,
-		       vec<stmt_vec_info> stmts, unsigned int group_size,
+		       vec<tree> defs, unsigned int group_size,
 		       poly_uint64 *max_nunits,
 		       bool *matches, unsigned *npermutes, unsigned *tree_size,
 		       scalar_stmts_to_slp_tree_map_t *bst_map);
 
 static slp_tree
 vect_build_slp_tree (vec_info *vinfo,
-		     vec<stmt_vec_info> stmts, unsigned int group_size,
+		     vec<tree> defs, unsigned int group_size,
 		     poly_uint64 *max_nunits,
 		     bool *matches, unsigned *npermutes, unsigned *tree_size,
 		     scalar_stmts_to_slp_tree_map_t *bst_map)
 {
-  if (slp_tree *leader = bst_map->get (stmts))
+  if (slp_tree *leader = bst_map->get (defs))
     {
       if (dump_enabled_p ())
 	dump_printf_loc (MSG_NOTE, vect_location, "re-using %sSLP tree %p\n",
@@ -1157,7 +1161,7 @@ vect_build_slp_tree (vec_info *vinfo,
       return *leader;
     }
   poly_uint64 this_max_nunits = 1;
-  slp_tree res = vect_build_slp_tree_2 (vinfo, stmts, group_size,
+  slp_tree res = vect_build_slp_tree_2 (vinfo, defs, group_size,
 					&this_max_nunits,
 					matches, npermutes, tree_size, bst_map);
   if (res)
@@ -1167,7 +1171,7 @@ vect_build_slp_tree (vec_info *vinfo,
       /* Keep a reference for the bst_map use.  */
       res->refcnt++;
     }
-  bst_map->put (stmts.copy (), res);
+  bst_map->put (defs.copy (), res);
   return res;
 }
 
@@ -1180,7 +1184,7 @@ vect_build_slp_tree (vec_info *vinfo,
 
 static slp_tree
 vect_build_slp_tree_2 (vec_info *vinfo,
-		       vec<stmt_vec_info> stmts, unsigned int group_size,
+		       vec<tree> defs, unsigned int group_size,
 		       poly_uint64 *max_nunits,
 		       bool *matches, unsigned *npermutes, unsigned *tree_size,
 		       scalar_stmts_to_slp_tree_map_t *bst_map)
@@ -1189,8 +1193,54 @@ vect_build_slp_tree_2 (vec_info *vinfo,
   poly_uint64 this_max_nunits = *max_nunits;
   slp_tree node;
 
-  matches[0] = false;
+  auto_vec<stmt_vec_info> stmts;
+  stmts.create (defs.length ());
+  vect_def_type dt;
+  vect_def_type def0_type = vect_constant_def;
+  stmt_vec_info def_info;
+  if (!vect_is_simple_use (defs[0], vinfo, &def0_type, &def_info))
+    return NULL;
+  stmts.quick_push (def_info);
+  /* Fail gracefully to allow eventual splitting.  */
+  matches[0] = true;
+  bool fail = false;
+  for (i = 1; i < defs.length (); ++i)
+    {
+      if (!vect_is_simple_use (defs[i], vinfo, &dt, &def_info))
+	return NULL;
+      stmts.quick_push (def_info);
+      if ((def0_type == vect_constant_def
+	   || def0_type == vect_external_def)
+	  != (dt == vect_constant_def
+	      || dt == vect_external_def))
+	{
+	  matches[i] = false;
+	  fail = true;
+	}
+      else
+	matches[i] = true;
+      if (dt == vect_external_def
+	  && def0_type == vect_constant_def)
+	def0_type = vect_external_def;
+    }
+  /* Deal with mismatches in internal vs. invariant/external defs.  */
+  if (fail)
+    return NULL;
+  if (def0_type == vect_external_def
+      || def0_type == vect_constant_def)
+    {
+      tree scalar_type = TREE_TYPE (defs[0]);
+      tree vectype = get_vectype_for_scalar_type (vinfo, scalar_type,
+						  group_size);
+      if (!vect_record_max_nunits (vinfo, NULL, group_size, vectype,
+				   max_nunits))
+	return NULL;
+      node = vect_create_new_slp_node (vinfo, defs, 0, def0_type);
+      SLP_TREE_VECTYPE (node) = vectype;
+      return node;
+    }
 
+  matches[0] = false;
   stmt_vec_info stmt_info = stmts[0];
   if (gcall *stmt = dyn_cast <gcall *> (stmt_info->stmt))
     nops = gimple_call_num_args (stmt);
@@ -1237,7 +1287,7 @@ vect_build_slp_tree_2 (vec_info *vinfo,
       else
 	return NULL;
       (*tree_size)++;
-      node = vect_create_new_slp_node (stmts, 0);
+      node = vect_create_new_slp_node (vinfo, defs, 0, vect_internal_def);
       SLP_TREE_VECTYPE (node) = vectype;
       return node;
     }
@@ -1325,23 +1375,12 @@ vect_build_slp_tree_2 (vec_info *vinfo,
 	  continue;
 	}
 
-      if (oprnd_info->first_dt != vect_internal_def
-	  && oprnd_info->first_dt != vect_reduction_def
-	  && oprnd_info->first_dt != vect_induction_def)
-	{
-	  slp_tree invnode = vect_create_new_slp_node (oprnd_info->ops);
-	  SLP_TREE_DEF_TYPE (invnode) = oprnd_info->first_dt;
-	  oprnd_info->ops = vNULL;
-	  children.safe_push (invnode);
-	  continue;
-	}
-
-      if ((child = vect_build_slp_tree (vinfo, oprnd_info->def_stmts,
+      if ((child = vect_build_slp_tree (vinfo, oprnd_info->ops,
 					group_size, &this_max_nunits,
 					matches, npermutes,
 					&this_tree_size, bst_map)) != NULL)
 	{
-	  oprnd_info->def_stmts = vNULL;
+	  oprnd_info->ops = vNULL;
 	  children.safe_push (child);
 	  continue;
 	}
@@ -1366,10 +1405,9 @@ vect_build_slp_tree_2 (vec_info *vinfo,
 	    dump_printf_loc (MSG_NOTE, vect_location,
 			     "Building vector operands from scalars\n");
 	  this_tree_size++;
-	  child = vect_create_new_slp_node (oprnd_info->ops);
+	  child = vect_create_new_slp_node (vinfo, oprnd_info->ops);
 	  children.safe_push (child);
 	  oprnd_info->ops = vNULL;
-	  oprnd_info->def_stmts = vNULL;
 	  continue;
 	}
 
@@ -1424,8 +1462,6 @@ vect_build_slp_tree_2 (vec_info *vinfo,
 	  for (j = 0; j < group_size; ++j)
 	    if (matches[j] == !swap_not_matching)
 	      {
-		std::swap (oprnds_info[0]->def_stmts[j],
-			   oprnds_info[1]->def_stmts[j]);
 		std::swap (oprnds_info[0]->ops[j],
 			   oprnds_info[1]->ops[j]);
 		if (dump_enabled_p ())
@@ -1435,12 +1471,12 @@ vect_build_slp_tree_2 (vec_info *vinfo,
 	    dump_printf (MSG_NOTE, "\n");
 	  /* And try again with scratch 'matches' ... */
 	  bool *tem = XALLOCAVEC (bool, group_size);
-	  if ((child = vect_build_slp_tree (vinfo, oprnd_info->def_stmts,
+	  if ((child = vect_build_slp_tree (vinfo, oprnd_info->ops,
 					    group_size, &this_max_nunits,
 					    tem, npermutes,
 					    &this_tree_size, bst_map)) != NULL)
 	    {
-	      oprnd_info->def_stmts = vNULL;
+	      oprnd_info->ops = vNULL;
 	      children.safe_push (child);
 	      continue;
 	    }
@@ -1513,7 +1549,7 @@ fail:
 
       /* Here we record the original defs since this
 	 node represents the final lane configuration.  */
-      node = vect_create_new_slp_node (stmts, 2);
+      node = vect_create_new_slp_node (vinfo, defs, 2);
       SLP_TREE_VECTYPE (node) = vectype;
       SLP_TREE_CODE (node) = VEC_PERM_EXPR;
       SLP_TREE_CHILDREN (node).quick_push (one);
@@ -1544,7 +1580,7 @@ fail:
       return node;
     }
 
-  node = vect_create_new_slp_node (stmts, nops);
+  node = vect_create_new_slp_node (vinfo, defs, nops);
   SLP_TREE_VECTYPE (node) = vectype;
   SLP_TREE_CHILDREN (node).splice (children);
   return node;
@@ -2070,12 +2106,35 @@ vect_analyze_slp_instance (vec_info *vinfo,
   /* Create a node (a root of the SLP tree) for the packed grouped stores.  */
   scalar_stmts.create (group_size);
   stmt_vec_info next_info = stmt_info;
+  vec<tree> defs;
+  defs.create (group_size);
   if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
     {
       /* Collect the stores and store them in SLP_TREE_SCALAR_STMTS.  */
       while (next_info)
         {
+	  /* Just needed for the root SLP node, otherwise "wrong".  */
 	  scalar_stmts.safe_push (vect_stmt_to_vectorize (next_info));
+	  /* Defs to seed the SLP tree from (excluding the store itself).  */
+	  tree def
+	    = gimple_assign_rhs1 (vect_stmt_to_vectorize (next_info)->stmt);
+	  if (stmt_vec_info defstmt = vinfo->lookup_def (def))
+	    def = gimple_get_lhs (vect_stmt_to_vectorize (defstmt)->stmt);
+	  defs.safe_push (def);
+	  if (is_a <bb_vec_info> (vinfo))
+	    {
+	      /* For BB vectorization we have to perform late vectype
+		 assignment to stores.  */
+	      tree vectype, nunits_vectype;
+	      if (!vect_get_vector_types_for_stmt (vinfo, next_info, &vectype,
+						   &nunits_vectype, group_size)
+		  || !vect_update_shared_vectype (next_info, vectype))
+		{
+		  defs.release ();
+		  scalar_stmts.release ();
+		  return false;
+		}
+	    }
 	  next_info = DR_GROUP_NEXT_ELEMENT (next_info);
         }
     }
@@ -2085,7 +2144,9 @@ vect_analyze_slp_instance (vec_info *vinfo,
 	 SLP_TREE_SCALAR_STMTS.  */
       while (next_info)
         {
-	  scalar_stmts.safe_push (vect_stmt_to_vectorize (next_info));
+	  stmt_vec_info def_info = vect_stmt_to_vectorize (next_info);
+	  scalar_stmts.safe_push (def_info);
+	  defs.quick_push (gimple_get_lhs (def_info->stmt));
 	  next_info = REDUC_GROUP_NEXT_ELEMENT (next_info);
         }
       /* Mark the first element of the reduction chain as reduction to properly
@@ -2110,7 +2171,7 @@ vect_analyze_slp_instance (vec_info *vinfo,
 	      if (!def_info)
 		return false;
 	      def_info = vect_stmt_to_vectorize (def_info);
-	      scalar_stmts.safe_push (def_info);
+	      defs.quick_push (gimple_get_lhs (def_info->stmt));
 	    }
 	  else
 	    return false;
@@ -2125,7 +2186,7 @@ vect_analyze_slp_instance (vec_info *vinfo,
       /* Collect reduction statements.  */
       vec<stmt_vec_info> reductions = as_a <loop_vec_info> (vinfo)->reductions;
       for (i = 0; reductions.iterate (i, &next_info); i++)
-	scalar_stmts.safe_push (next_info);
+	defs.quick_push (gimple_get_lhs (next_info->stmt));
     }
 
   /* Build the tree for the SLP instance.  */
@@ -2133,7 +2194,7 @@ vect_analyze_slp_instance (vec_info *vinfo,
   unsigned npermutes = 0;
   poly_uint64 max_nunits = nunits;
   unsigned tree_size = 0;
-  node = vect_build_slp_tree (vinfo, scalar_stmts, group_size,
+  node = vect_build_slp_tree (vinfo, defs, group_size,
 			      &max_nunits, matches, &npermutes,
 			      &tree_size, bst_map);
   if (node != NULL)
@@ -2238,11 +2299,11 @@ vect_analyze_slp_instance (vec_info *vinfo,
 	      gcc_assert (r);
 	      next_info = vinfo->lookup_stmt (use_stmt);
 	      next_info = vect_stmt_to_vectorize (next_info);
-	      scalar_stmts = vNULL;
-	      scalar_stmts.create (group_size);
+	      vec<tree> scalar_ops;
+	      scalar_ops.create (group_size);
 	      for (unsigned i = 0; i < group_size; ++i)
-		scalar_stmts.quick_push (next_info);
-	      slp_tree conv = vect_create_new_slp_node (scalar_stmts, 1);
+		scalar_ops.quick_push (gimple_get_lhs (next_info->stmt));
+	      slp_tree conv = vect_create_new_slp_node (vinfo, scalar_ops, 1);
 	      SLP_TREE_VECTYPE (conv) = STMT_VINFO_VECTYPE (next_info);
 	      SLP_TREE_CHILDREN (conv).quick_push (node);
 	      SLP_INSTANCE_TREE (new_instance) = conv;
@@ -2252,6 +2313,21 @@ vect_analyze_slp_instance (vec_info *vinfo,
 	      REDUC_GROUP_FIRST_ELEMENT (next_info) = next_info;
 	      REDUC_GROUP_NEXT_ELEMENT (next_info) = NULL;
 	    }
+	  else if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
+	    {
+	      /* Put the root store group in.  */
+	      slp_tree store = vect_create_new_slp_node (vinfo, vNULL, 1,
+							 vect_internal_def);
+	      SLP_TREE_SCALAR_STMTS (store) = scalar_stmts;
+	      stmt_vec_info stmt;
+	      FOR_EACH_VEC_ELT (scalar_stmts, i, stmt)
+		STMT_VINFO_NUM_SLP_USES (stmt)++;
+	      SLP_TREE_REPRESENTATIVE (store) = scalar_stmts[0];
+	      SLP_TREE_VECTYPE (store) = STMT_VINFO_VECTYPE (scalar_stmts[0]);
+	      SLP_TREE_LANES (store) = scalar_stmts.length ();
+	      SLP_TREE_CHILDREN (store).quick_push (node);
+	      SLP_INSTANCE_TREE (new_instance) = store;
+	    }
 
 	  vinfo->slp_instances.safe_push (new_instance);
  
Richard Sandiford Feb. 14, 2022, 3:50 p.m. UTC | #2
Richard Biener <rguenther@suse.de> writes:
> On Mon, 14 Feb 2022, Richard Sandiford wrote:
>
>> ldp_stp_1.c, ldp_stp_4.c and ldp_stp_5.c have been failing since
>> vectorisation was enabled at -O2.  In all three cases SLP is
>> generating vector code when scalar code would be better.
>> 
>> The problem is that the target costs do not model whether STP could
>> be used for the scalar or vector code, so the normal latency-based
>> costs for store-heavy code can be way off.  It would be good to fix
>> that “properly” at some point, but it isn't easy; see the existing
>> discussion in aarch64_sve_adjust_stmt_cost for more details.
>> 
>> This patch therefore adds an on-the-side check for whether the
>> code is doing nothing more than set-up+stores.  It then applies
>> STP-based costs to those cases only, in addition to the normal
>> latency-based costs.  (That is, the vector code has to win on
>> both counts rather than on one count individually.)
>> 
>> However, at the moment, SLP costs one vector set-up instruction
>> for every vector in an SLP node, even if the contents are the
>> same as a previous vector in the same node.  Fixing the STP costs
>> without fixing that would regress other cases, tested in the patch.
>> 
>> The patch therefore makes the SLP costing code check for duplicates
>> within a node.  Ideally we'd check for duplicates more globally,
>> but that would require a more global approach to costs: the cost
>> of an initialisation should be amortised across all trees that
>> use the initialisation, rather than fully counted against one
>> arbitrarily-chosen subtree.
>> 
>> Back on aarch64: an earlier version of the patch tried to apply
>> the new heuristic to constant stores.  However, that didn't work
>> too well in practice; see the comments for details.  The patch
>> therefore just tests the status quo for constant cases, leaving out
>> a match if the current choice is dubious.
>> 
>> ldp_stp_5.c was affected by the same thing.  The test would be
>> worth vectorising if we generated better vector code, but:
>> 
>> (1) We do a bad job of moving the { -1, 1 } constant, given that
>>     we have { -1, -1 } and { 1, 1 } to hand.
>> 
>> (2) The vector code has 6 pairable stores to misaligned offsets.
>>     We have peephole patterns to handle such misalignment for
>>     4 pairable stores, but not 6.
>> 
>> So the SLP decision isn't wrong as such.  It's just being let
>> down by later codegen.
>> 
>> The patch therefore adds -mstrict-align to preserve the original
>> intention of the test while adding ldp_stp_19.c to check for the
>> preferred vector code (XFAILed for now).
>> 
>> Tested on aarch64-linux-gnu, aarch64_be-elf and x86_64-linux-gnu.
>> OK for the vectoriser bits?
>
> I'll look at the patch tomorrow but it reminded me of an old
> patch I'm still sitting on which reworked the SLP discovery
> cache to be based on defs rather than stmts which allows us to
> cache and re-use SLP nodes for invariants during SLP discovery.

Ah, yeah, that should help with the “more global” bit.  I think
in the end we need both though: reduce duplicate nodes, and remove
duplicate vectors (or at least duplicate vector costs) within a node.

Thanks,
Richard

> From 8df9c7003611e690bd08fd5cff0b624527c99bf4 Mon Sep 17 00:00:00 2001
> From: Richard Biener <rguenther@suse.de>
> Date: Fri, 20 Mar 2020 11:42:47 +0100
> Subject: [PATCH] rework SLP caching based on ops and CSE constants
> To: gcc-patches@gcc.gnu.org
>
> This reworks SLP caching so that it keys on the defs and not
> their defining stmts so we can use it to CSE SLP nodes for
> constants and invariants.
>
> 2020-03-19  Richard Biener  <rguenther@suse.de>
>
> 	* tree-vect-slp.c (): ...
> ---
>  gcc/tree-vect-slp.c | 222 +++++++++++++++++++++++++++++---------------
>  1 file changed, 149 insertions(+), 73 deletions(-)
>
> diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
> index 1ffbf6f6af9..e545e34e353 100644
> --- a/gcc/tree-vect-slp.c
> +++ b/gcc/tree-vect-slp.c
> @@ -129,36 +129,38 @@ vect_free_slp_instance (slp_instance instance, bool final_p)
>    free (instance);
>  }
>  
> -
> -/* Create an SLP node for SCALAR_STMTS.  */
> -
> -static slp_tree
> -vect_create_new_slp_node (vec<stmt_vec_info> scalar_stmts, unsigned nops)
> -{
> -  slp_tree node = new _slp_tree;
> -  SLP_TREE_SCALAR_STMTS (node) = scalar_stmts;
> -  SLP_TREE_CHILDREN (node).create (nops);
> -  SLP_TREE_DEF_TYPE (node) = vect_internal_def;
> -  SLP_TREE_REPRESENTATIVE (node) = scalar_stmts[0];
> -  SLP_TREE_LANES (node) = scalar_stmts.length ();
> -
> -  unsigned i;
> -  stmt_vec_info stmt_info;
> -  FOR_EACH_VEC_ELT (scalar_stmts, i, stmt_info)
> -    STMT_VINFO_NUM_SLP_USES (stmt_info)++;
> -
> -  return node;
> -}
> -
>  /* Create an SLP node for OPS.  */
>  
>  static slp_tree
> -vect_create_new_slp_node (vec<tree> ops)
> +vect_create_new_slp_node (vec_info *vinfo,
> +			  vec<tree> ops, unsigned nops = 0,
> +			  vect_def_type def_type = vect_external_def)
>  {
>    slp_tree node = new _slp_tree;
>    SLP_TREE_SCALAR_OPS (node) = ops;
> -  SLP_TREE_DEF_TYPE (node) = vect_external_def;
>    SLP_TREE_LANES (node) = ops.length ();
> +  if (nops != 0
> +      || (def_type != vect_external_def && def_type != vect_constant_def))
> +    {
> +      if (nops != 0)
> +	SLP_TREE_CHILDREN (node).create (nops);
> +      SLP_TREE_DEF_TYPE (node) = vect_internal_def;
> +
> +      SLP_TREE_SCALAR_STMTS (node).create (ops.length ());
> +      unsigned i;
> +      tree op;
> +      FOR_EACH_VEC_ELT (ops, i, op)
> +	{
> +	  stmt_vec_info stmt_info = vinfo->lookup_def (op);
> +	  STMT_VINFO_NUM_SLP_USES (stmt_info)++;
> +	  SLP_TREE_SCALAR_STMTS (node).quick_push (stmt_info);
> +	  if (i == 0)
> +	    SLP_TREE_REPRESENTATIVE (node) = stmt_info;
> +	}
> +    }
> +  else
> +    SLP_TREE_DEF_TYPE (node) = vect_external_def;
> +
>    return node;
>  }
>  
> @@ -168,8 +170,6 @@ vect_create_new_slp_node (vec<tree> ops)
>     node.  */
>  typedef struct _slp_oprnd_info
>  {
> -  /* Def-stmts for the operands.  */
> -  vec<stmt_vec_info> def_stmts;
>    /* Operands.  */
>    vec<tree> ops;
>    /* Information about the first statement, its vector def-type, type, the
> @@ -194,7 +194,6 @@ vect_create_oprnd_info (int nops, int group_size)
>    for (i = 0; i < nops; i++)
>      {
>        oprnd_info = XNEW (struct _slp_oprnd_info);
> -      oprnd_info->def_stmts.create (group_size);
>        oprnd_info->ops.create (group_size);
>        oprnd_info->first_dt = vect_uninitialized_def;
>        oprnd_info->first_op_type = NULL_TREE;
> @@ -216,7 +215,6 @@ vect_free_oprnd_info (vec<slp_oprnd_info> &oprnds_info)
>  
>    FOR_EACH_VEC_ELT (oprnds_info, i, oprnd_info)
>      {
> -      oprnd_info->def_stmts.release ();
>        oprnd_info->ops.release ();
>        XDELETE (oprnd_info);
>      }
> @@ -459,7 +457,10 @@ again:
>  	}
>  
>        if (def_stmt_info && is_pattern_stmt_p (def_stmt_info))
> -	oprnd_info->any_pattern = true;
> +	{
> +	  oprnd_info->any_pattern = true;
> +	  oprnd = gimple_get_lhs (def_stmt_info->stmt);
> +	}
>  
>        if (first)
>  	{
> @@ -541,7 +542,6 @@ again:
>  	  oprnd_info->first_dt = vect_external_def;
>  	  /* Fallthru.  */
>  	case vect_constant_def:
> -	  oprnd_info->def_stmts.quick_push (NULL);
>  	  oprnd_info->ops.quick_push (oprnd);
>  	  break;
>  
> @@ -559,13 +559,11 @@ again:
>  		 us a sane SLP graph (still the stmts are not 100%
>  		 correct wrt the initial values).  */
>  	      gcc_assert (!first);
> -	      oprnd_info->def_stmts.quick_push (oprnd_info->def_stmts[0]);
>  	      oprnd_info->ops.quick_push (oprnd_info->ops[0]);
>  	      break;
>  	    }
>  	  /* Fallthru.  */
>  	case vect_induction_def:
> -	  oprnd_info->def_stmts.quick_push (def_stmt_info);
>  	  oprnd_info->ops.quick_push (oprnd);
>  	  break;
>  
> @@ -1096,8 +1094,8 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
>     need a special value for deleted that differs from empty.  */
>  struct bst_traits
>  {
> -  typedef vec <stmt_vec_info> value_type;
> -  typedef vec <stmt_vec_info> compare_type;
> +  typedef vec <tree> value_type;
> +  typedef vec <tree> compare_type;
>    static inline hashval_t hash (value_type);
>    static inline bool equal (value_type existing, value_type candidate);
>    static inline bool is_empty (value_type x) { return !x.exists (); }
> @@ -1112,7 +1110,10 @@ bst_traits::hash (value_type x)
>  {
>    inchash::hash h;
>    for (unsigned i = 0; i < x.length (); ++i)
> -    h.add_int (gimple_uid (x[i]->stmt));
> +    /* ???  FP constants are not shared so we can't use simple
> +       pointer hashing and equivalence which would work if we'd
> +       just care for SSA names here.  */
> +    inchash::add_expr (x[i], h, 0);
>    return h.end ();
>  }
>  inline bool
> @@ -1121,30 +1122,33 @@ bst_traits::equal (value_type existing, value_type candidate)
>    if (existing.length () != candidate.length ())
>      return false;
>    for (unsigned i = 0; i < existing.length (); ++i)
> -    if (existing[i] != candidate[i])
> +    if (existing[i] != candidate[i]
> +	&& (!types_compatible_p (TREE_TYPE (existing[i]),
> +				 TREE_TYPE (candidate[i]))
> +	    || !operand_equal_p (existing[i], candidate[i], 0)))
>        return false;
>    return true;
>  }
>  
> -typedef hash_map <vec <gimple *>, slp_tree,
> +typedef hash_map <vec <tree>, slp_tree,
>  		  simple_hashmap_traits <bst_traits, slp_tree> >
>    scalar_stmts_to_slp_tree_map_t;
>  
>  static slp_tree
>  vect_build_slp_tree_2 (vec_info *vinfo,
> -		       vec<stmt_vec_info> stmts, unsigned int group_size,
> +		       vec<tree> defs, unsigned int group_size,
>  		       poly_uint64 *max_nunits,
>  		       bool *matches, unsigned *npermutes, unsigned *tree_size,
>  		       scalar_stmts_to_slp_tree_map_t *bst_map);
>  
>  static slp_tree
>  vect_build_slp_tree (vec_info *vinfo,
> -		     vec<stmt_vec_info> stmts, unsigned int group_size,
> +		     vec<tree> defs, unsigned int group_size,
>  		     poly_uint64 *max_nunits,
>  		     bool *matches, unsigned *npermutes, unsigned *tree_size,
>  		     scalar_stmts_to_slp_tree_map_t *bst_map)
>  {
> -  if (slp_tree *leader = bst_map->get (stmts))
> +  if (slp_tree *leader = bst_map->get (defs))
>      {
>        if (dump_enabled_p ())
>  	dump_printf_loc (MSG_NOTE, vect_location, "re-using %sSLP tree %p\n",
> @@ -1157,7 +1161,7 @@ vect_build_slp_tree (vec_info *vinfo,
>        return *leader;
>      }
>    poly_uint64 this_max_nunits = 1;
> -  slp_tree res = vect_build_slp_tree_2 (vinfo, stmts, group_size,
> +  slp_tree res = vect_build_slp_tree_2 (vinfo, defs, group_size,
>  					&this_max_nunits,
>  					matches, npermutes, tree_size, bst_map);
>    if (res)
> @@ -1167,7 +1171,7 @@ vect_build_slp_tree (vec_info *vinfo,
>        /* Keep a reference for the bst_map use.  */
>        res->refcnt++;
>      }
> -  bst_map->put (stmts.copy (), res);
> +  bst_map->put (defs.copy (), res);
>    return res;
>  }
>  
> @@ -1180,7 +1184,7 @@ vect_build_slp_tree (vec_info *vinfo,
>  
>  static slp_tree
>  vect_build_slp_tree_2 (vec_info *vinfo,
> -		       vec<stmt_vec_info> stmts, unsigned int group_size,
> +		       vec<tree> defs, unsigned int group_size,
>  		       poly_uint64 *max_nunits,
>  		       bool *matches, unsigned *npermutes, unsigned *tree_size,
>  		       scalar_stmts_to_slp_tree_map_t *bst_map)
> @@ -1189,8 +1193,54 @@ vect_build_slp_tree_2 (vec_info *vinfo,
>    poly_uint64 this_max_nunits = *max_nunits;
>    slp_tree node;
>  
> -  matches[0] = false;
> +  auto_vec<stmt_vec_info> stmts;
> +  stmts.create (defs.length ());
> +  vect_def_type dt;
> +  vect_def_type def0_type = vect_constant_def;
> +  stmt_vec_info def_info;
> +  if (!vect_is_simple_use (defs[0], vinfo, &def0_type, &def_info))
> +    return NULL;
> +  stmts.quick_push (def_info);
> +  /* Fail gracefully to allow eventual splitting.  */
> +  matches[0] = true;
> +  bool fail = false;
> +  for (i = 1; i < defs.length (); ++i)
> +    {
> +      if (!vect_is_simple_use (defs[i], vinfo, &dt, &def_info))
> +	return NULL;
> +      stmts.quick_push (def_info);
> +      if ((def0_type == vect_constant_def
> +	   || def0_type == vect_external_def)
> +	  != (dt == vect_constant_def
> +	      || dt == vect_external_def))
> +	{
> +	  matches[i] = false;
> +	  fail = true;
> +	}
> +      else
> +	matches[i] = true;
> +      if (dt == vect_external_def
> +	  && def0_type == vect_constant_def)
> +	def0_type = vect_external_def;
> +    }
> +  /* Deal with mismatches in internal vs. invariant/external defs.  */
> +  if (fail)
> +    return NULL;
> +  if (def0_type == vect_external_def
> +      || def0_type == vect_constant_def)
> +    {
> +      tree scalar_type = TREE_TYPE (defs[0]);
> +      tree vectype = get_vectype_for_scalar_type (vinfo, scalar_type,
> +						  group_size);
> +      if (!vect_record_max_nunits (vinfo, NULL, group_size, vectype,
> +				   max_nunits))
> +	return NULL;
> +      node = vect_create_new_slp_node (vinfo, defs, 0, def0_type);
> +      SLP_TREE_VECTYPE (node) = vectype;
> +      return node;
> +    }
>  
> +  matches[0] = false;
>    stmt_vec_info stmt_info = stmts[0];
>    if (gcall *stmt = dyn_cast <gcall *> (stmt_info->stmt))
>      nops = gimple_call_num_args (stmt);
> @@ -1237,7 +1287,7 @@ vect_build_slp_tree_2 (vec_info *vinfo,
>        else
>  	return NULL;
>        (*tree_size)++;
> -      node = vect_create_new_slp_node (stmts, 0);
> +      node = vect_create_new_slp_node (vinfo, defs, 0, vect_internal_def);
>        SLP_TREE_VECTYPE (node) = vectype;
>        return node;
>      }
> @@ -1325,23 +1375,12 @@ vect_build_slp_tree_2 (vec_info *vinfo,
>  	  continue;
>  	}
>  
> -      if (oprnd_info->first_dt != vect_internal_def
> -	  && oprnd_info->first_dt != vect_reduction_def
> -	  && oprnd_info->first_dt != vect_induction_def)
> -	{
> -	  slp_tree invnode = vect_create_new_slp_node (oprnd_info->ops);
> -	  SLP_TREE_DEF_TYPE (invnode) = oprnd_info->first_dt;
> -	  oprnd_info->ops = vNULL;
> -	  children.safe_push (invnode);
> -	  continue;
> -	}
> -
> -      if ((child = vect_build_slp_tree (vinfo, oprnd_info->def_stmts,
> +      if ((child = vect_build_slp_tree (vinfo, oprnd_info->ops,
>  					group_size, &this_max_nunits,
>  					matches, npermutes,
>  					&this_tree_size, bst_map)) != NULL)
>  	{
> -	  oprnd_info->def_stmts = vNULL;
> +	  oprnd_info->ops = vNULL;
>  	  children.safe_push (child);
>  	  continue;
>  	}
> @@ -1366,10 +1405,9 @@ vect_build_slp_tree_2 (vec_info *vinfo,
>  	    dump_printf_loc (MSG_NOTE, vect_location,
>  			     "Building vector operands from scalars\n");
>  	  this_tree_size++;
> -	  child = vect_create_new_slp_node (oprnd_info->ops);
> +	  child = vect_create_new_slp_node (vinfo, oprnd_info->ops);
>  	  children.safe_push (child);
>  	  oprnd_info->ops = vNULL;
> -	  oprnd_info->def_stmts = vNULL;
>  	  continue;
>  	}
>  
> @@ -1424,8 +1462,6 @@ vect_build_slp_tree_2 (vec_info *vinfo,
>  	  for (j = 0; j < group_size; ++j)
>  	    if (matches[j] == !swap_not_matching)
>  	      {
> -		std::swap (oprnds_info[0]->def_stmts[j],
> -			   oprnds_info[1]->def_stmts[j]);
>  		std::swap (oprnds_info[0]->ops[j],
>  			   oprnds_info[1]->ops[j]);
>  		if (dump_enabled_p ())
> @@ -1435,12 +1471,12 @@ vect_build_slp_tree_2 (vec_info *vinfo,
>  	    dump_printf (MSG_NOTE, "\n");
>  	  /* And try again with scratch 'matches' ... */
>  	  bool *tem = XALLOCAVEC (bool, group_size);
> -	  if ((child = vect_build_slp_tree (vinfo, oprnd_info->def_stmts,
> +	  if ((child = vect_build_slp_tree (vinfo, oprnd_info->ops,
>  					    group_size, &this_max_nunits,
>  					    tem, npermutes,
>  					    &this_tree_size, bst_map)) != NULL)
>  	    {
> -	      oprnd_info->def_stmts = vNULL;
> +	      oprnd_info->ops = vNULL;
>  	      children.safe_push (child);
>  	      continue;
>  	    }
> @@ -1513,7 +1549,7 @@ fail:
>  
>        /* Here we record the original defs since this
>  	 node represents the final lane configuration.  */
> -      node = vect_create_new_slp_node (stmts, 2);
> +      node = vect_create_new_slp_node (vinfo, defs, 2);
>        SLP_TREE_VECTYPE (node) = vectype;
>        SLP_TREE_CODE (node) = VEC_PERM_EXPR;
>        SLP_TREE_CHILDREN (node).quick_push (one);
> @@ -1544,7 +1580,7 @@ fail:
>        return node;
>      }
>  
> -  node = vect_create_new_slp_node (stmts, nops);
> +  node = vect_create_new_slp_node (vinfo, defs, nops);
>    SLP_TREE_VECTYPE (node) = vectype;
>    SLP_TREE_CHILDREN (node).splice (children);
>    return node;
> @@ -2070,12 +2106,35 @@ vect_analyze_slp_instance (vec_info *vinfo,
>    /* Create a node (a root of the SLP tree) for the packed grouped stores.  */
>    scalar_stmts.create (group_size);
>    stmt_vec_info next_info = stmt_info;
> +  vec<tree> defs;
> +  defs.create (group_size);
>    if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
>      {
>        /* Collect the stores and store them in SLP_TREE_SCALAR_STMTS.  */
>        while (next_info)
>          {
> +	  /* Just needed for the root SLP node, otherwise "wrong".  */
>  	  scalar_stmts.safe_push (vect_stmt_to_vectorize (next_info));
> +	  /* Defs to seed the SLP tree from (excluding the store itself).  */
> +	  tree def
> +	    = gimple_assign_rhs1 (vect_stmt_to_vectorize (next_info)->stmt);
> +	  if (stmt_vec_info defstmt = vinfo->lookup_def (def))
> +	    def = gimple_get_lhs (vect_stmt_to_vectorize (defstmt)->stmt);
> +	  defs.safe_push (def);
> +	  if (is_a <bb_vec_info> (vinfo))
> +	    {
> +	      /* For BB vectorization we have to perform late vectype
> +		 assignment to stores.  */
> +	      tree vectype, nunits_vectype;
> +	      if (!vect_get_vector_types_for_stmt (vinfo, next_info, &vectype,
> +						   &nunits_vectype, group_size)
> +		  || !vect_update_shared_vectype (next_info, vectype))
> +		{
> +		  defs.release ();
> +		  scalar_stmts.release ();
> +		  return false;
> +		}
> +	    }
>  	  next_info = DR_GROUP_NEXT_ELEMENT (next_info);
>          }
>      }
> @@ -2085,7 +2144,9 @@ vect_analyze_slp_instance (vec_info *vinfo,
>  	 SLP_TREE_SCALAR_STMTS.  */
>        while (next_info)
>          {
> -	  scalar_stmts.safe_push (vect_stmt_to_vectorize (next_info));
> +	  stmt_vec_info def_info = vect_stmt_to_vectorize (next_info);
> +	  scalar_stmts.safe_push (def_info);
> +	  defs.quick_push (gimple_get_lhs (def_info->stmt));
>  	  next_info = REDUC_GROUP_NEXT_ELEMENT (next_info);
>          }
>        /* Mark the first element of the reduction chain as reduction to properly
> @@ -2110,7 +2171,7 @@ vect_analyze_slp_instance (vec_info *vinfo,
>  	      if (!def_info)
>  		return false;
>  	      def_info = vect_stmt_to_vectorize (def_info);
> -	      scalar_stmts.safe_push (def_info);
> +	      defs.quick_push (gimple_get_lhs (def_info->stmt));
>  	    }
>  	  else
>  	    return false;
> @@ -2125,7 +2186,7 @@ vect_analyze_slp_instance (vec_info *vinfo,
>        /* Collect reduction statements.  */
>        vec<stmt_vec_info> reductions = as_a <loop_vec_info> (vinfo)->reductions;
>        for (i = 0; reductions.iterate (i, &next_info); i++)
> -	scalar_stmts.safe_push (next_info);
> +	defs.quick_push (gimple_get_lhs (next_info->stmt));
>      }
>  
>    /* Build the tree for the SLP instance.  */
> @@ -2133,7 +2194,7 @@ vect_analyze_slp_instance (vec_info *vinfo,
>    unsigned npermutes = 0;
>    poly_uint64 max_nunits = nunits;
>    unsigned tree_size = 0;
> -  node = vect_build_slp_tree (vinfo, scalar_stmts, group_size,
> +  node = vect_build_slp_tree (vinfo, defs, group_size,
>  			      &max_nunits, matches, &npermutes,
>  			      &tree_size, bst_map);
>    if (node != NULL)
> @@ -2238,11 +2299,11 @@ vect_analyze_slp_instance (vec_info *vinfo,
>  	      gcc_assert (r);
>  	      next_info = vinfo->lookup_stmt (use_stmt);
>  	      next_info = vect_stmt_to_vectorize (next_info);
> -	      scalar_stmts = vNULL;
> -	      scalar_stmts.create (group_size);
> +	      vec<tree> scalar_ops;
> +	      scalar_ops.create (group_size);
>  	      for (unsigned i = 0; i < group_size; ++i)
> -		scalar_stmts.quick_push (next_info);
> -	      slp_tree conv = vect_create_new_slp_node (scalar_stmts, 1);
> +		scalar_ops.quick_push (gimple_get_lhs (next_info->stmt));
> +	      slp_tree conv = vect_create_new_slp_node (vinfo, scalar_ops, 1);
>  	      SLP_TREE_VECTYPE (conv) = STMT_VINFO_VECTYPE (next_info);
>  	      SLP_TREE_CHILDREN (conv).quick_push (node);
>  	      SLP_INSTANCE_TREE (new_instance) = conv;
> @@ -2252,6 +2313,21 @@ vect_analyze_slp_instance (vec_info *vinfo,
>  	      REDUC_GROUP_FIRST_ELEMENT (next_info) = next_info;
>  	      REDUC_GROUP_NEXT_ELEMENT (next_info) = NULL;
>  	    }
> +	  else if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
> +	    {
> +	      /* Put the root store group in.  */
> +	      slp_tree store = vect_create_new_slp_node (vinfo, vNULL, 1,
> +							 vect_internal_def);
> +	      SLP_TREE_SCALAR_STMTS (store) = scalar_stmts;
> +	      stmt_vec_info stmt;
> +	      FOR_EACH_VEC_ELT (scalar_stmts, i, stmt)
> +		STMT_VINFO_NUM_SLP_USES (stmt)++;
> +	      SLP_TREE_REPRESENTATIVE (store) = scalar_stmts[0];
> +	      SLP_TREE_VECTYPE (store) = STMT_VINFO_VECTYPE (scalar_stmts[0]);
> +	      SLP_TREE_LANES (store) = scalar_stmts.length ();
> +	      SLP_TREE_CHILDREN (store).quick_push (node);
> +	      SLP_INSTANCE_TREE (new_instance) = store;
> +	    }
>  
>  	  vinfo->slp_instances.safe_push (new_instance);
  
Richard Biener Feb. 15, 2022, 9:23 a.m. UTC | #3
On Mon, 14 Feb 2022, Richard Sandiford wrote:

> ldp_stp_1.c, ldp_stp_4.c and ldp_stp_5.c have been failing since
> vectorisation was enabled at -O2.  In all three cases SLP is
> generating vector code when scalar code would be better.
> 
> The problem is that the target costs do not model whether STP could
> be used for the scalar or vector code, so the normal latency-based
> costs for store-heavy code can be way off.  It would be good to fix
> that “properly” at some point, but it isn't easy; see the existing
> discussion in aarch64_sve_adjust_stmt_cost for more details.
> 
> This patch therefore adds an on-the-side check for whether the
> code is doing nothing more than set-up+stores.  It then applies
> STP-based costs to those cases only, in addition to the normal
> latency-based costs.  (That is, the vector code has to win on
> both counts rather than on one count individually.)
> 
> However, at the moment, SLP costs one vector set-up instruction
> for every vector in an SLP node, even if the contents are the
> same as a previous vector in the same node.  Fixing the STP costs
> without fixing that would regress other cases, tested in the patch.
> 
> The patch therefore makes the SLP costing code check for duplicates
> within a node.  Ideally we'd check for duplicates more globally,
> but that would require a more global approach to costs: the cost
> of an initialisation should be amortised across all trees that
> use the initialisation, rather than fully counted against one
> arbitrarily-chosen subtree.
> 
> Back on aarch64: an earlier version of the patch tried to apply
> the new heuristic to constant stores.  However, that didn't work
> too well in practice; see the comments for details.  The patch
> therefore just tests the status quo for constant cases, leaving out
> a match if the current choice is dubious.
> 
> ldp_stp_5.c was affected by the same thing.  The test would be
> worth vectorising if we generated better vector code, but:
> 
> (1) We do a bad job of moving the { -1, 1 } constant, given that
>     we have { -1, -1 } and { 1, 1 } to hand.
> 
> (2) The vector code has 6 pairable stores to misaligned offsets.
>     We have peephole patterns to handle such misalignment for
>     4 pairable stores, but not 6.
> 
> So the SLP decision isn't wrong as such.  It's just being let
> down by later codegen.
> 
> The patch therefore adds -mstrict-align to preserve the original
> intention of the test while adding ldp_stp_19.c to check for the
> preferred vector code (XFAILed for now).
> 
> Tested on aarch64-linux-gnu, aarch64_be-elf and x86_64-linux-gnu.
> OK for the vectoriser bits?

OK.

Thanks,
Richard.

> Thanks,
> Richard
> 
> 
> gcc/
> 	* tree-vectorizer.h (vect_scalar_ops_slice): New struct.
> 	(vect_scalar_ops_slice_hash): Likewise.
> 	(vect_scalar_ops_slice::op): New function.
> 	* tree-vect-slp.cc (vect_scalar_ops_slice::all_same_p): New function.
> 	(vect_scalar_ops_slice_hash::hash): Likewise.
> 	(vect_scalar_ops_slice_hash::equal): Likewise.
> 	(vect_prologue_cost_for_slp): Check for duplicate vectors.
> 	* config/aarch64/aarch64.cc
> 	(aarch64_vector_costs::m_stp_sequence_cost): New member variable.
> 	(aarch64_aligned_constant_offset_p): New function.
> 	(aarch64_stp_sequence_cost): Likewise.
> 	(aarch64_vector_costs::add_stmt_cost): Handle new STP heuristic.
> 	(aarch64_vector_costs::finish_cost): Likewise.
> 
> gcc/testsuite/
> 	* gcc.target/aarch64/ldp_stp_5.c: Require -mstrict-align.
> 	* gcc.target/aarch64/ldp_stp_14.h,
> 	* gcc.target/aarch64/ldp_stp_14.c: New test.
> 	* gcc.target/aarch64/ldp_stp_15.c: Likewise.
> 	* gcc.target/aarch64/ldp_stp_16.c: Likewise.
> 	* gcc.target/aarch64/ldp_stp_17.c: Likewise.
> 	* gcc.target/aarch64/ldp_stp_18.c: Likewise.
> 	* gcc.target/aarch64/ldp_stp_19.c: Likewise.
> ---
>  gcc/config/aarch64/aarch64.cc                 | 140 ++++++++++++++++++
>  gcc/testsuite/gcc.target/aarch64/ldp_stp_14.c |  89 +++++++++++
>  gcc/testsuite/gcc.target/aarch64/ldp_stp_14.h |  50 +++++++
>  gcc/testsuite/gcc.target/aarch64/ldp_stp_15.c | 137 +++++++++++++++++
>  gcc/testsuite/gcc.target/aarch64/ldp_stp_16.c | 133 +++++++++++++++++
>  gcc/testsuite/gcc.target/aarch64/ldp_stp_17.c | 120 +++++++++++++++
>  gcc/testsuite/gcc.target/aarch64/ldp_stp_18.c | 123 +++++++++++++++
>  gcc/testsuite/gcc.target/aarch64/ldp_stp_19.c |   6 +
>  gcc/testsuite/gcc.target/aarch64/ldp_stp_5.c  |   2 +-
>  gcc/tree-vect-slp.cc                          |  75 ++++++----
>  gcc/tree-vectorizer.h                         |  35 +++++
>  11 files changed, 884 insertions(+), 26 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_stp_14.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_stp_14.h
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_stp_15.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_stp_16.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_stp_17.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_stp_18.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_stp_19.c
> 
> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> index ec479d3055d..ddd0637185c 100644
> --- a/gcc/tree-vectorizer.h
> +++ b/gcc/tree-vectorizer.h
> @@ -113,6 +113,41 @@ typedef hash_map<tree_operand_hash,
>  		 std::pair<stmt_vec_info, innermost_loop_behavior *> >
>  	  vec_base_alignments;
>  
> +/* Represents elements [START, START + LENGTH) of cyclical array OPS*
> +   (i.e. OPS repeated to give at least START + LENGTH elements)  */
> +struct vect_scalar_ops_slice
> +{
> +  tree op (unsigned int i) const;
> +  bool all_same_p () const;
> +
> +  vec<tree> *ops;
> +  unsigned int start;
> +  unsigned int length;
> +};
> +
> +/* Return element I of the slice.  */
> +inline tree
> +vect_scalar_ops_slice::op (unsigned int i) const
> +{
> +  return (*ops)[(i + start) % ops->length ()];
> +}
> +
> +/* Hash traits for vect_scalar_ops_slice.  */
> +struct vect_scalar_ops_slice_hash : typed_noop_remove<vect_scalar_ops_slice>
> +{
> +  typedef vect_scalar_ops_slice value_type;
> +  typedef vect_scalar_ops_slice compare_type;
> +
> +  static const bool empty_zero_p = true;
> +
> +  static void mark_deleted (value_type &s) { s.length = ~0U; }
> +  static void mark_empty (value_type &s) { s.length = 0; }
> +  static bool is_deleted (const value_type &s) { return s.length == ~0U; }
> +  static bool is_empty (const value_type &s) { return s.length == 0; }
> +  static hashval_t hash (const value_type &);
> +  static bool equal (const value_type &, const compare_type &);
> +};
> +

Patch

diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index ec479d3055d..ddd0637185c 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -113,6 +113,41 @@  typedef hash_map<tree_operand_hash,
 		 std::pair<stmt_vec_info, innermost_loop_behavior *> >
 	  vec_base_alignments;
 
+/* Represents elements [START, START + LENGTH) of cyclical array OPS*
+   (i.e. OPS repeated to give at least START + LENGTH elements)  */
+struct vect_scalar_ops_slice
+{
+  tree op (unsigned int i) const;
+  bool all_same_p () const;
+
+  vec<tree> *ops;
+  unsigned int start;
+  unsigned int length;
+};
+
+/* Return element I of the slice.  */
+inline tree
+vect_scalar_ops_slice::op (unsigned int i) const
+{
+  return (*ops)[(i + start) % ops->length ()];
+}
+
+/* Hash traits for vect_scalar_ops_slice.  */
+struct vect_scalar_ops_slice_hash : typed_noop_remove<vect_scalar_ops_slice>
+{
+  typedef vect_scalar_ops_slice value_type;
+  typedef vect_scalar_ops_slice compare_type;
+
+  static const bool empty_zero_p = true;
+
+  static void mark_deleted (value_type &s) { s.length = ~0U; }
+  static void mark_empty (value_type &s) { s.length = 0; }
+  static bool is_deleted (const value_type &s) { return s.length == ~0U; }
+  static bool is_empty (const value_type &s) { return s.length == 0; }
+  static hashval_t hash (const value_type &);
+  static bool equal (const value_type &, const compare_type &);
+};
+
 /************************************************************************
   SLP
  ************************************************************************/
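As a side note (illustration only, not part of the patch): the modular indexing in vect_scalar_ops_slice::op is what lets a slice describe elements beyond the end of OPS, treating the scalar ops as a cyclical array.  A stand-alone C sketch of the same arithmetic, with hypothetical names:

  #include <stdio.h>

  /* Hypothetical model of vect_scalar_ops_slice::op: element I of the
     slice is ops[(start + i) % n_ops], i.e. OPS behaves as if repeated
     indefinitely.  */
  static int
  slice_op (const int *ops, unsigned int n_ops,
            unsigned int start, unsigned int i)
  {
    return ops[(start + i) % n_ops];
  }

  int
  main (void)
  {
    int ops[3] = { 10, 20, 30 };
    /* A slice with start == 2 and length == 4 yields 30, 10, 20, 30.  */
    for (unsigned int i = 0; i < 4; ++i)
      printf ("%d\n", slice_op (ops, 3, 2, i));
    return 0;
  }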
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 273543d37ea..c6b5a0696a2 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -4533,6 +4533,37 @@  vect_slp_convert_to_external (vec_info *vinfo, slp_tree node,
   return true;
 }
 
+/* Return true if all elements of the slice are the same.  */
+bool
+vect_scalar_ops_slice::all_same_p () const
+{
+  for (unsigned int i = 1; i < length; ++i)
+    if (!operand_equal_p (op (0), op (i)))
+      return false;
+  return true;
+}
+
+hashval_t
+vect_scalar_ops_slice_hash::hash (const value_type &s)
+{
+  hashval_t hash = 0;
+  for (unsigned i = 0; i < s.length; ++i)
+    hash = iterative_hash_expr (s.op (i), hash);
+  return hash;
+}
+
+bool
+vect_scalar_ops_slice_hash::equal (const value_type &s1,
+				   const compare_type &s2)
+{
+  if (s1.length != s2.length)
+    return false;
+  for (unsigned i = 0; i < s1.length; ++i)
+    if (!operand_equal_p (s1.op (i), s2.op (i)))
+      return false;
+  return true;
+}
+
 /* Compute the prologue cost for invariant or constant operands represented
    by NODE.  */
 
@@ -4549,45 +4580,39 @@  vect_prologue_cost_for_slp (slp_tree node,
      When all elements are the same we can use a splat.  */
   tree vectype = SLP_TREE_VECTYPE (node);
   unsigned group_size = SLP_TREE_SCALAR_OPS (node).length ();
-  unsigned num_vects_to_check;
   unsigned HOST_WIDE_INT const_nunits;
   unsigned nelt_limit;
+  auto ops = &SLP_TREE_SCALAR_OPS (node);
+  auto_vec<unsigned int> starts (SLP_TREE_NUMBER_OF_VEC_STMTS (node));
   if (TYPE_VECTOR_SUBPARTS (vectype).is_constant (&const_nunits)
       && ! multiple_p (const_nunits, group_size))
     {
-      num_vects_to_check = SLP_TREE_NUMBER_OF_VEC_STMTS (node);
       nelt_limit = const_nunits;
+      hash_set<vect_scalar_ops_slice_hash> vector_ops;
+      for (unsigned int i = 0; i < SLP_TREE_NUMBER_OF_VEC_STMTS (node); ++i)
+	if (!vector_ops.add ({ ops, i * const_nunits, const_nunits }))
+	  starts.quick_push (i * const_nunits);
     }
   else
     {
       /* If either the vector has variable length or the vectors
 	 are composed of repeated whole groups we only need to
 	 cost construction once.  All vectors will be the same.  */
-      num_vects_to_check = 1;
       nelt_limit = group_size;
+      starts.quick_push (0);
     }
-  tree elt = NULL_TREE;
-  unsigned nelt = 0;
-  for (unsigned j = 0; j < num_vects_to_check * nelt_limit; ++j)
-    {
-      unsigned si = j % group_size;
-      if (nelt == 0)
-	elt = SLP_TREE_SCALAR_OPS (node)[si];
-      /* ???  We're just tracking whether all operands of a single
-	 vector initializer are the same, ideally we'd check if
-	 we emitted the same one already.  */
-      else if (elt != SLP_TREE_SCALAR_OPS (node)[si])
-	elt = NULL_TREE;
-      nelt++;
-      if (nelt == nelt_limit)
-	{
-	  record_stmt_cost (cost_vec, 1,
-			    SLP_TREE_DEF_TYPE (node) == vect_external_def
-			    ? (elt ? scalar_to_vec : vec_construct)
-			    : vector_load,
-			    NULL, vectype, 0, vect_prologue);
-	  nelt = 0;
-	}
+  /* ???  We're just tracking whether vectors in a single node are the same.
+     Ideally we'd do something more global.  */
+  for (unsigned int start : starts)
+    {
+      vect_cost_for_stmt kind;
+      if (SLP_TREE_DEF_TYPE (node) == vect_constant_def)
+	kind = vector_load;
+      else if (vect_scalar_ops_slice { ops, start, nelt_limit }.all_same_p ())
+	kind = scalar_to_vec;
+      else
+	kind = vec_construct;
+      record_stmt_cost (cost_vec, 1, kind, NULL, vectype, 0, vect_prologue);
     }
 }
 
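As a concrete (hypothetical) example of what the hashing above catches, assuming 128-bit V4SI vectors: the store group below is covered by two SLP vectors whose scalar operands are identical, so the prologue cost now records one vec_construct rather than two.

  /* Hypothetical example.  The operand node for the eight stores holds
     { a, b, a, b, a, b, a, b }, so both SLP vectors are { a, b, a, b }.
     The hash set above spots the repeat and costs the construction
     once.  */
  void
  store_repeated_pair (int *x, int a, int b)
  {
    x[0] = a;  x[1] = b;
    x[2] = a;  x[3] = b;
    x[4] = a;  x[5] = b;
    x[6] = a;  x[7] = b;
  }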
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 7bb97bd48e4..4cf17526e14 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -14932,6 +14932,31 @@  private:
      - If M_VEC_FLAGS & VEC_ANY_SVE is nonzero then we're costing SVE code.  */
   unsigned int m_vec_flags = 0;
 
+  /* At the moment, we do not model LDP and STP in the vector and scalar costs.
+     This means that code such as:
+
+	a[0] = x;
+	a[1] = x;
+
+     will be costed as two scalar instructions and two vector instructions
+     (a scalar_to_vec and an unaligned_store).  For SLP, the vector form
+     wins if the costs are equal, because of the fact that the vector costs
+     include constant initializations whereas the scalar costs don't.
+     We would therefore tend to vectorize the code above, even though
+     the scalar version can use a single STP.
+
+     We should eventually fix this and model LDP and STP in the main costs;
+     see the comment in aarch64_sve_adjust_stmt_cost for some of the problems.
+     Until then, we look specifically for code that does nothing more than
+     STP-like operations.  We cost them on that basis in addition to the
+     normal latency-based costs.
+
+     If the scalar or vector code could be a sequence of STPs +
+     initialization, this variable counts the cost of the sequence,
+     with 2 units per instruction.  The variable is ~0U for other
+     kinds of code.  */
+  unsigned int m_stp_sequence_cost = 0;
+
   /* On some CPUs, SVE and Advanced SIMD provide the same theoretical vector
      throughput, such as 4x128 Advanced SIMD vs. 2x256 SVE.  In those
      situations, we try to predict whether an Advanced SIMD implementation
@@ -15724,6 +15749,104 @@  aarch64_vector_costs::count_ops (unsigned int count, vect_cost_for_stmt kind,
     }
 }
 
+/* Return true if STMT_INFO contains a memory access and if the constant
+   component of the memory address is aligned to SIZE bytes.  */
+static bool
+aarch64_aligned_constant_offset_p (stmt_vec_info stmt_info,
+				   poly_uint64 size)
+{
+  if (!STMT_VINFO_DATA_REF (stmt_info))
+    return false;
+
+  if (auto first_stmt = DR_GROUP_FIRST_ELEMENT (stmt_info))
+    stmt_info = first_stmt;
+  tree constant_offset = DR_INIT (STMT_VINFO_DATA_REF (stmt_info));
+  /* Needed for gathers & scatters, for example.  */
+  if (!constant_offset)
+    return false;
+
+  return multiple_p (wi::to_poly_offset (constant_offset), size);
+}
+
+/* Check if a scalar or vector stmt could be part of a region of code
+   that does nothing more than store values to memory, in the scalar
+   case using STP.  Return the cost of the stmt if so, counting 2 for
+   one instruction.  Return ~0U otherwise.
+
+   The arguments are a subset of those passed to add_stmt_cost.  */
+unsigned int
+aarch64_stp_sequence_cost (unsigned int count, vect_cost_for_stmt kind,
+			   stmt_vec_info stmt_info, tree vectype)
+{
+  /* Code that stores vector constants uses a vector_load to create
+     the constant.  We don't apply the heuristic to that case for two
+     main reasons:
+
+     - At the moment, STPs are only formed via peephole2, and the
+       constant scalar moves would often come between STRs and so
+       prevent STP formation.
+
+     - The scalar code also has to load the constant somehow, and that
+       isn't costed.  */
+  switch (kind)
+    {
+    case scalar_to_vec:
+      /* Count 2 insns for a GPR->SIMD dup and 1 insn for a FPR->SIMD dup.  */
+      return (FLOAT_TYPE_P (vectype) ? 2 : 4) * count;
+
+    case vec_construct:
+      if (FLOAT_TYPE_P (vectype))
+	/* Count 1 insn for the maximum number of FP->SIMD INS
+	   instructions.  */
+	return (vect_nunits_for_cost (vectype) - 1) * 2 * count;
+
+      /* Count 2 insns for a GPR->SIMD move and 2 insns for the
+	 maximum number of GPR->SIMD INS instructions.  */
+      return vect_nunits_for_cost (vectype) * 4 * count;
+
+    case vector_store:
+    case unaligned_store:
+      /* Count 1 insn per vector if we can't form STP Q pairs.  */
+      if (aarch64_sve_mode_p (TYPE_MODE (vectype)))
+	return count * 2;
+      if (aarch64_tune_params.extra_tuning_flags
+	  & AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS)
+	return count * 2;
+
+      if (stmt_info)
+	{
+	  /* Assume we won't be able to use STP if the constant offset
+	     component of the address is misaligned.  ??? This could be
+	     removed if we formed STP pairs earlier, rather than relying
+	     on peephole2.  */
+	  auto size = GET_MODE_SIZE (TYPE_MODE (vectype));
+	  if (!aarch64_aligned_constant_offset_p (stmt_info, size))
+	    return count * 2;
+	}
+      return CEIL (count, 2) * 2;
+
+    case scalar_store:
+      if (stmt_info && STMT_VINFO_DATA_REF (stmt_info))
+	{
+	  /* Check for a mode in which STP pairs can be formed.  */
+	  auto size = GET_MODE_SIZE (TYPE_MODE (aarch64_dr_type (stmt_info)));
+	  if (maybe_ne (size, 4) && maybe_ne (size, 8))
+	    return ~0U;
+
+	  /* Assume we won't be able to use STP if the constant offset
+	     component of the address is misaligned.  ??? This could be
+	     removed if we formed STP pairs earlier, rather than relying
+	     on peephole2.  */
+	  if (!aarch64_aligned_constant_offset_p (stmt_info, size))
+	    return ~0U;
+	}
+      return count;
+
+    default:
+      return ~0U;
+    }
+}
+
 unsigned
 aarch64_vector_costs::add_stmt_cost (int count, vect_cost_for_stmt kind,
 				     stmt_vec_info stmt_info, tree vectype,
@@ -15747,6 +15870,14 @@  aarch64_vector_costs::add_stmt_cost (int count, vect_cost_for_stmt kind,
       m_analyzed_vinfo = true;
     }
 
+  /* Apply the heuristic described above m_stp_sequence_cost.  */
+  if (m_stp_sequence_cost != ~0U)
+    {
+      uint64_t cost = aarch64_stp_sequence_cost (count, kind,
+						 stmt_info, vectype);
+      m_stp_sequence_cost = MIN (m_stp_sequence_cost + cost, ~0U);
+    }
+
   /* Try to get a more accurate cost by looking at STMT_INFO instead
      of just looking at KIND.  */
   if (stmt_info && aarch64_use_new_vector_costs_p ())
@@ -16017,6 +16148,15 @@  aarch64_vector_costs::finish_cost (const vector_costs *uncast_scalar_costs)
     m_costs[vect_body] = adjust_body_cost (loop_vinfo, scalar_costs,
 					   m_costs[vect_body]);
 
+  /* Apply the heuristic described above m_stp_sequence_cost.  Prefer
+     the scalar code in the event of a tie, since there is more chance
+     of scalar code being optimized with surrounding operations.  */
+  if (!loop_vinfo
+      && scalar_costs
+      && m_stp_sequence_cost != ~0U
+      && m_stp_sequence_cost >= scalar_costs->m_stp_sequence_cost)
+    m_costs[vect_body] = 2 * scalar_costs->total_cost ();
+
   vector_costs::finish_cost (scalar_costs);
 }
 
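To make the new heuristic concrete (example only, not part of the patch): for a pure store sequence like the one below, the scalar code is a single STP, so the vector code now has to win on the STP-based count as well as the latency-based one, and we keep the scalar version.

  /* Hypothetical example.  The scalar code is one "stp x1, x1, [x0]":
     its two scalar_stores give an m_stp_sequence_cost of 2 (one STP at
     2 units).  The vector alternative needs a GPR->SIMD dup (4 units)
     plus a vector store (2 units), so finish_cost prefers the scalar
     code, matching the dup_2_int64_t expectation in ldp_stp_17.c.  */
  void
  store_same_twice (long long *a, long long x)
  {
    a[0] = x;
    a[1] = x;
  }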
diff --git a/gcc/testsuite/gcc.target/aarch64/ldp_stp_14.c b/gcc/testsuite/gcc.target/aarch64/ldp_stp_14.c
new file mode 100644
index 00000000000..c7b5f7d6b39
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/ldp_stp_14.c
@@ -0,0 +1,89 @@ 
+/* { dg-options "-O2 -fno-tree-loop-distribute-patterns" } */
+/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
+
+#include "ldp_stp_14.h"
+
+/*
+** const_2_int16_t_0:
+**	str	wzr, \[x0\]
+**	ret
+*/
+CONST_FN (2, int16_t, 0);
+
+/*
+** const_4_int16_t_0:
+**	str	xzr, \[x0\]
+**	ret
+*/
+CONST_FN (4, int16_t, 0);
+
+/*
+** const_8_int16_t_0:
+**	stp	xzr, xzr, \[x0\]
+**	ret
+*/
+CONST_FN (8, int16_t, 0);
+
+/* No preference between vectorizing or not vectorizing here.  */
+CONST_FN (16, int16_t, 0);
+
+/*
+** const_32_int16_t_0:
+**	movi	v([0-9]+)\.4s, .*
+**	stp	q\1, q\1, \[x0\]
+**	stp	q\1, q\1, \[x0, #?32\]
+**	ret
+*/
+CONST_FN (32, int16_t, 0);
+
+/* No preference between vectorizing or not vectorizing here.  */
+CONST_FN (2, int16_t, 1);
+
+/*
+** const_4_int16_t_1:
+**	movi	v([0-9]+)\.4h, .*
+**	str	d\1, \[x0\]
+**	ret
+*/
+CONST_FN (4, int16_t, 1);
+
+/*
+** const_8_int16_t_1:
+**	movi	v([0-9]+)\.8h, .*
+**	str	q\1, \[x0\]
+**	ret
+*/
+CONST_FN (8, int16_t, 1);
+
+/* Fuzzy match due to PR104387.  */
+/*
+** dup_2_int16_t:
+**	...
+**	strh	w1, \[x0, #?2\]
+**	ret
+*/
+DUP_FN (2, int16_t);
+
+/*
+** dup_4_int16_t:
+**	dup	v([0-9]+)\.4h, w1
+**	str	d\1, \[x0\]
+**	ret
+*/
+DUP_FN (4, int16_t);
+
+/*
+** dup_8_int16_t:
+**	dup	v([0-9]+)\.8h, w1
+**	str	q\1, \[x0\]
+**	ret
+*/
+DUP_FN (8, int16_t);
+
+/*
+** cons2_1_int16_t:
+**	strh	w1, \[x0\]
+**	strh	w2, \[x0, #?2\]
+**	ret
+*/
+CONS2_FN (1, int16_t);
diff --git a/gcc/testsuite/gcc.target/aarch64/ldp_stp_14.h b/gcc/testsuite/gcc.target/aarch64/ldp_stp_14.h
new file mode 100644
index 00000000000..39c463ff240
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/ldp_stp_14.h
@@ -0,0 +1,50 @@ 
+#include <stdint.h>
+
+#define PRAGMA(X) _Pragma (#X)
+#define UNROLL(COUNT) PRAGMA (GCC unroll (COUNT))
+
+#define CONST_FN(COUNT, TYPE, VAL)		\
+  void						\
+  const_##COUNT##_##TYPE##_##VAL (TYPE *x)	\
+  {						\
+    UNROLL (COUNT)				\
+    for (int i = 0; i < COUNT; ++i)		\
+      x[i] = VAL;				\
+  }
+
+#define DUP_FN(COUNT, TYPE)			\
+  void						\
+  dup_##COUNT##_##TYPE (TYPE *x, TYPE val)	\
+  {						\
+    UNROLL (COUNT)				\
+    for (int i = 0; i < COUNT; ++i)		\
+      x[i] = val;				\
+  }
+
+#define CONS2_FN(COUNT, TYPE)					\
+  void								\
+  cons2_##COUNT##_##TYPE (TYPE *x, TYPE val0, TYPE val1)	\
+  {								\
+    UNROLL (COUNT)						\
+    for (int i = 0; i < COUNT * 2; i += 2)			\
+      {								\
+	x[i + 0] = val0;					\
+	x[i + 1] = val1;					\
+      }								\
+  }
+
+#define CONS4_FN(COUNT, TYPE)					\
+  void								\
+  cons4_##COUNT##_##TYPE (TYPE *x, TYPE val0, TYPE val1,	\
+			  TYPE val2, TYPE val3)			\
+  {								\
+    UNROLL (COUNT)						\
+    for (int i = 0; i < COUNT * 4; i += 4)			\
+      {								\
+	x[i + 0] = val0;					\
+	x[i + 1] = val1;					\
+	x[i + 2] = val2;					\
+	x[i + 3] = val3;					\
+      }								\
+  }
+
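For reference, one hand-expanded instance of these macros (not part of the patch), showing the shape of code the tests feed to the vectoriser; ldp_stp_15.c expects this particular one to stay scalar and use two STPs.

  #include <stdint.h>

  /* Hand expansion of DUP_FN (4, int32_t) from ldp_stp_14.h, with the
     pragma written directly in place of UNROLL (4).  The loop is fully
     unrolled into four stores of VAL, which ldp_stp_15.c expects to
     become "stp w1, w1, [x0]" and "stp w1, w1, [x0, #8]".  */
  void
  dup_4_int32_t (int32_t *x, int32_t val)
  {
  #pragma GCC unroll 4
    for (int i = 0; i < 4; ++i)
      x[i] = val;
  }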
diff --git a/gcc/testsuite/gcc.target/aarch64/ldp_stp_15.c b/gcc/testsuite/gcc.target/aarch64/ldp_stp_15.c
new file mode 100644
index 00000000000..131cd0a63c8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/ldp_stp_15.c
@@ -0,0 +1,137 @@ 
+/* { dg-options "-O2 -fno-tree-loop-distribute-patterns" } */
+/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
+
+#include "ldp_stp_14.h"
+
+/*
+** const_2_int32_t_0:
+**	str	xzr, \[x0\]
+**	ret
+*/
+CONST_FN (2, int32_t, 0);
+
+/*
+** const_4_int32_t_0:
+**	stp	xzr, xzr, \[x0\]
+**	ret
+*/
+CONST_FN (4, int32_t, 0);
+
+/* No preference between vectorizing or not vectorizing here.  */
+CONST_FN (8, int32_t, 0);
+
+/*
+** const_16_int32_t_0:
+**	movi	v([0-9]+)\.4s, .*
+**	stp	q\1, q\1, \[x0\]
+**	stp	q\1, q\1, \[x0, #?32\]
+**	ret
+*/
+CONST_FN (16, int32_t, 0);
+
+/* No preference between vectorizing or not vectorizing here.  */
+CONST_FN (2, int32_t, 1);
+
+/*
+** const_4_int32_t_1:
+**	movi	v([0-9]+)\.4s, .*
+**	str	q\1, \[x0\]
+**	ret
+*/
+CONST_FN (4, int32_t, 1);
+
+/*
+** const_8_int32_t_1:
+**	movi	v([0-9]+)\.4s, .*
+**	stp	q\1, q\1, \[x0\]
+**	ret
+*/
+CONST_FN (8, int32_t, 1);
+
+/*
+** dup_2_int32_t:
+**	stp	w1, w1, \[x0\]
+**	ret
+*/
+DUP_FN (2, int32_t);
+
+/*
+** dup_4_int32_t:
+**	stp	w1, w1, \[x0\]
+**	stp	w1, w1, \[x0, #?8\]
+**	ret
+*/
+DUP_FN (4, int32_t);
+
+/*
+** dup_8_int32_t:
+**	dup	v([0-9]+)\.4s, w1
+**	stp	q\1, q\1, \[x0\]
+**	ret
+*/
+DUP_FN (8, int32_t);
+
+/*
+** cons2_1_int32_t:
+**	stp	w1, w2, \[x0\]
+**	ret
+*/
+CONS2_FN (1, int32_t);
+
+/*
+** cons2_2_int32_t:
+**	stp	w1, w2, \[x0\]
+**	stp	w1, w2, \[x0, #?8\]
+**	ret
+*/
+CONS2_FN (2, int32_t);
+
+/*
+** cons2_4_int32_t:
+**	stp	w1, w2, \[x0\]
+**	stp	w1, w2, \[x0, #?8\]
+**	stp	w1, w2, \[x0, #?16\]
+**	stp	w1, w2, \[x0, #?24\]
+**	ret
+*/
+CONS2_FN (4, int32_t);
+
+/* No preference between vectorizing or not vectorizing here.  */
+CONS2_FN (8, int32_t);
+
+/*
+** cons2_16_int32_t:
+**	...
+**	stp	q[0-9]+, .*
+**	ret
+*/
+CONS2_FN (16, int32_t);
+
+/*
+** cons4_1_int32_t:
+**	stp	w1, w2, \[x0\]
+**	stp	w3, w4, \[x0, #?8\]
+**	ret
+*/
+CONS4_FN (1, int32_t);
+
+/*
+** cons4_2_int32_t:
+**	stp	w1, w2, \[x0\]
+**	stp	w3, w4, \[x0, #?8\]
+**	stp	w1, w2, \[x0, #?16\]
+**	stp	w3, w4, \[x0, #?24\]
+**	ret
+*/
+CONS4_FN (2, int32_t);
+
+/* No preference between vectorizing or not vectorizing here.  */
+CONS4_FN (4, int32_t);
+
+/*
+** cons4_8_int32_t:
+**	...
+**	stp	q[0-9]+, .*
+**	ret
+*/
+CONS4_FN (8, int32_t);
diff --git a/gcc/testsuite/gcc.target/aarch64/ldp_stp_16.c b/gcc/testsuite/gcc.target/aarch64/ldp_stp_16.c
new file mode 100644
index 00000000000..8ab117c4dcd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/ldp_stp_16.c
@@ -0,0 +1,133 @@ 
+/* { dg-options "-O2 -fno-tree-loop-distribute-patterns" } */
+/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
+
+#include "ldp_stp_14.h"
+
+/*
+** const_2_float_0:
+**	str	xzr, \[x0\]
+**	ret
+*/
+CONST_FN (2, float, 0);
+
+/*
+** const_4_float_0:
+**	stp	xzr, xzr, \[x0\]
+**	ret
+*/
+CONST_FN (4, float, 0);
+
+/* No preference between vectorizing or not vectorizing here.  */
+CONST_FN (8, float, 0);
+
+/*
+** const_16_float_0:
+**	movi	v([0-9]+)\.4s, .*
+**	stp	q\1, q\1, \[x0\]
+**	stp	q\1, q\1, \[x0, #?32\]
+**	ret
+*/
+CONST_FN (16, float, 0);
+
+/*
+** const_2_float_1:
+**	fmov	v([0-9]+)\.2s, .*
+**	str	d\1, \[x0\]
+**	ret
+*/
+CONST_FN (2, float, 1);
+
+/*
+** const_4_float_1:
+**	fmov	v([0-9]+)\.4s, .*
+**	str	q\1, \[x0\]
+**	ret
+*/
+CONST_FN (4, float, 1);
+
+/*
+** dup_2_float:
+**	stp	s0, s0, \[x0\]
+**	ret
+*/
+DUP_FN (2, float);
+
+/* No preference between vectorizing or not vectorizing here.  */
+DUP_FN (4, float);
+
+/*
+** dup_8_float:
+**	dup	v([0-9]+)\.4s, v0.s\[0\]
+**	stp	q\1, q\1, \[x0\]
+**	ret
+*/
+DUP_FN (8, float);
+
+/*
+** cons2_1_float:
+**	stp	s0, s1, \[x0\]
+**	ret
+*/
+CONS2_FN (1, float);
+
+/*
+** cons2_2_float:
+**	stp	s0, s1, \[x0\]
+**	stp	s0, s1, \[x0, #?8\]
+**	ret
+*/
+CONS2_FN (2, float);
+
+/*
+** cons2_4_float:	{ target aarch64_little_endian }
+**	ins	v0.s\[1\], v1.s\[0\]
+**	stp	d0, d0, \[x0\]
+**	stp	d0, d0, \[x0, #?16\]
+**	ret
+*/
+/*
+** cons2_4_float:	{ target aarch64_big_endian }
+**	ins	v1.s\[1\], v0.s\[0\]
+**	stp	d1, d1, \[x0\]
+**	stp	d1, d1, \[x0, #?16\]
+**	ret
+*/
+CONS2_FN (4, float);
+
+/*
+** cons2_8_float:
+**	dup	v([0-9]+)\.4s, .*
+**	...
+**	stp	q\1, q\1, \[x0\]
+**	stp	q\1, q\1, \[x0, #?32\]
+**	ret
+*/
+CONS2_FN (8, float);
+
+/*
+** cons4_1_float:
+**	stp	s0, s1, \[x0\]
+**	stp	s2, s3, \[x0, #?8\]
+**	ret
+*/
+CONS4_FN (1, float);
+
+/*
+** cons4_2_float:
+**	stp	s0, s1, \[x0\]
+**	stp	s2, s3, \[x0, #?8\]
+**	stp	s0, s1, \[x0, #?16\]
+**	stp	s2, s3, \[x0, #?24\]
+**	ret
+*/
+CONS4_FN (2, float);
+
+/*
+** cons4_4_float:
+**	ins	v([0-9]+)\.s.*
+**	...
+**	stp	q\1, q\1, \[x0\]
+**	stp	q\1, q\1, \[x0, #?32\]
+**	ret
+*/
+CONS4_FN (4, float);
diff --git a/gcc/testsuite/gcc.target/aarch64/ldp_stp_17.c b/gcc/testsuite/gcc.target/aarch64/ldp_stp_17.c
new file mode 100644
index 00000000000..c1122fc07d5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/ldp_stp_17.c
@@ -0,0 +1,120 @@ 
+/* { dg-options "-O2 -fno-tree-loop-distribute-patterns" } */
+/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
+
+#include "ldp_stp_14.h"
+
+/*
+** const_2_int64_t_0:
+**	stp	xzr, xzr, \[x0\]
+**	ret
+*/
+CONST_FN (2, int64_t, 0);
+
+/* No preference between vectorizing or not vectorizing here.  */
+CONST_FN (4, int64_t, 0);
+
+/*
+** const_8_int64_t_0:
+**	movi	v([0-9]+)\.4s, .*
+**	stp	q\1, q\1, \[x0\]
+**	stp	q\1, q\1, \[x0, #?32\]
+**	ret
+*/
+CONST_FN (8, int64_t, 0);
+
+/*
+** dup_2_int64_t:
+**	stp	x1, x1, \[x0\]
+**	ret
+*/
+DUP_FN (2, int64_t);
+
+/*
+** dup_4_int64_t:
+**	stp	x1, x1, \[x0\]
+**	stp	x1, x1, \[x0, #?16\]
+**	ret
+*/
+DUP_FN (4, int64_t);
+
+/* No preference between vectorizing or not vectorizing here.  */
+DUP_FN (8, int64_t);
+
+/*
+** dup_16_int64_t:
+**	dup	v([0-9])\.2d, x1
+**	stp	q\1, q\1, \[x0\]
+**	stp	q\1, q\1, \[x0, #?32\]
+**	stp	q\1, q\1, \[x0, #?64\]
+**	stp	q\1, q\1, \[x0, #?96\]
+**	ret
+*/
+DUP_FN (16, int64_t);
+
+/*
+** cons2_1_int64_t:
+**	stp	x1, x2, \[x0\]
+**	ret
+*/
+CONS2_FN (1, int64_t);
+
+/*
+** cons2_2_int64_t:
+**	stp	x1, x2, \[x0\]
+**	stp	x1, x2, \[x0, #?16\]
+**	ret
+*/
+CONS2_FN (2, int64_t);
+
+/*
+** cons2_4_int64_t:
+**	stp	x1, x2, \[x0\]
+**	stp	x1, x2, \[x0, #?16\]
+**	stp	x1, x2, \[x0, #?32\]
+**	stp	x1, x2, \[x0, #?48\]
+**	ret
+*/
+CONS2_FN (4, int64_t);
+
+/* No preference between vectorizing or not vectorizing here.  */
+CONS2_FN (8, int64_t);
+
+/*
+** cons2_16_int64_t:
+**	...
+**	stp	q[0-9]+, .*
+**	ret
+*/
+CONS2_FN (16, int64_t);
+
+/*
+** cons4_1_int64_t:
+**	stp	x1, x2, \[x0\]
+**	stp	x3, x4, \[x0, #?16\]
+**	ret
+*/
+CONS4_FN (1, int64_t);
+
+/*
+** cons4_2_int64_t:
+**	stp	x1, x2, \[x0\]
+**	stp	x3, x4, \[x0, #?16\]
+**	stp	x1, x2, \[x0, #?32\]
+**	stp	x3, x4, \[x0, #?48\]
+**	ret
+*/
+CONS4_FN (2, int64_t);
+
+/* No preference between vectorizing or not vectorizing here.  */
+CONS4_FN (4, int64_t);
+
+/* We should probably vectorize this, but currently don't.  */
+CONS4_FN (8, int64_t);
+
+/*
+** cons4_16_int64_t:
+**	...
+**	stp	q[0-9]+, .*
+**	ret
+*/
+CONS4_FN (16, int64_t);
diff --git a/gcc/testsuite/gcc.target/aarch64/ldp_stp_18.c b/gcc/testsuite/gcc.target/aarch64/ldp_stp_18.c
new file mode 100644
index 00000000000..eaa855c3859
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/ldp_stp_18.c
@@ -0,0 +1,123 @@ 
+/* { dg-options "-O2 -fno-tree-loop-distribute-patterns" } */
+/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
+
+#include "ldp_stp_14.h"
+
+/*
+** const_2_double_0:
+**	stp	xzr, xzr, \[x0\]
+**	ret
+*/
+CONST_FN (2, double, 0);
+
+/* No preference between vectorizing or not vectorizing here.  */
+CONST_FN (4, double, 0);
+
+/*
+** const_8_double_0:
+**	movi	v([0-9]+)\.2d, .*
+**	stp	q\1, q\1, \[x0\]
+**	stp	q\1, q\1, \[x0, #?32\]
+**	ret
+*/
+CONST_FN (8, double, 0);
+
+/*
+** dup_2_double:
+**	stp	d0, d0, \[x0\]
+**	ret
+*/
+DUP_FN (2, double);
+
+/*
+** dup_4_double:
+**	stp	d0, d0, \[x0\]
+**	stp	d0, d0, \[x0, #?16\]
+**	ret
+*/
+DUP_FN (4, double);
+
+/*
+** dup_8_double:
+**	dup	v([0-9])\.2d, v0\.d\[0\]
+**	stp	q\1, q\1, \[x0\]
+**	stp	q\1, q\1, \[x0, #?32\]
+**	ret
+*/
+DUP_FN (8, double);
+
+/*
+** dup_16_double:
+**	dup	v([0-9])\.2d, v0\.d\[0\]
+**	stp	q\1, q\1, \[x0\]
+**	stp	q\1, q\1, \[x0, #?32\]
+**	stp	q\1, q\1, \[x0, #?64\]
+**	stp	q\1, q\1, \[x0, #?96\]
+**	ret
+*/
+DUP_FN (16, double);
+
+/*
+** cons2_1_double:
+**	stp	d0, d1, \[x0\]
+**	ret
+*/
+CONS2_FN (1, double);
+
+/*
+** cons2_2_double:
+**	stp	d0, d1, \[x0\]
+**	stp	d0, d1, \[x0, #?16\]
+**	ret
+*/
+CONS2_FN (2, double);
+
+/*
+** cons2_4_double:
+**	...
+**	stp	q[0-9]+, .*
+**	ret
+*/
+CONS2_FN (4, double);
+
+/*
+** cons2_8_double:
+**	...
+**	stp	q[0-9]+, .*
+**	ret
+*/
+CONS2_FN (8, double);
+
+/*
+** cons4_1_double:
+**	stp	d0, d1, \[x0\]
+**	stp	d2, d3, \[x0, #?16\]
+**	ret
+*/
+CONS4_FN (1, double);
+
+/*
+** cons4_2_double:
+**	stp	d0, d1, \[x0\]
+**	stp	d2, d3, \[x0, #?16\]
+**	stp	d0, d1, \[x0, #?32\]
+**	stp	d2, d3, \[x0, #?48\]
+**	ret
+*/
+CONS4_FN (2, double);
+
+/*
+** cons4_4_double:
+**	...
+**	stp	q[0-9]+, .*
+**	ret
+*/
+CONS4_FN (4, double);
+
+/*
+** cons4_8_double:
+**	...
+**	stp	q[0-9]+, .*
+**	ret
+*/
+CONS4_FN (8, double);
diff --git a/gcc/testsuite/gcc.target/aarch64/ldp_stp_19.c b/gcc/testsuite/gcc.target/aarch64/ldp_stp_19.c
new file mode 100644
index 00000000000..9eb41636477
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/ldp_stp_19.c
@@ -0,0 +1,6 @@ 
+/* { dg-options "-O2 -mstrict-align" } */
+
+#include "ldp_stp_5.c"
+
+/* { dg-final { scan-assembler-times {stp\tq[0-9]+, q[0-9]} 3 { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times {str\tq[0-9]+} 1 { xfail *-*-* } } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/ldp_stp_5.c b/gcc/testsuite/gcc.target/aarch64/ldp_stp_5.c
index 94266181df7..56d1d3cc555 100644
--- a/gcc/testsuite/gcc.target/aarch64/ldp_stp_5.c
+++ b/gcc/testsuite/gcc.target/aarch64/ldp_stp_5.c
@@ -1,4 +1,4 @@ 
-/* { dg-options "-O2" } */
+/* { dg-options "-O2 -mstrict-align" } */
 
 double arr[4][4];