From patchwork Mon Oct 18 13:41:17 2021
X-Patchwork-Submitter: Aldy Hernandez
X-Patchwork-Id: 46348
From: Aldy Hernandez
To: Jeff Law, Richard Biener, GCC patches
Subject: [RFC] Remove VRP threader passes in exchange for better threading pre-VRP.
Date: Mon, 18 Oct 2021 15:41:17 +0200
Message-Id: <20211018134117.410106-1-aldyh@redhat.com>

The jump threading bits seem to have stabilized.
The one or two open PRs can be fixed by the pending loop threading restrictions with respect to loop rotation and loop headers.  With all the pieces in play, we can finally explore altering the pipeline to reduce the number of jump threading passes.

I know the jump threaders have become a confusing mess, and I have added to this spaghetti, but please bear with me; the goal is to reduce the threaders to one code base.

As a quick bird's-eye view, we have two engines:

1. The new backward threader, which uses a path solver based on the ranger.  It can work in two modes: a quick mode that assumes any SSA name outside the path is VARYING, and a fully resolving mode.  All the *.thread passes run with this engine, but in quick mode.

2. The old forward threader used by VRP and DOM.  The DOM client uses its internal structures as well as evrp to resolve conditionals.  The VRP threaders also run with this engine, but use the solver from #1 in fully resolving mode.  That is, the VRP threaders use the old forward threader for path discovery, but the new backward threader's solver for candidate paths (the hybrid threader).  This was always a stop-gap while we moved everyone to #1.

The DOM threader is the last remaining threader with no changes whatsoever from the previous release.  It uses the forward threader, with all the evrp + DOM goo.

It doesn't matter if you're all confused; we're about to make things simpler.

I've been experimenting with reducing the total number of threading passes, and I'd like to see if there's consensus/stomach for altering the pipeline.  Note that the goal is to remove forward threader clients, not the other way around.  So we should prefer to remove a VRP threader instance over a *.thread one immediately before VRP.

After some playing, it looks like if we enable fully resolving mode in the *.thread passes immediately preceding VRP, we can remove the VRP threading passes altogether, thus removing two threading passes (and forward threading passes at that!).

The numbers look really good.  We get 6874 more jump threads over my bootstrap .ii files, a 3.74% increase overall.  And we get that while running marginally faster (0.19% faster, so noise).  The details are:

*** Mainline (with the loop rotation patch):
ethread:64722
dom:31246
thread:73709
vrp-thread:14357
total: 184034

*** Removing all the VRP threaders.
ethread:64722
thread-full:76493
dom:33648
thread:16045
total: 190908

Notice that not only do we get a lot more threads in thread-full (resolving mode), but even DOM gets more jump threads.

This doesn't come without risks, though.  The main issue is that we would be replacing one engine (the forward threader) with another (the backward threader).  But the good news is that (a) we've been using the new backward threader for a while now, and (b) even the VRP threader in mainline is using the backward threader's solver.  So all that would really change is the path discovery bits and the custom copier in the forward threader, which would be replaced by the backward threader's path discovery and the generic copier.  I personally don't think this is a big risk, because we've done all the hard work already and it's all being stressed in one way or another.

The untested patch below is all that would need to happen, albeit with copious changes to tests.

I'd like to see where we all stand on this before I start chugging away at testing and other time-consuming tasks.  Note that all the relevant bits will still be tested in this release, so I'm not gonna cry one way or another.
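For the record, the 6874 and 3.74% figures are just the deltas between the two totals above (my arithmetic, nothing measured beyond the counts already shown):

  190908 - 184034 = 6874 additional jump threads
  6874 / 184034  ~= 3.74%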
But it'd be nice to start reducing passes, especially if we get a 3.74% increase in jump threads for no time penalty.

Finally, even if we all agree, I think we should give ourselves a week after the loop rotation restrictions go in, because threading changes always cause a party of unexpected things to happen.

Shoot!

gcc/ChangeLog:

	* passes.def: Replace pass_thread_jumps that happen before VRP
	with pass_thread_jumps_full.  Remove pass_vrp_threader passes.
---
 gcc/passes.def | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/gcc/passes.def b/gcc/passes.def
index c11c237f6d2..96fc230e780 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -210,9 +210,8 @@ along with GCC; see the file COPYING3.  If not see
       NEXT_PASS (pass_return_slot);
       NEXT_PASS (pass_fre, true /* may_iterate */);
       NEXT_PASS (pass_merge_phi);
-      NEXT_PASS (pass_thread_jumps);
+      NEXT_PASS (pass_thread_jumps_full);
       NEXT_PASS (pass_vrp, true /* warn_array_bounds_p */);
-      NEXT_PASS (pass_vrp_threader);
       NEXT_PASS (pass_dse);
       NEXT_PASS (pass_dce);
       /* pass_stdarg is always run and at this point we execute
@@ -336,9 +335,9 @@ along with GCC; see the file COPYING3.  If not see
       NEXT_PASS (pass_thread_jumps);
       NEXT_PASS (pass_dominator, false /* may_peel_loop_headers_p */);
       NEXT_PASS (pass_strlen);
-      NEXT_PASS (pass_thread_jumps);
+      NEXT_PASS (pass_thread_jumps_full);
       NEXT_PASS (pass_vrp, false /* warn_array_bounds_p */);
-      NEXT_PASS (pass_vrp_threader);
+      /* ?? Is this still needed. ?? */
       /* Threading can leave many const/copy propagations in the IL.
	 Clean them up.  Instead of just copy_prop, we use ccp to compute
	 alignment and nonzero bits.  */
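(Not part of the diff above: pass_thread_jumps_full would of course also need to be declared in tree-pass.h and defined next to the existing pass_thread_jumps in tree-ssa-threadbackward.c before any of this could go in.  What follows is only a rough, untested sketch of the boilerplate I have in mind; the "fully_resolve" argument and the execute_backward_threader helper are hypothetical placeholders for however the fully resolving mode ends up being wired into the backward threader, and the gate is simplified.)

namespace {

/* Sketch only.  Fields follow the usual pass_data layout: type, name,
   optinfo flags, timevar, properties required/provided/destroyed, and
   start/finish TODO flags.  */
const pass_data pass_data_thread_jumps_full =
{
  GIMPLE_PASS, /* type */
  "thread-full", /* name, matching the thread-full counts above */
  OPTGROUP_NONE, /* optinfo_flags */
  TV_TREE_SSA_THREAD_JUMPS, /* tv_id */
  ( PROP_cfg | PROP_ssa ), /* properties_required */
  0, /* properties_provided */
  0, /* properties_destroyed */
  0, /* todo_flags_start */
  TODO_update_ssa, /* todo_flags_finish */
};

class pass_thread_jumps_full : public gimple_opt_pass
{
public:
  pass_thread_jumps_full (gcc::context *ctxt)
    : gimple_opt_pass (pass_data_thread_jumps_full, ctxt)
  {}
  opt_pass * clone () { return new pass_thread_jumps_full (m_ctxt); }
  virtual bool gate (function *) { return flag_thread_jumps; }
  virtual unsigned int execute (function *fun)
  {
    /* Hypothetical helper: run the backward threader with the path
       solver fully resolving SSA names outside the path, instead of
       treating them as VARYING.  */
    return execute_backward_threader (fun, /*fully_resolve=*/true);
  }
};

} // anonymous namespace

gimple_opt_pass *
make_pass_thread_jumps_full (gcc::context *ctxt)
{
  return new pass_thread_jumps_full (ctxt);
}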