From patchwork Mon Oct 18 13:41:17 2021
X-Patchwork-Submitter: Aldy Hernandez
X-Patchwork-Id: 46348
From: Aldy Hernandez
To: Jeff Law, Richard Biener, GCC patches
Subject: [RFC] Remove VRP threader passes in exchange for better threading pre-VRP.
Date: Mon, 18 Oct 2021 15:41:17 +0200
Message-Id: <20211018134117.410106-1-aldyh@redhat.com>

The jump threading bits seem to have stabilized.
The one or two open PRs can be fixed by the pending loop threading restrictions with respect to loop rotation and loop headers.  With all the pieces in play, we can finally explore altering the pipeline to reduce the number of jump threading passes.

I know the jump threaders have become a confusing mess, and I have added to this spaghetti, but please bear with me; the goal is to reduce the threaders to one code base.

As a quick bird's-eye view, we have two engines:

1. The new backward threader, which uses a path solver based on the ranger.  It can work in two modes: a quick mode that assumes any SSA name outside the path is VARYING, and a fully resolving mode.  All the *.thread passes run with this engine, but in quick mode.

2. The old forward threader used by VRP and DOM.  The DOM client uses its internal structures as well as evrp to resolve conditionals.  The VRP threaders also run with this engine, but use the solver from #1 in fully resolving mode.  That is, the VRP threaders use the old forward threader for path discovery, but the new backward threader's solver for candidate paths (the hybrid threader).  This was always a stop-gap while we moved everyone to #1.

The DOM threader is the last remaining threader with no changes whatsoever from the previous release.  It uses the forward threader, with all the evrp + DOM goo.

It doesn't matter if you're all confused; we're about to make things simpler.

I've been experimenting with reducing the total number of threading passes, and I'd like to see if there's consensus/stomach for altering the pipeline.  Note that the goal is to remove forward threader clients, not the other way around.  So we should prefer to remove a VRP threader instance over a *.thread one immediately before VRP.

After some playing, it looks like if we enable fully resolving mode in the *.thread passes immediately preceding VRP, we can remove the VRP threading passes altogether, thus removing two threading passes (and forward threading passes at that!).

The numbers look really good.  We get 6874 more jump threads over my bootstrap .ii files, a 3.74% increase overall.  And we get that while running marginally faster (0.19% faster, so noise).  The details are:

*** Mainline (with the loop rotation patch):
ethread:64722
dom:31246
thread:73709
vrp-thread:14357
total: 184034

*** Removing all the VRP threaders.
ethread:64722
thread-full:76493
dom:33648
thread:16045
total: 190908

Notice that not only do we get a lot more threads in thread-full (resolving mode), but even DOM gets more jump threads.

This doesn't come without risks, though.  The main issue is that we would be replacing one engine (the forward threader) with another (the backward threader).  But the good news is that (a) we've been using the new backward threader for a while now, and (b) even the VRP threader in mainline is using the backward threader's solver.  So all that would really change is the path discovery bits and the custom copier in the forward threader, which would be replaced by the backward threader's path discovery and the generic copier.  I personally don't think this is a big risk, because we've done all the hard work already and it's all being stressed in one way or another.

The untested patch below is all that would need to happen, albeit with copious changes to tests.

I'd like to see where we all stand on this before I start chugging away at testing and other time-consuming tasks.  Note that all the relevant bits will still be tested in this release, so I'm not gonna cry one way or another.
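For the record, the 6874 and 3.74% figures are just the deltas between the two totals above (my arithmetic, nothing measured beyond the counts already shown):

  190908 - 184034 = 6874 additional jump threads
  6874 / 184034  ~= 3.74%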
But it'd be nice to start reducing passes, especially if we get a 3.74% increase in jump threads for no time penalty.

Finally, even if we all agree, I think we should give ourselves a week after the loop rotation restrictions go in, because threading changes always cause a party of unexpected things to happen.

Shoot!

gcc/ChangeLog:

	* passes.def: Replace pass_thread_jumps that happen before VRP
	with pass_thread_jumps_full.  Remove pass_vrp_threader passes.
---
 gcc/passes.def | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/gcc/passes.def b/gcc/passes.def
index c11c237f6d2..96fc230e780 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -210,9 +210,8 @@ along with GCC; see the file COPYING3.  If not see
       NEXT_PASS (pass_return_slot);
       NEXT_PASS (pass_fre, true /* may_iterate */);
       NEXT_PASS (pass_merge_phi);
-      NEXT_PASS (pass_thread_jumps);
+      NEXT_PASS (pass_thread_jumps_full);
       NEXT_PASS (pass_vrp, true /* warn_array_bounds_p */);
-      NEXT_PASS (pass_vrp_threader);
       NEXT_PASS (pass_dse);
       NEXT_PASS (pass_dce);
       /* pass_stdarg is always run and at this point we execute
@@ -336,9 +335,9 @@ along with GCC; see the file COPYING3.  If not see
       NEXT_PASS (pass_thread_jumps);
       NEXT_PASS (pass_dominator, false /* may_peel_loop_headers_p */);
       NEXT_PASS (pass_strlen);
-      NEXT_PASS (pass_thread_jumps);
+      NEXT_PASS (pass_thread_jumps_full);
       NEXT_PASS (pass_vrp, false /* warn_array_bounds_p */);
-      NEXT_PASS (pass_vrp_threader);
+      /* ?? Is this still needed. ?? */
       /* Threading can leave many const/copy propagations in the IL.
	 Clean them up.  Instead of just copy_prop, we use ccp to compute
	 alignment and nonzero bits.  */
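(Not part of the diff above: pass_thread_jumps_full would of course also need to be declared in tree-pass.h and defined next to the existing pass_thread_jumps in tree-ssa-threadbackward.c before any of this could go in.  What follows is only a rough, untested sketch of the boilerplate I have in mind; the "fully_resolve" argument and the execute_backward_threader helper are hypothetical placeholders for however the fully resolving mode ends up being wired into the backward threader, and the gate is simplified.)

namespace {

/* Sketch only.  Fields follow the usual pass_data layout: type, name,
   optinfo flags, timevar, properties required/provided/destroyed, and
   start/finish TODO flags.  */
const pass_data pass_data_thread_jumps_full =
{
  GIMPLE_PASS, /* type */
  "thread-full", /* name, matching the thread-full counts above */
  OPTGROUP_NONE, /* optinfo_flags */
  TV_TREE_SSA_THREAD_JUMPS, /* tv_id */
  ( PROP_cfg | PROP_ssa ), /* properties_required */
  0, /* properties_provided */
  0, /* properties_destroyed */
  0, /* todo_flags_start */
  TODO_update_ssa, /* todo_flags_finish */
};

class pass_thread_jumps_full : public gimple_opt_pass
{
public:
  pass_thread_jumps_full (gcc::context *ctxt)
    : gimple_opt_pass (pass_data_thread_jumps_full, ctxt)
  {}
  opt_pass * clone () { return new pass_thread_jumps_full (m_ctxt); }
  virtual bool gate (function *) { return flag_thread_jumps; }
  virtual unsigned int execute (function *fun)
  {
    /* Hypothetical helper: run the backward threader with the path
       solver fully resolving SSA names outside the path, instead of
       treating them as VARYING.  */
    return execute_backward_threader (fun, /*fully_resolve=*/true);
  }
};

} // anonymous namespace

gimple_opt_pass *
make_pass_thread_jumps_full (gcc::context *ctxt)
{
  return new pass_thread_jumps_full (ctxt);
}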