From patchwork Tue Jan 16 01:12:50 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Roger Sayle X-Patchwork-Id: 84148 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id A47A53858425 for ; Tue, 16 Jan 2024 01:13:27 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from server.nextmovesoftware.com (server.nextmovesoftware.com [162.254.253.69]) by sourceware.org (Postfix) with ESMTPS id 656313858CDB for ; Tue, 16 Jan 2024 01:12:53 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 656313858CDB Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=nextmovesoftware.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=nextmovesoftware.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 656313858CDB Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=162.254.253.69 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1705367575; cv=none; b=DIl//lH73HXrMV4Nnq+tRrv84Q8CMhxgC0Z3wzrTLD8pNp9zdhI2s9Jfvd4C7c4iIQxqcEC+K7JOuq8oHeRtrckRQhM1e92uF9SRNAoJBx/+4jjJdQzclJxr2+CjH1za3cxYbd8ldkVag44jEfyHWLxorWdeDr1iF0Ji83u2NaA= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1705367575; c=relaxed/simple; bh=zDj8leStnJTmRVtvBmbss1lSK7K07b/GPQI7JYeUUMo=; h=DKIM-Signature:From:To:Subject:Date:Message-ID:MIME-Version; b=Ck55Y+VucvShC2AS3sVYXRz1dU+VW8ROK5b9+QRLoS7C5nrqQjw7O3HVnC17h2uDEI4f9bu/n41cY1Sk4QZD+vw1OYk9fuxfrvipRwdFIv0A8JXJF2RAOzBVKC9UU2o9SE1JRsvpuGVFa04/lY/c20pg0t9LQkBLqhF5chShHcM= ARC-Authentication-Results: i=1; server2.sourceware.org DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=nextmovesoftware.com; s=default; h=Content-Type:MIME-Version:Message-ID: Date:Subject:To:From:Sender:Reply-To:Cc:Content-Transfer-Encoding:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:In-Reply-To:References:List-Id:List-Help:List-Unsubscribe: List-Subscribe:List-Post:List-Owner:List-Archive; bh=OLs7eMT5mBqw+i0i31/mZxbU9trUa39fAHuUAPnSrFM=; b=S7q0n5F2cjkXPMTjGff7+izxhb 9Olh5hLQPq3sQQuzoaIe8o9fYYk6qigDiHnVOxBEc/yb+4SxFW3JPkLR/cyBpaknm4DYFJEoY3FRd MbfCQ+rTq0K7Huf8VqYtpOBDcn410ozZiAO/ScwW83KJwQ/sEFqaOcn7sKzGbb5flYbHfK3qZyh4Q aBmPpLVFvgLRq9fusbr8GHAnaspVuVvo3n6lA0L0Qg+jiS8Jwon6U150j03gxIS3sjmn3madosds/ hPvx2gieoZTbNj1+/gmq9ICEhHqKjc388/35WHXAgEuH54BoVAxC1XPjGPMPBhedWHvNG1crbLm9+ hl9t6u2Q==; Received: from host109-154-238-190.range109-154.btcentralplus.com ([109.154.238.190]:57709 helo=Dell) by server.nextmovesoftware.com with esmtpsa (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96.2) (envelope-from ) id 1rPY0a-00020z-1S for gcc-patches@gcc.gnu.org; Mon, 15 Jan 2024 20:12:52 -0500 From: "Roger Sayle" To: Subject: [PATCH] PR rtl-optimization/111267: Improved forward propagation. Date: Tue, 16 Jan 2024 01:12:50 -0000 Message-ID: <022001da4819$20d652b0$6282f810$@nextmovesoftware.com> MIME-Version: 1.0 X-Mailer: Microsoft Outlook 16.0 Thread-Index: AdpIGBMhH4R/7gtJRPuu7nKtqdalhg== Content-Language: en-gb X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - server.nextmovesoftware.com X-AntiAbuse: Original Domain - gcc.gnu.org X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12] X-AntiAbuse: Sender Address Domain - nextmovesoftware.com X-Get-Message-Sender-Via: server.nextmovesoftware.com: authenticated_id: roger@nextmovesoftware.com X-Authenticated-Sender: server.nextmovesoftware.com: roger@nextmovesoftware.com X-Source: X-Source-Args: X-Source-Dir: X-Spam-Status: No, score=-12.6 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org This patch resolves PR rtl-optimization/111267 by improving RTL-level forward propagation. This x86_64 code quality regression was caused (exposed) by my changes to improve how x86's (TImode) argument passing is represented at the RTL-level (reducing the use of SUBREGs to catch more optimization opportunities in combine). The pitfall is that the more complex RTL representations expose a limitation in RTL's fwprop pass. At the heart of fwprop, in try_fwprop_subst_pattern, the logic can be summarized as three steps. Step 1 is a heuristic that rejects the propagation attempt if the expression is too complex, step 2 calls the backend's recog to see if the propagated/simplified instruction is recognizable/valid, and step 3 then calls src_cost to compare the rtx costs of the replacement vs. the original, and accepts the transformation if the final cost is the same of better. The logic error (or missed optimization opportunity) is that the step 1 heuristic that attempts to predict (second guess) the process is flawed. Ultimately the decision on whether to fwprop or not should depend solely on actual improvement, as measured by RTX costs. Hence the prototype fix in the bugzilla PR removes the heuristic of calling prop.profitable_p entirely, relying entirely on the cost comparison in step 3. Unfortunately, things are a tiny bit more complicated. The cost comparison in fwprop uses the older set_src_cost API and not the newer (preffered) insn_cost API as currently used in combine. This means that the cost improvement comparisons are only done for single_set instructions (more complex PARALLELs etc. aren't supported). Hence we can only rely on skipping step 1 for that subset of instructions actually evaluated by step 3. The other subtlety is that to avoid potential infinite loops in fwprop we should only reply purely on rtx costs when the transformation is obviously an improvement. If the replacement has the same cost as the original, we can use the prop.profitable_p test to preserve the current behavior. Finally, to answer Richard Biener's remaining question about this approach: yes, there is an asymmetry between how patterns are handled and how REG_EQUAL notes are handled. For example, at the moment propagation into notes doesn't use rtx costs at all, and ultimately when fwprop is updated to use insn_cost, this (and recog) obviously isn't applicable to notes. There's no reason the logic need be identical between patterns and notes, and during stage4 we only need update propagation into patterns to fix this P1 regression (notes and use of cost_insn can be done for GCC 15). For Jakub's reduced testcase: struct S { float a, b, c, d; }; int bar (struct S x, struct S y) { return x.b <= y.d && x.c >= y.a; } On x86_64-pc-linux-gnu with -O2 gcc currently generates: bar: movq %xmm2, %rdx movq %xmm3, %rax movq %xmm0, %rsi xchgq %rdx, %rax movq %rsi, %rcx movq %rax, %rsi movq %rdx, %rax shrq $32, %rcx shrq $32, %rax movd %ecx, %xmm4 movd %eax, %xmm0 comiss %xmm4, %xmm0 jb .L6 movd %esi, %xmm0 xorl %eax, %eax comiss %xmm0, %xmm1 setnb %al ret .L6: xorl %eax, %eax ret with this simple patch to fwprop, we now generate: bar: shufps $85, %xmm0, %xmm0 shufps $85, %xmm3, %xmm3 comiss %xmm0, %xmm3 jb .L6 xorl %eax, %eax comiss %xmm2, %xmm1 setnb %al ret .L6: xorl %eax, %eax ret This patch has been tested on x86_64-pc-linux-gnu with make bootstrap and make -k check, both with and without --target_board=unix{-m32} with no new failures. Additionally, it also resolves the FAIL for gcc.target/i386/pr82580.c. Ok for mainline? 2024-01-16 Roger Sayle gcc/ChangeLog PR rtl-optimization/111267 * fwprop.cc (try_fwprop_subst_pattern): Only bail-out early when !prop.profitable_p for instructions that are not single sets. When comparing costs, bail-out if the cost is unchanged and !prop.profitable_p. gcc/testsuite/ChangeLog PR rtl-optimization/111267 * gcc.target/i386/pr111267.c: New test case. Thanks in advance (and to Jeff Law for his guidance/help), Roger diff --git a/gcc/fwprop.cc b/gcc/fwprop.cc index 0c588f8..f06225a 100644 --- a/gcc/fwprop.cc +++ b/gcc/fwprop.cc @@ -449,7 +449,10 @@ try_fwprop_subst_pattern (obstack_watermark &attempt, insn_change &use_change, if (prop.num_replacements == 0) return false; - if (!prop.profitable_p ()) + if (!prop.profitable_p () + && (prop.changed_mem_p () + || use_insn->is_asm () + || !single_set (use_rtl))) { if (dump_file && (dump_flags & TDF_DETAILS)) fprintf (dump_file, "cannot propagate from insn %d into" @@ -481,7 +484,8 @@ try_fwprop_subst_pattern (obstack_watermark &attempt, insn_change &use_change, redo_changes (0); auto new_cost = set_src_cost (SET_SRC (use_set), GET_MODE (SET_DEST (use_set)), speed); - if (new_cost > old_cost) + if (new_cost > old_cost + || (new_cost == old_cost && !prop.profitable_p ())) { if (dump_file) fprintf (dump_file, "change not profitable" diff --git a/gcc/testsuite/gcc.target/i386/pr111267.c b/gcc/testsuite/gcc.target/i386/pr111267.c new file mode 100644 index 0000000..e3d549d --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr111267.c @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ +struct S { float a, b, c, d; }; + +int +bar (struct S x, struct S y) +{ + return x.b <= y.d && x.c >= y.a; +} + +/* { dg-final { scan-assembler-not "movq" } } */ +/* { dg-final { scan-assembler-not "xchgq" } } */ +/* { dg-final { scan-assembler-not "shrq" } } */ +/* { dg-final { scan-assembler-not "movd" } } */