| Message ID | 20250916043304.2212685-1-andrew.pinski@oss.qualcomm.com |
|---|---|
| State | New |
| Headers |
Return-Path: <gcc-patches-bounces~patchwork=sourceware.org@gcc.gnu.org> X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id B7C2B3857358 for <patchwork@sourceware.org>; Tue, 16 Sep 2025 04:34:12 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org B7C2B3857358 Authentication-Results: sourceware.org; dkim=pass (2048-bit key, unprotected) header.d=qualcomm.com header.i=@qualcomm.com header.a=rsa-sha256 header.s=qcppdkim1 header.b=Yf+m+jiD X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mx0b-0031df01.pphosted.com (mx0b-0031df01.pphosted.com [205.220.180.131]) by sourceware.org (Postfix) with ESMTPS id 3E6273857706 for <gcc-patches@gcc.gnu.org>; Tue, 16 Sep 2025 04:33:24 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 3E6273857706 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=oss.qualcomm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=oss.qualcomm.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 3E6273857706 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=205.220.180.131 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1757997204; cv=none; b=F5/FaK6hzVckPSFsh1tDrd0i726KzPKq+Eb8lQfgpQIm2NFsgE3+CtmQYb2iYhD+P0TU0cle8gYyLRUARy+IIWbp0b9Mxjk04cnfvbITHsVj+U7LYW1LxbbjKsKQFDhjOeDuUq3MLDZBavfkhBEIUva0EbwhvXgl064kQEQosJs= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1757997204; c=relaxed/simple; bh=O/AXXXMVPgwsRYOymq3zRsaDh7HSf8Tenew/8QnyUGg=; h=DKIM-Signature:From:To:Subject:Date:Message-ID:MIME-Version; b=XM/718/X7VnT+daybVLb5rWvOBbbA43G+HuqUbKK1iB1AmNvoTP2xGe218eZLZSAQMWrtC3h4Dl666Zjrstwvrv0T57Bvv6GZb1ExQBduxDfNxV6caudtjdezHXqMKnDsg1ZoUeEUxqHbzhhbo3FO3Wk3ASJVHs1OOBtezvxy9A= ARC-Authentication-Results: i=1; server2.sourceware.org DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 3E6273857706 Received: from pps.filterd (m0279871.ppops.net [127.0.0.1]) by mx0a-0031df01.pphosted.com (8.18.1.2/8.18.1.2) with ESMTP id 58G3qiRO014551 for <gcc-patches@gcc.gnu.org>; Tue, 16 Sep 2025 04:33:23 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=qualcomm.com; h= cc:content-transfer-encoding:date:from:message-id:mime-version :subject:to; s=qcppdkim1; bh=Pk89YnEl1LrR/NSdjpOCnmQerBOT0bgOnUl 1kzG9SG4=; b=Yf+m+jiDFq5KztNFIyZOCh4Eihl8EYBZ9OiR2/QQ4WmWIjSSdK+ KUeSLTADu+SiaMzDQrGRGn5U8U6UoR0w2GM3xHB2BAjzFH8Ya63xErWJXHUYQDBH H5lmmq2k+2vsBmxwKGuzBZ7lTH+X+KStl1LrxrGAiaLrDy52OMuf9yik0d+HSHU7 8PQYGbZi/obI22Ubbd8UXUvNngp0cjb/31QI/4viXHXkG3fdzno8UAOUZWIerY8B s0O0QKrZ4q8CtppK8u7xm5qg8Utr9p/egRdAQsPNLb86EfqkzEoW27SlJycOUrx1 bD1vx1HE/i5r9CsIQyrn4cSA60TjixOvyHA== Received: from mail-pl1-f197.google.com (mail-pl1-f197.google.com [209.85.214.197]) by mx0a-0031df01.pphosted.com (PPS) with ESMTPS id 495072qcxs-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for <gcc-patches@gcc.gnu.org>; Tue, 16 Sep 2025 04:33:23 +0000 (GMT) Received: by mail-pl1-f197.google.com with SMTP id d9443c01a7336-248d9301475so74059605ad.0 for <gcc-patches@gcc.gnu.org>; Mon, 15 Sep 2025 21:33:23 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1757997202; x=1758602002; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=Pk89YnEl1LrR/NSdjpOCnmQerBOT0bgOnUl1kzG9SG4=; b=AZ+AaH/avYGXIzAe38ZR5w+l/+ebh+iK8ddel4vBsDP0MtW1UnRw0hRPG2YXeav6oc x5ak44nl0sK2wJ6vUqFqJRvbPl4uO37fU0aVmEm4F9CYXyfZJIzMJ0dGI64nTUP1pWB1 PLgUUlfzEKryA81HH3W9MzTRptwStk4I0XHRwWa++37EdKwBggL13jewzMsUVc25gcnC NEUbsuC/LOLkIdckkF8153jpmlt6ojMa2rT//mHQM7kF5dvyqIpYaX3Em7DUnPzmqWjx Da/Vhg3fzyE+JCAw63+1ojpoa4tbaAXpahmWDf3lQOkPbx66+xY4A1H2ZzGEgcvP6YlH JDCg== X-Gm-Message-State: AOJu0YxxWTezndMlYzw4mF4fwq2smDxwz9rB62st5veDgVmCpnt/VCEo JDmgVPcR8BZ4aw0aVBvuSpMP+U9aIJ6NO+crW5BGPLqt+k6dGK1q3X74PmtumqBDIbDnbFrQ9Oz KgEuigPlUzwMkRnfBx1V/pISH7ec3rjGBxa+9auhaFGaDl+VNMPGD0EUu+HaiQrsFkWLF X-Gm-Gg: ASbGncsuNvgOY39kY3dm/vUa6PqEX8/FzLo8N5U58jxZ2ULv6Wvzm8xuZIXndHAHMF3 q4gnIzXXavdLlOaOTLru7j/OVzl2Po0AOxPLlYW5Rvqg+9hXAO4cnlAd+sAaphSn8zhxFbV2iia e6At6zGW1jEZWrWlPUnt6yMDI4SZcF+la5CRcyy9qHcBEjavNWtH5JM/BRVuFf1/6SN3WwquWG/ q0Smb0woai47RRv3SYg3mXiykgF/sBOmOUzjIz1nipyFubswDjj4aOQTpGL1+Hvuq6yPf7s6+F0 oSbUL0cXxBa8w6zaZxV6I5EimhP52uT8Zh5qn3KxgV2OQ9SEeFvLAApt1QmqjmGkx/8= X-Received: by 2002:a17:903:1aa3:b0:250:6d0e:1e50 with SMTP id d9443c01a7336-25d24da752bmr206893265ad.23.1757997201711; Mon, 15 Sep 2025 21:33:21 -0700 (PDT) X-Google-Smtp-Source: AGHT+IE/mFj50qMeDcm5AZCdyRislOANo9Mt3VH+vqFOHRp74ALHSI7ptnZ+PP7aMgAmW+xvOZ7ktA== X-Received: by 2002:a17:903:1aa3:b0:250:6d0e:1e50 with SMTP id d9443c01a7336-25d24da752bmr206892525ad.23.1757997200890; Mon, 15 Sep 2025 21:33:20 -0700 (PDT) Received: from xeond2.wrightpinski.org ([98.97.32.112]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-267c5e5dfb6sm9964755ad.125.2025.09.15.21.33.19 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 15 Sep 2025 21:33:20 -0700 (PDT) From: Andrew Pinski <andrew.pinski@oss.qualcomm.com> To: gcc-patches@gcc.gnu.org Cc: Andrew Pinski <andrew.pinski@oss.qualcomm.com> Subject: [PATCH] forwprop: Add a simple DSE after a clobber Date: Mon, 15 Sep 2025 21:33:04 -0700 Message-ID: <20250916043304.2212685-1-andrew.pinski@oss.qualcomm.com> X-Mailer: git-send-email 2.43.0 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Proofpoint-Spam-Details-Enc: AW1haW4tMjUwOTEzMDAyNSBTYWx0ZWRfXz97Q2vssn0fZ XPUyVAbL0TsM5Y7eIn2qXGz7nN9ahmMOZ30NYmN+1Gyf8R3RrHd/T5vAF5qJF6FGm1/BVxbF6aM DizHgLKj3by+h0sNrRqB03uFVWCnrtS0wfwyPO6UhBAkJHdYaldPTDpgtmK+FsIUuSyC8yPezTX Qb8sRNd2eYFPmwfNfSCgkR5+IRPANxPphnkvy416X2wrYdArRv1eB5Xzhdax+h4PwQUeFVec4Bw XGYnRo6OKsZQwb9eAuI2SuXd+ZmdiNut1mh4YHKdGXMewssPFhdiQOQT6vEt8qs6RJw1SXuUd8K qNQqRc8LnppCkKSDnQQ9Q2ulQI059UzPVvOxsU7QGO2PL4uwGwa0MTTAFOnoM+iuhaHpHrmpO8a 320jQXdG X-Proofpoint-GUID: QtRb-F_B6d2VT2BliFFtq_3K4tdJiA0i X-Authority-Analysis: v=2.4 cv=WcsMa1hX c=1 sm=1 tr=0 ts=68c8e893 cx=c_pps a=cmESyDAEBpBGqyK7t0alAg==:117 a=Qewv++OWYhgdhNLbbCvMAQ==:17 a=yJojWOMRYYMA:10 a=EUspDBNiAAAA:8 a=ihFhDOLbEHb6-fTr9ucA:9 a=1OuFwYUASf3TG4hYMiVC:22 X-Proofpoint-ORIG-GUID: QtRb-F_B6d2VT2BliFFtq_3K4tdJiA0i X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.293,Aquarius:18.0.1117,Hydra:6.1.9,FMLib:17.12.80.40 definitions=2025-09-16_01,2025-09-12_01,2025-03-28_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 priorityscore=1501 impostorscore=0 adultscore=0 bulkscore=0 spamscore=0 suspectscore=0 phishscore=0 clxscore=1015 malwarescore=0 classifier=typeunknown authscore=0 authtc= authcc= route=outbound adjust=0 reason=mlx scancount=1 engine=8.19.0-2507300000 definitions=main-2509130025 X-Spam-Status: No, score=-13.0 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_DNSWL_LOW, RCVD_IN_VALIDITY_RPBL_BLOCKED, RCVD_IN_VALIDITY_SAFE_BLOCKED, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org> List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>, <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe> List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/> List-Post: <mailto:gcc-patches@gcc.gnu.org> List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help> List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>, <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe> Errors-To: gcc-patches-bounces~patchwork=sourceware.org@gcc.gnu.org |
| Series |
forwprop: Add a simple DSE after a clobber
|
|
Commit Message
Andrew Pinski
Sept. 16, 2025, 4:33 a.m. UTC
After copy propagation for aggregates patches we might end up with
now:
```
tmp = a;
b = a; // was b = tmp;
tmp = {CLOBBER};
```
To help out ESRA, it would be a good idea to remove the `tmp = a` statement as
there is no DSE between frowprop and ESRA. copy-prop-aggregate-sra-1.c is an example
where the removal of the copy helps ESRA.
This adds a simple DSE which is only designed to remove the `tmp = a` statement.
This shows up a few times in many C++ code including the code from the javascript
interpreter in ladybird, and in the "fake" testcase in PR 108653 and in the aarch64
specific PR 89967.
This is disabled for -Og as we don't do dse there either.
intent_optimize_10.f90 testcase needed to be updated as the constant
shows up in a debug statement now.
Bootstrapped and tested on x86_64-linux-gnu.
gcc/ChangeLog:
* tree-ssa-forwprop.cc (do_simple_agr_dse): New function.
(pass_forwprop::execute): Call do_simple_agr_dse for clobbers.
gcc/testsuite/ChangeLog:
* gfortran.dg/intent_optimize_10.f90: Update so -g won't fail.
* gcc.dg/tree-ssa/copy-prop-aggregate-sra-1.c: New testcase.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
---
.../tree-ssa/copy-prop-aggregate-sra-1.c | 33 +++++++
.../gfortran.dg/intent_optimize_10.f90 | 2 +-
gcc/tree-ssa-forwprop.cc | 88 +++++++++++++++++++
3 files changed, 122 insertions(+), 1 deletion(-)
create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/copy-prop-aggregate-sra-1.c
Comments
On Tue, Sep 16, 2025 at 6:34 AM Andrew Pinski <andrew.pinski@oss.qualcomm.com> wrote: > > After copy propagation for aggregates patches we might end up with > now: > ``` > tmp = a; > b = a; // was b = tmp; > tmp = {CLOBBER}; > ``` > To help out ESRA, it would be a good idea to remove the `tmp = a` statement as > there is no DSE between frowprop and ESRA. copy-prop-aggregate-sra-1.c is an example > where the removal of the copy helps ESRA. Meh. You know, forwprop becomes a kitchen-sink. But well. Comments below. > This adds a simple DSE which is only designed to remove the `tmp = a` statement. > This shows up a few times in many C++ code including the code from the javascript > interpreter in ladybird, and in the "fake" testcase in PR 108653 and in the aarch64 > specific PR 89967. > > This is disabled for -Og as we don't do dse there either. > intent_optimize_10.f90 testcase needed to be updated as the constant > shows up in a debug statement now. > > Bootstrapped and tested on x86_64-linux-gnu. > > gcc/ChangeLog: > > * tree-ssa-forwprop.cc (do_simple_agr_dse): New function. > (pass_forwprop::execute): Call do_simple_agr_dse for clobbers. > > gcc/testsuite/ChangeLog: > > * gfortran.dg/intent_optimize_10.f90: Update so -g won't fail. > * gcc.dg/tree-ssa/copy-prop-aggregate-sra-1.c: New testcase. > > Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com> > --- > .../tree-ssa/copy-prop-aggregate-sra-1.c | 33 +++++++ > .../gfortran.dg/intent_optimize_10.f90 | 2 +- > gcc/tree-ssa-forwprop.cc | 88 +++++++++++++++++++ > 3 files changed, 122 insertions(+), 1 deletion(-) > create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/copy-prop-aggregate-sra-1.c > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/copy-prop-aggregate-sra-1.c b/gcc/testsuite/gcc.dg/tree-ssa/copy-prop-aggregate-sra-1.c > new file mode 100644 > index 00000000000..baabecf5d27 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/tree-ssa/copy-prop-aggregate-sra-1.c > @@ -0,0 +1,33 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O1 -fdump-tree-forwprop1-details -fdump-tree-esra-details " } */ > + > +struct s1 > +{ > + int t[1024]; > +}; > + > +struct s1 f(void); > + > +void g(int a, int b, int ); > +void p(struct s1); > +static inline void p1(struct s1 inner) > +{ > + p(inner); > +} > + > +void h(struct s1 outer) > +{ > + p1(outer); > + g(outer.t[0], outer.t[1], outer.t[2]); > +} > +/* Forwprop should be able to copy prop the copy of `inner = outer` to the call of p. > + Also remove this copy. */ > + > +/* { dg-final { scan-tree-dump-times "after previous" 1 "forwprop1" } } */ > +/* { dg-final { scan-tree-dump-times "Removing dead store stmt inner = outer" 1 "forwprop1" } } */ > + > +/* The extra copy that was done by inlining is removed so SRA should not decide to cause > + inner nor outer to be scalarized even for the 3 elements accessed afterwards. */ > +/* { dg-final { scan-tree-dump-times "Disqualifying inner" 2 "esra" } } */ > +/* { dg-final { scan-tree-dump-times "Disqualifying outer" 1 "esra" } } */ > + > diff --git a/gcc/testsuite/gfortran.dg/intent_optimize_10.f90 b/gcc/testsuite/gfortran.dg/intent_optimize_10.f90 > index d8bc1bb3b7b..214f04cf70d 100644 > --- a/gcc/testsuite/gfortran.dg/intent_optimize_10.f90 > +++ b/gcc/testsuite/gfortran.dg/intent_optimize_10.f90 > @@ -63,4 +63,4 @@ end program main > > ! There is a clobber for tc, so we should manage to optimize away the associated initialization constant (but not other > ! initialization constants). > -! { dg-final { scan-tree-dump-not "123456789" "optimized" { target __OPTIMIZE__ } } } > +! { dg-final { scan-tree-dump-not "= 123456789" "optimized" { target __OPTIMIZE__ } } } > diff --git a/gcc/tree-ssa-forwprop.cc b/gcc/tree-ssa-forwprop.cc > index 1eacff01587..17592383e2b 100644 > --- a/gcc/tree-ssa-forwprop.cc > +++ b/gcc/tree-ssa-forwprop.cc > @@ -1708,6 +1708,89 @@ optimize_agr_copyprop (gimple_stmt_iterator *gsip) > return changed; > } > > +/* Simple DSE of the lhs from a clobber STMT. > + This is used mostly to clean up from optimize_agr_copyprop and > + to remove extra copy that might confuse SRA. */ > +static void > +do_simple_agr_dse (gassign *stmt, bool full_walk) > +{ > + /* Don't do this while in -Og. */ > + if (optimize_debug) > + return; > + ao_ref read; > + basic_block bb = gimple_bb (stmt); > + tree lhs = gimple_assign_lhs (stmt); > + if (!DECL_P (lhs)) > + return; > + ao_ref_init (&read, lhs); > + tree vuse = gimple_vuse (stmt); > + unsigned limit = full_walk ? param_sccvn_max_alias_queries_per_access : 2; > + while (limit) > + { > + gimple *ostmt = SSA_NAME_DEF_STMT (vuse); > + if (is_a<gphi*>(ostmt) || gimple_nop_p (ostmt)) > + break; > + basic_block obb = gimple_bb (ostmt); > + /* If the clobber is not fully dominating the statement define, > + then it is not "simple" to detect if the define is fully clobbered. */ > + if (obb != bb && !dominated_by_p (CDI_DOMINATORS, bb, obb)) > + return; > + /* Quickly see if the reference was used in previous statements. */ > + bool used = false; > + gimple *use_stmt; > + imm_use_iterator iter; > + FOR_EACH_IMM_USE_STMT (use_stmt, iter, gimple_vdef (ostmt)) > + { > + basic_block ubb = gimple_bb (use_stmt); > + if (stmt == use_stmt) > + continue; > + /* The use needs to be dominating the clobber. */ you want to decrease limit here as well. This loop is going to be the most expensive part. > + if ((ubb != bb && !dominated_by_p (CDI_DOMINATORS, bb, ubb)) > + || ref_maybe_used_by_stmt_p (use_stmt, &read, false)) > + { > + used = true; > + break; > + } > + } > + if (used) > + break; > + vuse = gimple_vuse (ostmt); > + > + /* This a store to the clobbered decl, > + then remove it. */ > + if (is_a <gassign*>(ostmt) What about calls? > + && gimple_store_p (ostmt) > + && !gimple_clobber_p (ostmt) > + && operand_equal_p (lhs, gimple_assign_lhs (ostmt))) Since LHS is a decl a pointer compare would be enough here. > + { > + /* Don't remove statements that are needed for non-call > + eh to work. */ > + if (stmt_unremovable_because_of_non_call_eh_p (cfun, ostmt)) > + return; > + /* If we delete a stmt that could throw, mark the block > + in to_purge to cleanup afterwards. */ > + if (stmt_could_throw_p (cfun, ostmt)) stmt_can_throw_internal? I don't think you can elide throwing stmts, at least below you are removing the whole stmt, not just the LHS when this is a call. > + bitmap_set_bit (to_purge, obb->index); > + gimple_stmt_iterator gsi = gsi_for_stmt (ostmt); > + if (dump_file && (dump_flags & TDF_DETAILS)) > + { > + fprintf (dump_file, "Removing dead store stmt "); > + print_gimple_stmt (dump_file, ostmt, 0); > + fprintf (dump_file, "\n"); > + } > + unlink_stmt_vdef (ostmt); > + release_defs (ostmt); > + gsi_remove (&gsi, true); > + statistics_counter_event (cfun, "delete dead store", 1); So you remove at most exactly one store? > + break; > + } > + if (stmt_may_clobber_ref_p_1 (ostmt, &read, false) Why stop on clobbers? I'm not saying that's not OK, but I think this deserves a comment. > + || ref_maybe_used_by_stmt_p (ostmt, &read, false)) > + break; > + limit--; > + } > +} > + > /* Optimizes builtin memcmps for small constant sizes. > GSI_P is the GSI for the call. STMT is the call itself. > */ > @@ -5259,6 +5342,11 @@ pass_forwprop::execute (function *fun) > changed = true; > break; > } > + if (gimple_clobber_p (stmt)) > + { > + do_simple_agr_dse (as_a<gassign*>(stmt), full_walk); > + break; > + } > > if (TREE_CODE_CLASS (code) == tcc_comparison) > changed |= forward_propagate_into_comparison (&gsi); > -- > 2.43.0 >
On Tue, Sep 16, 2025 at 5:54 AM Richard Biener <richard.guenther@gmail.com> wrote: > > On Tue, Sep 16, 2025 at 6:34 AM Andrew Pinski > <andrew.pinski@oss.qualcomm.com> wrote: > > > > After copy propagation for aggregates patches we might end up with > > now: > > ``` > > tmp = a; > > b = a; // was b = tmp; > > tmp = {CLOBBER}; > > ``` > > To help out ESRA, it would be a good idea to remove the `tmp = a` statement as > > there is no DSE between frowprop and ESRA. copy-prop-aggregate-sra-1.c is an example > > where the removal of the copy helps ESRA. > > Meh. You know, forwprop becomes a kitchen-sink. But well. Comments below. Another option is to move both the copy propagation for aggregates (and a few other mem* optimizations which started to collect in forwprop since 2010) into its own pass. Run it after forwprop1 and once after forwprop2. Maybe once after the loop optimizations are done due to the ldist optimization. I hate adding a new pass too. The original 2 things forwprop was used for propagation of addresses and comparisons. Comparisons are almost all done by match-and-simplify; there are still some few that are not (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120206) Propagation of addresses: I wonder if there is a way to get ccp to do that instead; treat in some cases `a_N ptr+ CST` as a "constant" like how it does copy prop already too. So maybe my plan is to remove forwprop and change it into the memory based optimizations with a few other things thrown in. Plus as I mentioned in a different thread to move the funtionality of pass_fold_builtins and pass_optimize_widening_mul into other passes. We can talk more in person at Cauldron on these plans. I hope others can help with this cleanup and not just myself really. > > > This adds a simple DSE which is only designed to remove the `tmp = a` statement. > > This shows up a few times in many C++ code including the code from the javascript > > interpreter in ladybird, and in the "fake" testcase in PR 108653 and in the aarch64 > > specific PR 89967. > > > > This is disabled for -Og as we don't do dse there either. > > intent_optimize_10.f90 testcase needed to be updated as the constant > > shows up in a debug statement now. > > > > Bootstrapped and tested on x86_64-linux-gnu. > > > > gcc/ChangeLog: > > > > * tree-ssa-forwprop.cc (do_simple_agr_dse): New function. > > (pass_forwprop::execute): Call do_simple_agr_dse for clobbers. > > > > gcc/testsuite/ChangeLog: > > > > * gfortran.dg/intent_optimize_10.f90: Update so -g won't fail. > > * gcc.dg/tree-ssa/copy-prop-aggregate-sra-1.c: New testcase. > > > > Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com> > > --- > > .../tree-ssa/copy-prop-aggregate-sra-1.c | 33 +++++++ > > .../gfortran.dg/intent_optimize_10.f90 | 2 +- > > gcc/tree-ssa-forwprop.cc | 88 +++++++++++++++++++ > > 3 files changed, 122 insertions(+), 1 deletion(-) > > create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/copy-prop-aggregate-sra-1.c > > > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/copy-prop-aggregate-sra-1.c b/gcc/testsuite/gcc.dg/tree-ssa/copy-prop-aggregate-sra-1.c > > new file mode 100644 > > index 00000000000..baabecf5d27 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.dg/tree-ssa/copy-prop-aggregate-sra-1.c > > @@ -0,0 +1,33 @@ > > +/* { dg-do compile } */ > > +/* { dg-options "-O1 -fdump-tree-forwprop1-details -fdump-tree-esra-details " } */ > > + > > +struct s1 > > +{ > > + int t[1024]; > > +}; > > + > > +struct s1 f(void); > > + > > +void g(int a, int b, int ); > > +void p(struct s1); > > +static inline void p1(struct s1 inner) > > +{ > > + p(inner); > > +} > > + > > +void h(struct s1 outer) > > +{ > > + p1(outer); > > + g(outer.t[0], outer.t[1], outer.t[2]); > > +} > > +/* Forwprop should be able to copy prop the copy of `inner = outer` to the call of p. > > + Also remove this copy. */ > > + > > +/* { dg-final { scan-tree-dump-times "after previous" 1 "forwprop1" } } */ > > +/* { dg-final { scan-tree-dump-times "Removing dead store stmt inner = outer" 1 "forwprop1" } } */ > > + > > +/* The extra copy that was done by inlining is removed so SRA should not decide to cause > > + inner nor outer to be scalarized even for the 3 elements accessed afterwards. */ > > +/* { dg-final { scan-tree-dump-times "Disqualifying inner" 2 "esra" } } */ > > +/* { dg-final { scan-tree-dump-times "Disqualifying outer" 1 "esra" } } */ > > + > > diff --git a/gcc/testsuite/gfortran.dg/intent_optimize_10.f90 b/gcc/testsuite/gfortran.dg/intent_optimize_10.f90 > > index d8bc1bb3b7b..214f04cf70d 100644 > > --- a/gcc/testsuite/gfortran.dg/intent_optimize_10.f90 > > +++ b/gcc/testsuite/gfortran.dg/intent_optimize_10.f90 > > @@ -63,4 +63,4 @@ end program main > > > > ! There is a clobber for tc, so we should manage to optimize away the associated initialization constant (but not other > > ! initialization constants). > > -! { dg-final { scan-tree-dump-not "123456789" "optimized" { target __OPTIMIZE__ } } } > > +! { dg-final { scan-tree-dump-not "= 123456789" "optimized" { target __OPTIMIZE__ } } } > > diff --git a/gcc/tree-ssa-forwprop.cc b/gcc/tree-ssa-forwprop.cc > > index 1eacff01587..17592383e2b 100644 > > --- a/gcc/tree-ssa-forwprop.cc > > +++ b/gcc/tree-ssa-forwprop.cc > > @@ -1708,6 +1708,89 @@ optimize_agr_copyprop (gimple_stmt_iterator *gsip) > > return changed; > > } > > > > +/* Simple DSE of the lhs from a clobber STMT. > > + This is used mostly to clean up from optimize_agr_copyprop and > > + to remove extra copy that might confuse SRA. */ > > +static void > > +do_simple_agr_dse (gassign *stmt, bool full_walk) > > +{ > > + /* Don't do this while in -Og. */ > > + if (optimize_debug) > > + return; > > + ao_ref read; > > + basic_block bb = gimple_bb (stmt); > > + tree lhs = gimple_assign_lhs (stmt); > > + if (!DECL_P (lhs)) > > + return; > > + ao_ref_init (&read, lhs); > > + tree vuse = gimple_vuse (stmt); > > + unsigned limit = full_walk ? param_sccvn_max_alias_queries_per_access : 2; I should mention a limit of 2 really should be enough for most common cases. I had thought about changing it over to that fully. That is case where this shows up the most is with small wrapper functions which after inlining we get: ``` a = b; c = a; // or call(a); a = {CLOBBER}; ``` And copy prop changes the above to: ``` a = b; c = b; // or call(b); a = {CLOBBER}; ``` And we want to remove the stmt `a = b;` as SRA will scalarize a even though it is dead later on. Very similar to the reasons why you moved DSE earlier before SRA in GCC 12 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99912#c6) . > > + while (limit) > > + { > > + gimple *ostmt = SSA_NAME_DEF_STMT (vuse); > > + if (is_a<gphi*>(ostmt) || gimple_nop_p (ostmt)) > > + break; > > + basic_block obb = gimple_bb (ostmt); > > + /* If the clobber is not fully dominating the statement define, > > + then it is not "simple" to detect if the define is fully clobbered. */ > > + if (obb != bb && !dominated_by_p (CDI_DOMINATORS, bb, obb)) > > + return; > > + /* Quickly see if the reference was used in previous statements. */ > > + bool used = false; > > + gimple *use_stmt; > > + imm_use_iterator iter; > > + FOR_EACH_IMM_USE_STMT (use_stmt, iter, gimple_vdef (ostmt)) > > + { > > + basic_block ubb = gimple_bb (use_stmt); > > + if (stmt == use_stmt) > > + continue; > > + /* The use needs to be dominating the clobber. */ > > you want to decrease limit here as well. This loop is going to be the > most expensive part. True, let me look more into this. > > > + if ((ubb != bb && !dominated_by_p (CDI_DOMINATORS, bb, ubb)) > > + || ref_maybe_used_by_stmt_p (use_stmt, &read, false)) > > + { > > + used = true; > > + break; > > + } > > + } > > + if (used) > > + break; > > + vuse = gimple_vuse (ostmt); > > + > > + /* This a store to the clobbered decl, > > + then remove it. */ > > + if (is_a <gassign*>(ostmt) > > What about calls? I was trying to do as simple as possible DSE and the cases I wanted to remove was when there was a copy and that copy might cause SRA to do something off. > > > + && gimple_store_p (ostmt) > > + && !gimple_clobber_p (ostmt) > > + && operand_equal_p (lhs, gimple_assign_lhs (ostmt))) > > Since LHS is a decl a pointer compare would be enough here. True, will change it. > > > + { > > + /* Don't remove statements that are needed for non-call > > + eh to work. */ > > + if (stmt_unremovable_because_of_non_call_eh_p (cfun, ostmt)) > > + return; > > + /* If we delete a stmt that could throw, mark the block > > + in to_purge to cleanup afterwards. */ > > + if (stmt_could_throw_p (cfun, ostmt)) > > stmt_can_throw_internal? > > I don't think you can elide throwing stmts, at least below you are > removing the whole stmt, not just the LHS when this is a call. We can because of the check stmt_unremovable_because_of_non_call_eh_p above. This is the same check that DSE/DCE do (since r12-475-g8ebf6b99952ada). > > > + bitmap_set_bit (to_purge, obb->index); > > + gimple_stmt_iterator gsi = gsi_for_stmt (ostmt); > > + if (dump_file && (dump_flags & TDF_DETAILS)) > > + { > > + fprintf (dump_file, "Removing dead store stmt "); > > + print_gimple_stmt (dump_file, ostmt, 0); > > + fprintf (dump_file, "\n"); > > + } > > + unlink_stmt_vdef (ostmt); > > + release_defs (ostmt); > > + gsi_remove (&gsi, true); > > + statistics_counter_event (cfun, "delete dead store", 1); > > So you remove at most exactly one store? Yes, at most one store for each clobber. As I tried to mention this is designed to do a quick cleanup after a copy propagation of aggregates happens but before SRA. It is not to catch that much more. > > > + break; > > + } > > + if (stmt_may_clobber_ref_p_1 (ostmt, &read, false) > > Why stop on clobbers? I'm not saying that's not OK, but I think > this deserves a comment. Will add a comment. The idea is just not to worry about anything except for a copy into the decl. So stopping if the decl was clobbered we stop the walk as soon as possible. Thanks, Andrew > > > + || ref_maybe_used_by_stmt_p (ostmt, &read, false)) > > + break; > > + limit--; > > + } > > +} > > + > > /* Optimizes builtin memcmps for small constant sizes. > > GSI_P is the GSI for the call. STMT is the call itself. > > */ > > @@ -5259,6 +5342,11 @@ pass_forwprop::execute (function *fun) > > changed = true; > > break; > > } > > + if (gimple_clobber_p (stmt)) > > + { > > + do_simple_agr_dse (as_a<gassign*>(stmt), full_walk); > > + break; > > + } > > > > if (TREE_CODE_CLASS (code) == tcc_comparison) > > changed |= forward_propagate_into_comparison (&gsi); > > -- > > 2.43.0 > >
On Wed, Sep 17, 2025 at 2:40 AM Andrew Pinski <andrew.pinski@oss.qualcomm.com> wrote: > > On Tue, Sep 16, 2025 at 5:54 AM Richard Biener > <richard.guenther@gmail.com> wrote: > > > > On Tue, Sep 16, 2025 at 6:34 AM Andrew Pinski > > <andrew.pinski@oss.qualcomm.com> wrote: > > > > > > After copy propagation for aggregates patches we might end up with > > > now: > > > ``` > > > tmp = a; > > > b = a; // was b = tmp; > > > tmp = {CLOBBER}; > > > ``` > > > To help out ESRA, it would be a good idea to remove the `tmp = a` statement as > > > there is no DSE between frowprop and ESRA. copy-prop-aggregate-sra-1.c is an example > > > where the removal of the copy helps ESRA. > > > > Meh. You know, forwprop becomes a kitchen-sink. But well. Comments below. > > > Another option is to move both the copy propagation for aggregates > (and a few other mem* optimizations which started to collect in > forwprop since 2010) into its own pass. > Run it after forwprop1 and once after forwprop2. Maybe once after the > loop optimizations are done due to the ldist optimization. My original idea was that aggregate copyprop fits best into SRA since it already performs that in it's own way but to a limited set of candidates. But then SRA should work flow-sensitive so it doesn't restrict itself as much on "global" TREE_ADDRESSABLE. SRA should work out memcpy and friends as well. It would probably make sense to write a new SRA from scratch with aggregate copyprop in mind though, so much work, which is why I acked the "smallish" thing in forwprop. > I hate adding a new pass too. Likewise, esp. for these kind of "imperfect" local things. > The original 2 things forwprop was used for propagation of addresses > and comparisons. > Comparisons are almost all done by match-and-simplify; there are still > some few that are not > (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120206) > Propagation of addresses: I wonder if there is a way to get ccp to do > that instead; treat in some cases `a_N ptr+ CST` as a "constant" like > how it does copy prop already too. The lattice currently doesn't allow it, but sure. I also always wanted to get rid of copyprop (and now we have that scccopy thing as well), given CCP gets most of what copyprop does (and forwprop also does some copyprop). > So maybe my plan is to remove forwprop and change it into the memory > based optimizations with a few other things thrown in. But forwprop is what drives match.pd. But yes, it has "two parts" (we do two loops over each BB), dissecting that might make sense. > Plus as I mentioned in a different thread to move the funtionality of > pass_fold_builtins and pass_optimize_widening_mul into other passes. > We can talk more in person at Cauldron on these plans. I hope others > can help with this cleanup and not just myself really. Yeah. Writing up (passes.texi?) what "scalar optimization" passes we have and (wiki?) how we think this might be cleaned up would be nice. > > > > > This adds a simple DSE which is only designed to remove the `tmp = a` statement. > > > This shows up a few times in many C++ code including the code from the javascript > > > interpreter in ladybird, and in the "fake" testcase in PR 108653 and in the aarch64 > > > specific PR 89967. > > > > > > This is disabled for -Og as we don't do dse there either. > > > intent_optimize_10.f90 testcase needed to be updated as the constant > > > shows up in a debug statement now. > > > > > > Bootstrapped and tested on x86_64-linux-gnu. > > > > > > gcc/ChangeLog: > > > > > > * tree-ssa-forwprop.cc (do_simple_agr_dse): New function. > > > (pass_forwprop::execute): Call do_simple_agr_dse for clobbers. > > > > > > gcc/testsuite/ChangeLog: > > > > > > * gfortran.dg/intent_optimize_10.f90: Update so -g won't fail. > > > * gcc.dg/tree-ssa/copy-prop-aggregate-sra-1.c: New testcase. > > > > > > Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com> > > > --- > > > .../tree-ssa/copy-prop-aggregate-sra-1.c | 33 +++++++ > > > .../gfortran.dg/intent_optimize_10.f90 | 2 +- > > > gcc/tree-ssa-forwprop.cc | 88 +++++++++++++++++++ > > > 3 files changed, 122 insertions(+), 1 deletion(-) > > > create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/copy-prop-aggregate-sra-1.c > > > > > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/copy-prop-aggregate-sra-1.c b/gcc/testsuite/gcc.dg/tree-ssa/copy-prop-aggregate-sra-1.c > > > new file mode 100644 > > > index 00000000000..baabecf5d27 > > > --- /dev/null > > > +++ b/gcc/testsuite/gcc.dg/tree-ssa/copy-prop-aggregate-sra-1.c > > > @@ -0,0 +1,33 @@ > > > +/* { dg-do compile } */ > > > +/* { dg-options "-O1 -fdump-tree-forwprop1-details -fdump-tree-esra-details " } */ > > > + > > > +struct s1 > > > +{ > > > + int t[1024]; > > > +}; > > > + > > > +struct s1 f(void); > > > + > > > +void g(int a, int b, int ); > > > +void p(struct s1); > > > +static inline void p1(struct s1 inner) > > > +{ > > > + p(inner); > > > +} > > > + > > > +void h(struct s1 outer) > > > +{ > > > + p1(outer); > > > + g(outer.t[0], outer.t[1], outer.t[2]); > > > +} > > > +/* Forwprop should be able to copy prop the copy of `inner = outer` to the call of p. > > > + Also remove this copy. */ > > > + > > > +/* { dg-final { scan-tree-dump-times "after previous" 1 "forwprop1" } } */ > > > +/* { dg-final { scan-tree-dump-times "Removing dead store stmt inner = outer" 1 "forwprop1" } } */ > > > + > > > +/* The extra copy that was done by inlining is removed so SRA should not decide to cause > > > + inner nor outer to be scalarized even for the 3 elements accessed afterwards. */ > > > +/* { dg-final { scan-tree-dump-times "Disqualifying inner" 2 "esra" } } */ > > > +/* { dg-final { scan-tree-dump-times "Disqualifying outer" 1 "esra" } } */ > > > + > > > diff --git a/gcc/testsuite/gfortran.dg/intent_optimize_10.f90 b/gcc/testsuite/gfortran.dg/intent_optimize_10.f90 > > > index d8bc1bb3b7b..214f04cf70d 100644 > > > --- a/gcc/testsuite/gfortran.dg/intent_optimize_10.f90 > > > +++ b/gcc/testsuite/gfortran.dg/intent_optimize_10.f90 > > > @@ -63,4 +63,4 @@ end program main > > > > > > ! There is a clobber for tc, so we should manage to optimize away the associated initialization constant (but not other > > > ! initialization constants). > > > -! { dg-final { scan-tree-dump-not "123456789" "optimized" { target __OPTIMIZE__ } } } > > > +! { dg-final { scan-tree-dump-not "= 123456789" "optimized" { target __OPTIMIZE__ } } } > > > diff --git a/gcc/tree-ssa-forwprop.cc b/gcc/tree-ssa-forwprop.cc > > > index 1eacff01587..17592383e2b 100644 > > > --- a/gcc/tree-ssa-forwprop.cc > > > +++ b/gcc/tree-ssa-forwprop.cc > > > @@ -1708,6 +1708,89 @@ optimize_agr_copyprop (gimple_stmt_iterator *gsip) > > > return changed; > > > } > > > > > > +/* Simple DSE of the lhs from a clobber STMT. > > > + This is used mostly to clean up from optimize_agr_copyprop and > > > + to remove extra copy that might confuse SRA. */ > > > +static void > > > +do_simple_agr_dse (gassign *stmt, bool full_walk) > > > +{ > > > + /* Don't do this while in -Og. */ > > > + if (optimize_debug) > > > + return; > > > + ao_ref read; > > > + basic_block bb = gimple_bb (stmt); > > > + tree lhs = gimple_assign_lhs (stmt); > > > + if (!DECL_P (lhs)) > > > + return; > > > + ao_ref_init (&read, lhs); > > > + tree vuse = gimple_vuse (stmt); > > > + unsigned limit = full_walk ? param_sccvn_max_alias_queries_per_access : 2; > > I should mention a limit of 2 really should be enough for most common > cases. I had thought about changing it over to that fully. > That is case where this shows up the most is with small wrapper > functions which after inlining we get: > ``` > a = b; > c = a; // or call(a); > a = {CLOBBER}; > ``` > And copy prop changes the above to: > ``` > a = b; > c = b; // or call(b); > a = {CLOBBER}; > ``` > And we want to remove the stmt `a = b;` as SRA will scalarize a even > though it is dead later on. Very similar to the reasons why you moved > DSE earlier before SRA in GCC 12 > (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99912#c6) . OK, please add a comment on how and why this is a limited DSE. > > > + while (limit) > > > + { > > > + gimple *ostmt = SSA_NAME_DEF_STMT (vuse); > > > + if (is_a<gphi*>(ostmt) || gimple_nop_p (ostmt)) > > > + break; > > > + basic_block obb = gimple_bb (ostmt); > > > + /* If the clobber is not fully dominating the statement define, > > > + then it is not "simple" to detect if the define is fully clobbered. */ > > > + if (obb != bb && !dominated_by_p (CDI_DOMINATORS, bb, obb)) > > > + return; > > > + /* Quickly see if the reference was used in previous statements. */ > > > + bool used = false; > > > + gimple *use_stmt; > > > + imm_use_iterator iter; > > > + FOR_EACH_IMM_USE_STMT (use_stmt, iter, gimple_vdef (ostmt)) > > > + { > > > + basic_block ubb = gimple_bb (use_stmt); > > > + if (stmt == use_stmt) > > > + continue; > > > + /* The use needs to be dominating the clobber. */ > > > > you want to decrease limit here as well. This loop is going to be the > > most expensive part. > > True, let me look more into this. > > > > > > + if ((ubb != bb && !dominated_by_p (CDI_DOMINATORS, bb, ubb)) > > > + || ref_maybe_used_by_stmt_p (use_stmt, &read, false)) > > > + { > > > + used = true; > > > + break; > > > + } > > > + } > > > + if (used) > > > + break; > > > + vuse = gimple_vuse (ostmt); > > > + > > > + /* This a store to the clobbered decl, > > > + then remove it. */ > > > + if (is_a <gassign*>(ostmt) > > > > What about calls? > I was trying to do as simple as possible DSE and the cases I wanted to > remove was when there was a copy and that copy might cause SRA to do > something off. Fair enough. > > > > > + && gimple_store_p (ostmt) > > > + && !gimple_clobber_p (ostmt) > > > + && operand_equal_p (lhs, gimple_assign_lhs (ostmt))) > > > > Since LHS is a decl a pointer compare would be enough here. > > True, will change it. > > > > > > + { > > > + /* Don't remove statements that are needed for non-call > > > + eh to work. */ > > > + if (stmt_unremovable_because_of_non_call_eh_p (cfun, ostmt)) > > > + return; > > > + /* If we delete a stmt that could throw, mark the block > > > + in to_purge to cleanup afterwards. */ > > > + if (stmt_could_throw_p (cfun, ostmt)) > > > > stmt_can_throw_internal? Still this. > > > > I don't think you can elide throwing stmts, at least below you are > > removing the whole stmt, not just the LHS when this is a call. > We can because of the check stmt_unremovable_because_of_non_call_eh_p above. > This is the same check that DSE/DCE do (since r12-475-g8ebf6b99952ada). It's probably OK for non-calls indeed. > > > > > + bitmap_set_bit (to_purge, obb->index); > > > + gimple_stmt_iterator gsi = gsi_for_stmt (ostmt); > > > + if (dump_file && (dump_flags & TDF_DETAILS)) > > > + { > > > + fprintf (dump_file, "Removing dead store stmt "); > > > + print_gimple_stmt (dump_file, ostmt, 0); > > > + fprintf (dump_file, "\n"); > > > + } > > > + unlink_stmt_vdef (ostmt); > > > + release_defs (ostmt); > > > + gsi_remove (&gsi, true); > > > + statistics_counter_event (cfun, "delete dead store", 1); > > > > So you remove at most exactly one store? > > Yes, at most one store for each clobber. As I tried to mention this is > designed to do a quick cleanup after a copy propagation of aggregates > happens but before SRA. It is not to catch that much more. > > > > > > + break; > > > + } > > > + if (stmt_may_clobber_ref_p_1 (ostmt, &read, false) > > > > Why stop on clobbers? I'm not saying that's not OK, but I think > > this deserves a comment. > > Will add a comment. The idea is just not to worry about anything > except for a copy into the decl. So stopping if the decl was clobbered > we stop the walk as soon as possible. Thanks, it looked mostly OK apart from some missing comments and the minor adjustments. Richard. > Thanks, > Andrew > > > > > > + || ref_maybe_used_by_stmt_p (ostmt, &read, false)) > > > + break; > > > + limit--; > > > + } > > > +} > > > + > > > /* Optimizes builtin memcmps for small constant sizes. > > > GSI_P is the GSI for the call. STMT is the call itself. > > > */ > > > @@ -5259,6 +5342,11 @@ pass_forwprop::execute (function *fun) > > > changed = true; > > > break; > > > } > > > + if (gimple_clobber_p (stmt)) > > > + { > > > + do_simple_agr_dse (as_a<gassign*>(stmt), full_walk); > > > + break; > > > + } > > > > > > if (TREE_CODE_CLASS (code) == tcc_comparison) > > > changed |= forward_propagate_into_comparison (&gsi); > > > -- > > > 2.43.0 > > >
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/copy-prop-aggregate-sra-1.c b/gcc/testsuite/gcc.dg/tree-ssa/copy-prop-aggregate-sra-1.c new file mode 100644 index 00000000000..baabecf5d27 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/copy-prop-aggregate-sra-1.c @@ -0,0 +1,33 @@ +/* { dg-do compile } */ +/* { dg-options "-O1 -fdump-tree-forwprop1-details -fdump-tree-esra-details " } */ + +struct s1 +{ + int t[1024]; +}; + +struct s1 f(void); + +void g(int a, int b, int ); +void p(struct s1); +static inline void p1(struct s1 inner) +{ + p(inner); +} + +void h(struct s1 outer) +{ + p1(outer); + g(outer.t[0], outer.t[1], outer.t[2]); +} +/* Forwprop should be able to copy prop the copy of `inner = outer` to the call of p. + Also remove this copy. */ + +/* { dg-final { scan-tree-dump-times "after previous" 1 "forwprop1" } } */ +/* { dg-final { scan-tree-dump-times "Removing dead store stmt inner = outer" 1 "forwprop1" } } */ + +/* The extra copy that was done by inlining is removed so SRA should not decide to cause + inner nor outer to be scalarized even for the 3 elements accessed afterwards. */ +/* { dg-final { scan-tree-dump-times "Disqualifying inner" 2 "esra" } } */ +/* { dg-final { scan-tree-dump-times "Disqualifying outer" 1 "esra" } } */ + diff --git a/gcc/testsuite/gfortran.dg/intent_optimize_10.f90 b/gcc/testsuite/gfortran.dg/intent_optimize_10.f90 index d8bc1bb3b7b..214f04cf70d 100644 --- a/gcc/testsuite/gfortran.dg/intent_optimize_10.f90 +++ b/gcc/testsuite/gfortran.dg/intent_optimize_10.f90 @@ -63,4 +63,4 @@ end program main ! There is a clobber for tc, so we should manage to optimize away the associated initialization constant (but not other ! initialization constants). -! { dg-final { scan-tree-dump-not "123456789" "optimized" { target __OPTIMIZE__ } } } +! { dg-final { scan-tree-dump-not "= 123456789" "optimized" { target __OPTIMIZE__ } } } diff --git a/gcc/tree-ssa-forwprop.cc b/gcc/tree-ssa-forwprop.cc index 1eacff01587..17592383e2b 100644 --- a/gcc/tree-ssa-forwprop.cc +++ b/gcc/tree-ssa-forwprop.cc @@ -1708,6 +1708,89 @@ optimize_agr_copyprop (gimple_stmt_iterator *gsip) return changed; } +/* Simple DSE of the lhs from a clobber STMT. + This is used mostly to clean up from optimize_agr_copyprop and + to remove extra copy that might confuse SRA. */ +static void +do_simple_agr_dse (gassign *stmt, bool full_walk) +{ + /* Don't do this while in -Og. */ + if (optimize_debug) + return; + ao_ref read; + basic_block bb = gimple_bb (stmt); + tree lhs = gimple_assign_lhs (stmt); + if (!DECL_P (lhs)) + return; + ao_ref_init (&read, lhs); + tree vuse = gimple_vuse (stmt); + unsigned limit = full_walk ? param_sccvn_max_alias_queries_per_access : 2; + while (limit) + { + gimple *ostmt = SSA_NAME_DEF_STMT (vuse); + if (is_a<gphi*>(ostmt) || gimple_nop_p (ostmt)) + break; + basic_block obb = gimple_bb (ostmt); + /* If the clobber is not fully dominating the statement define, + then it is not "simple" to detect if the define is fully clobbered. */ + if (obb != bb && !dominated_by_p (CDI_DOMINATORS, bb, obb)) + return; + /* Quickly see if the reference was used in previous statements. */ + bool used = false; + gimple *use_stmt; + imm_use_iterator iter; + FOR_EACH_IMM_USE_STMT (use_stmt, iter, gimple_vdef (ostmt)) + { + basic_block ubb = gimple_bb (use_stmt); + if (stmt == use_stmt) + continue; + /* The use needs to be dominating the clobber. */ + if ((ubb != bb && !dominated_by_p (CDI_DOMINATORS, bb, ubb)) + || ref_maybe_used_by_stmt_p (use_stmt, &read, false)) + { + used = true; + break; + } + } + if (used) + break; + vuse = gimple_vuse (ostmt); + + /* This a store to the clobbered decl, + then remove it. */ + if (is_a <gassign*>(ostmt) + && gimple_store_p (ostmt) + && !gimple_clobber_p (ostmt) + && operand_equal_p (lhs, gimple_assign_lhs (ostmt))) + { + /* Don't remove statements that are needed for non-call + eh to work. */ + if (stmt_unremovable_because_of_non_call_eh_p (cfun, ostmt)) + return; + /* If we delete a stmt that could throw, mark the block + in to_purge to cleanup afterwards. */ + if (stmt_could_throw_p (cfun, ostmt)) + bitmap_set_bit (to_purge, obb->index); + gimple_stmt_iterator gsi = gsi_for_stmt (ostmt); + if (dump_file && (dump_flags & TDF_DETAILS)) + { + fprintf (dump_file, "Removing dead store stmt "); + print_gimple_stmt (dump_file, ostmt, 0); + fprintf (dump_file, "\n"); + } + unlink_stmt_vdef (ostmt); + release_defs (ostmt); + gsi_remove (&gsi, true); + statistics_counter_event (cfun, "delete dead store", 1); + break; + } + if (stmt_may_clobber_ref_p_1 (ostmt, &read, false) + || ref_maybe_used_by_stmt_p (ostmt, &read, false)) + break; + limit--; + } +} + /* Optimizes builtin memcmps for small constant sizes. GSI_P is the GSI for the call. STMT is the call itself. */ @@ -5259,6 +5342,11 @@ pass_forwprop::execute (function *fun) changed = true; break; } + if (gimple_clobber_p (stmt)) + { + do_simple_agr_dse (as_a<gassign*>(stmt), full_walk); + break; + } if (TREE_CODE_CLASS (code) == tcc_comparison) changed |= forward_propagate_into_comparison (&gsi);