Message ID | 20211208055416.1415283-2-luoxhu@linux.ibm.com |
---|---|
State | New |
Headers |
Return-Path: <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org> X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id D0BB43858D3C for <patchwork@sourceware.org>; Wed, 8 Dec 2021 05:56:15 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org D0BB43858D3C DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1638942975; bh=UiOQW7/Sz/EVFDL+vwVRTcsqLrmLXOq/81YhMyS2/48=; h=To:Subject:Date:In-Reply-To:References:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:Cc: From; b=L7tHToVAlWKM1RLEPwNsdmZEzkWxnB+8o8eMVWucwfcDCKG3cCJMYJ59cbzxm+XZY 137F9EP4IMcDaRbchQ+XB4heKUYeccfpSPtO5QFEaqiHtYKD8SmDBCNSUPA03sERnW MScl3D9ATd3kDYWGpr7lHRZFMOX5s8f7RnB/C1yw= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) by sourceware.org (Postfix) with ESMTPS id 8AA993858D3C; Wed, 8 Dec 2021 05:54:50 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 8AA993858D3C Received: from pps.filterd (m0098394.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.1.2/8.16.1.2) with SMTP id 1B84MSYx032750; Wed, 8 Dec 2021 05:54:49 GMT Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com with ESMTP id 3ctnmg9d5p-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Wed, 08 Dec 2021 05:54:49 +0000 Received: from m0098394.ppops.net (m0098394.ppops.net [127.0.0.1]) by pps.reinject (8.16.0.43/8.16.0.43) with SMTP id 1B85snxr014842; Wed, 8 Dec 2021 05:54:49 GMT Received: from ppma03ams.nl.ibm.com (62.31.33a9.ip4.static.sl-reverse.com [169.51.49.98]) by mx0a-001b2d01.pphosted.com with ESMTP id 3ctnmg9d57-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Wed, 08 Dec 2021 05:54:48 +0000 Received: from pps.filterd (ppma03ams.nl.ibm.com [127.0.0.1]) by ppma03ams.nl.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 1B85rrk4029744; Wed, 8 Dec 2021 05:54:46 GMT Received: from b06cxnps3075.portsmouth.uk.ibm.com (d06relay10.portsmouth.uk.ibm.com [9.149.109.195]) by ppma03ams.nl.ibm.com with ESMTP id 3cqyya44qd-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Wed, 08 Dec 2021 05:54:46 +0000 Received: from d06av21.portsmouth.uk.ibm.com (d06av21.portsmouth.uk.ibm.com [9.149.105.232]) by b06cxnps3075.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 1B85shJX31523114 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 8 Dec 2021 05:54:43 GMT Received: from d06av21.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 4B9F752057; Wed, 8 Dec 2021 05:54:43 +0000 (GMT) Received: from genoa.aus.stglabs.ibm.com (unknown [9.40.192.157]) by d06av21.portsmouth.uk.ibm.com (Postfix) with ESMTP id D971152059; Wed, 8 Dec 2021 05:54:41 +0000 (GMT) To: gcc-patches@gcc.gnu.org Subject: [PATCH 1/3] loop-invariant: Don't move cold bb instructions to preheader in RTL Date: Tue, 7 Dec 2021 23:54:14 -0600 Message-Id: <20211208055416.1415283-2-luoxhu@linux.ibm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20211208055416.1415283-1-luoxhu@linux.ibm.com> References: <20211208055416.1415283-1-luoxhu@linux.ibm.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-TM-AS-GCONF: 00 X-Proofpoint-GUID: 4KwNhTYVQhiJnWL5kpx97OuVpB5FDQBQ X-Proofpoint-ORIG-GUID: SvS9dBh3-4SaBDSb5N3gpoeEpNXWvtiz X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.790,Hydra:6.0.425,FMLib:17.11.62.513 definitions=2021-12-08_01,2021-12-06_02,2021-12-02_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 mlxscore=0 lowpriorityscore=0 suspectscore=0 priorityscore=1501 clxscore=1015 mlxlogscore=811 phishscore=0 adultscore=0 malwarescore=0 spamscore=0 impostorscore=0 bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2110150000 definitions=main-2112080038 X-Spam-Status: No, score=-11.2 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_MSPIKE_H2, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org> List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>, <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe> List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/> List-Post: <mailto:gcc-patches@gcc.gnu.org> List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help> List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>, <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe> From: Xionghu Luo via Gcc-patches <gcc-patches@gcc.gnu.org> Reply-To: Xionghu Luo <luoxhu@linux.ibm.com> Cc: segher@kernel.crashing.org, Xionghu Luo <luoxhu@linux.ibm.com>, hubicka@kam.mff.cuni.cz, wschmidt@linux.ibm.com, linkw@gcc.gnu.org, dje.gcc@gmail.com Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org> |
Series |
Dependency patches for hoist LIM code to cold loop
|
|
Commit Message
Xionghu Luo
Dec. 8, 2021, 5:54 a.m. UTC
gcc/ChangeLog: * loop-invariant.c (find_invariants_bb): Check profile count before motion. (find_invariants_body): Add argument. --- gcc/loop-invariant.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-)
Comments
On 12/7/2021 10:54 PM, Xionghu Luo via Gcc-patches wrote: > gcc/ChangeLog: > > * loop-invariant.c (find_invariants_bb): Check profile count > before motion. > (find_invariants_body): Add argument. OK jeff
> gcc/ChangeLog: > > * loop-invariant.c (find_invariants_bb): Check profile count > before motion. > (find_invariants_body): Add argument. > --- > gcc/loop-invariant.c | 10 +++++++--- > 1 file changed, 7 insertions(+), 3 deletions(-) > > diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c > index 5eee2e5c9f8..c61c8612fae 100644 > --- a/gcc/loop-invariant.c > +++ b/gcc/loop-invariant.c > @@ -1183,9 +1183,14 @@ find_invariants_insn (rtx_insn *insn, bool always_reached, bool always_executed) > call. */ > > static void > -find_invariants_bb (basic_block bb, bool always_reached, bool always_executed) > +find_invariants_bb (class loop *loop, basic_block bb, bool always_reached, > + bool always_executed) > { > rtx_insn *insn; > + basic_block preheader = loop_preheader_edge (loop)->src; > + > + if (preheader->count > bb->count) > + return; Please add a comment explaining the conditional and if possible also a testcase. Since profile updating and use is sensitive topic and it may trigger regressions later, it is important to keep track of info why given tests was added. I wonder why the cost model chose to move any invariatns to preheader why preheader->count > bb->count? Honza > > FOR_BB_INSNS (bb, insn) > { > @@ -1214,8 +1219,7 @@ find_invariants_body (class loop *loop, basic_block *body, > unsigned i; > > for (i = 0; i < loop->num_nodes; i++) > - find_invariants_bb (body[i], > - bitmap_bit_p (always_reached, i), > + find_invariants_bb (loop, body[i], bitmap_bit_p (always_reached, i), > bitmap_bit_p (always_executed, i)); > } > > -- > 2.25.1 >
> > gcc/ChangeLog: > > > > * loop-invariant.c (find_invariants_bb): Check profile count > > before motion. > > (find_invariants_body): Add argument. > > --- > > gcc/loop-invariant.c | 10 +++++++--- > > 1 file changed, 7 insertions(+), 3 deletions(-) > > > > diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c > > index 5eee2e5c9f8..c61c8612fae 100644 > > --- a/gcc/loop-invariant.c > > +++ b/gcc/loop-invariant.c > > @@ -1183,9 +1183,14 @@ find_invariants_insn (rtx_insn *insn, bool always_reached, bool always_executed) > > call. */ > > > > static void > > -find_invariants_bb (basic_block bb, bool always_reached, bool always_executed) > > +find_invariants_bb (class loop *loop, basic_block bb, bool always_reached, > > + bool always_executed) > > { > > rtx_insn *insn; > > + basic_block preheader = loop_preheader_edge (loop)->src; > > + > > + if (preheader->count > bb->count) > > + return; > > Please add a comment explaining the conditional and if possible also a > testcase. Since profile updating and use is sensitive topic and it may > trigger regressions later, it is important to keep track of info why > given tests was added. > > I wonder why the cost model chose to move any invariatns to preheader > why preheader->count > bb->count? Thinking about this more, you want to test: if (!always_executed && preheader->count > bb->count) return; This will rule out some profile misupates. Also the code updating always_reached is worng: if (always_reached && CALL_P (insn) && (RTL_LOOPING_CONST_OR_PURE_CALL_P (insn) || ! RTL_CONST_OR_PURE_CALL_P (insn))) always_reached = false; It misses the case where statement can trhow external exception or volatile asms that can also terminate execution. I bit worry on how often the test will have false positives with guessed profile where earlier loop optimizations reduced trip count below realistic estimate. Honza > > Honza > > > > FOR_BB_INSNS (bb, insn) > > { > > @@ -1214,8 +1219,7 @@ find_invariants_body (class loop *loop, basic_block *body, > > unsigned i; > > > > for (i = 0; i < loop->num_nodes; i++) > > - find_invariants_bb (body[i], > > - bitmap_bit_p (always_reached, i), > > + find_invariants_bb (loop, body[i], bitmap_bit_p (always_reached, i), > > bitmap_bit_p (always_executed, i)); > > } > > > > -- > > 2.25.1 > >
On 2021/12/13 18:24, Jan Hubicka wrote: >>> gcc/ChangeLog: >>> >>> * loop-invariant.c (find_invariants_bb): Check profile count >>> before motion. >>> (find_invariants_body): Add argument. >>> --- >>> gcc/loop-invariant.c | 10 +++++++--- >>> 1 file changed, 7 insertions(+), 3 deletions(-) >>> >>> diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c >>> index 5eee2e5c9f8..c61c8612fae 100644 >>> --- a/gcc/loop-invariant.c >>> +++ b/gcc/loop-invariant.c >>> @@ -1183,9 +1183,14 @@ find_invariants_insn (rtx_insn *insn, bool always_reached, bool always_executed) >>> call. */ >>> >>> static void >>> -find_invariants_bb (basic_block bb, bool always_reached, bool always_executed) >>> +find_invariants_bb (class loop *loop, basic_block bb, bool always_reached, >>> + bool always_executed) >>> { >>> rtx_insn *insn; >>> + basic_block preheader = loop_preheader_edge (loop)->src; >>> + >>> + if (preheader->count > bb->count) >>> + return; >> >> Please add a comment explaining the conditional and if possible also a >> testcase. Since profile updating and use is sensitive topic and it may >> trigger regressions later, it is important to keep track of info why >> given tests was added. OK. Comments like? /* Don't move insn of cold BB out of loop to preheader to reduce calculations and register live range in hot loop with cold BB. */ And maybe some dump log will help tracking in xxx.c.271r.loop2_invariant. --- a/gcc/loop-invariant.c +++ b/gcc/loop-invariant.c @@ -1190,7 +1190,12 @@ find_invariants_bb (class loop *loop, basic_block bb, bool always_reached, basic_block preheader = loop_preheader_edge (loop)->src; if (preheader->count > bb->count) - return; + { + if (dump_file) + fprintf (dump_file, "Don't move invariant from bb: %d in loop %d\n", + bb->index, loop->num); + return; + } This case could reflect the patch's effect: gcc/testsuite/gcc.dg/loop-invariant-2.c /* { dg-do compile } */ /* { dg-options "-O2 -fdump-rtl-loop2_invariant" } */ volatile int x; void bar (int, char *, char *); void foo (int *a, int n, int k) { int i; for (i = 0; i < n; i++) { if (__builtin_expect (x, 0)) bar (k / 5, "one", "two"); a[i] = k; } } /* { dg-final { scan-rtl-dump-not "Decided to move invariant" "loop2_invariant" } } */ insn 27,28,29 was hoisted out of loop, with the count test patch, they are kept in loop body. diff -U15 base/ssa-lim-23.c.271r.loop2_invariant patched/ssa-lim-23.c.271r.loop2_invariant *****ending processing of loop 1 ****** -Set in insn 27 is invariant (0), cost 16, depends on -Set in insn 28 is invariant (1), cost 16, depends on -Set in insn 29 is invariant (2), cost 8, depends on -Set in insn 30 is invariant (3), cost 0, depends on 0 -Set in insn 31 is invariant (4), cost 0, depends on 1 -Set in insn 32 is invariant (5), cost 0, depends on 2 -Decided to move invariant 0 -- gain 16 -Decided to move invariant 1 -- gain 16 -Decided to move invariant 2 -- gain 8 -deferring rescan insn with uid = 27. -deferring rescan insn with uid = 30. -deferring rescan insn with uid = 61. -changing bb of uid 27 - from 5 to 3 -deferring rescan insn with uid = 28. -deferring rescan insn with uid = 31. -deferring rescan insn with uid = 62. -changing bb of uid 28 - from 5 to 3 -deferring rescan insn with uid = 29. -deferring rescan insn with uid = 32. -deferring rescan insn with uid = 63. -changing bb of uid 29 - from 5 to 3 starting the processing of deferred insns -rescanning insn with uid = 27. -rescanning insn with uid = 28. -rescanning insn with uid = 29. -rescanning insn with uid = 30. -rescanning insn with uid = 31. -rescanning insn with uid = 32. -rescanning insn with uid = 61. -rescanning insn with uid = 62. -rescanning insn with uid = 63. ending the processing of deferred insns starting the processing of deferred insns ending the processing of deferred insns ... 55: r138:DI=unspec[`*.LANCHOR0',%2:DI] 47 REG_EQUAL `*.LANCHOR0' - 27: r139:DI=unspec[`*.LC0',%2:DI] 47 - 28: r140:DI=unspec[`*.LC1',%2:DI] 47 - 29: r141:DI=sign_extend(r118:SI) 39: L39: 21: NOTE_INSN_BASIC_BLOCK 4 23: r117:SI=[r138:DI] 24: r133:CC=cmp(r117:SI,0) REG_DEAD r117:SI 25: pc={(r133:CC==0)?L34:pc} REG_DEAD r133:CC REG_BR_PROB 966367644 26: NOTE_INSN_BASIC_BLOCK 5 - 61: r134:DI=r139:DI - 62: r135:DI=r140:DI - 63: r136:DI=r141:DI - 30: %5:DI=r139:DI + 27: r134:DI=unspec[`*.LC0',%2:DI] 47 + REG_EQUAL `*.LC0' + 28: r135:DI=unspec[`*.LC1',%2:DI] 47 + REG_EQUAL `*.LC1' + 29: r136:DI=sign_extend(r118:SI) + 30: %5:DI=r134:DI REG_DEAD r134:DI REG_EQUAL `*.LC0' - 31: %4:DI=r140:DI + 31: %4:DI=r135:DI REG_DEAD r135:DI REG_EQUAL `*.LC1' - 32: %3:DI=r141:DI + 32: %3:DI=r136:DI REG_DEAD r136:DI 33: call [`bar'] argc:0 REG_DEAD %5:DI REG_DEAD %4:DI REG_DEAD %3:DI REG_CALL_DECL `bar' 34: L34: 35: NOTE_INSN_BASIC_BLOCK 6 >> >> I wonder why the cost model chose to move any invariatns to preheader >> why preheader->count > bb->count? > > Thinking about this more, you want to test: > if (!always_executed && preheader->count > bb->count) > return; > This will rule out some profile misupates. > > Also the code updating always_reached is worng: > if (always_reached > && CALL_P (insn) > && (RTL_LOOPING_CONST_OR_PURE_CALL_P (insn) > || ! RTL_CONST_OR_PURE_CALL_P (insn))) > always_reached = false; > It misses the case where statement can trhow external exception or > volatile asms that can also terminate execution. > > I bit worry on how often the test will have false positives with guessed > profile where earlier loop optimizations reduced trip count below > realistic estimate. Sorry I don't understand why always_executed and always_reached matters here? the test is based on BB before the FOR_BB_INSNS loop... > > Honza >> >> Honza >>> >>> FOR_BB_INSNS (bb, insn) >>> { >>> @@ -1214,8 +1219,7 @@ find_invariants_body (class loop *loop, basic_block *body, >>> unsigned i; >>> >>> for (i = 0; i < loop->num_nodes; i++) >>> - find_invariants_bb (body[i], >>> - bitmap_bit_p (always_reached, i), >>> + find_invariants_bb (loop, body[i], bitmap_bit_p (always_reached, i), >>> bitmap_bit_p (always_executed, i)); >>> } >>> >>> -- >>> 2.25.1 >>>
> > OK. Comments like? > > /* Don't move insn of cold BB out of loop to preheader to reduce calculations > and register live range in hot loop with cold BB. */ Looks good. > > > And maybe some dump log will help tracking in xxx.c.271r.loop2_invariant. > > --- a/gcc/loop-invariant.c > +++ b/gcc/loop-invariant.c > @@ -1190,7 +1190,12 @@ find_invariants_bb (class loop *loop, basic_block bb, bool always_reached, > basic_block preheader = loop_preheader_edge (loop)->src; > > if (preheader->count > bb->count) > - return; > + { > + if (dump_file) > + fprintf (dump_file, "Don't move invariant from bb: %d in loop %d\n", > + bb->index, loop->num); > + return; > + } This is also a good idea - testcases are quite important for this typo of stuff. > > > > Thinking about this more, you want to test: > > if (!always_executed && preheader->count > bb->count) > > return; > > This will rule out some profile misupates. > > > > Also the code updating always_reached is worng: > > if (always_reached > > && CALL_P (insn) > > && (RTL_LOOPING_CONST_OR_PURE_CALL_P (insn) > > || ! RTL_CONST_OR_PURE_CALL_P (insn))) > > always_reached = false; > > It misses the case where statement can trhow external exception or > > volatile asms that can also terminate execution. > > > > I bit worry on how often the test will have false positives with guessed > > profile where earlier loop optimizations reduced trip count below > > realistic estimate. > > Sorry I don't understand why always_executed and always_reached matters here? > the test is based on BB before the FOR_BB_INSNS loop... always_executed is useful here to avoid the situation where bb->count or preheader->count is wrong and it would disable wanted transformation. always_executed is just a bug I noticed while looking at the code. I will produce testcase for bugzilla. Honza
On 2021/12/16 19:20, Jan Hubicka wrote: >> >> OK. Comments like? >> >> /* Don't move insn of cold BB out of loop to preheader to reduce calculations >> and register live range in hot loop with cold BB. */ > > Looks good. >> >> >> And maybe some dump log will help tracking in xxx.c.271r.loop2_invariant. >> >> --- a/gcc/loop-invariant.c >> +++ b/gcc/loop-invariant.c >> @@ -1190,7 +1190,12 @@ find_invariants_bb (class loop *loop, basic_block bb, bool always_reached, >> basic_block preheader = loop_preheader_edge (loop)->src; >> >> if (preheader->count > bb->count) >> - return; >> + { >> + if (dump_file) >> + fprintf (dump_file, "Don't move invariant from bb: %d in loop %d\n", >> + bb->index, loop->num); >> + return; >> + } > > This is also a good idea - testcases are quite important for this typo > of stuff. >>> >>> Thinking about this more, you want to test: >>> if (!always_executed && preheader->count > bb->count) >>> return; >>> This will rule out some profile misupates. >>> >>> Also the code updating always_reached is worng: >>> if (always_reached >>> && CALL_P (insn) >>> && (RTL_LOOPING_CONST_OR_PURE_CALL_P (insn) >>> || ! RTL_CONST_OR_PURE_CALL_P (insn))) >>> always_reached = false; >>> It misses the case where statement can trhow external exception or >>> volatile asms that can also terminate execution. >>> >>> I bit worry on how often the test will have false positives with guessed >>> profile where earlier loop optimizations reduced trip count below >>> realistic estimate. >> >> Sorry I don't understand why always_executed and always_reached matters here? >> the test is based on BB before the FOR_BB_INSNS loop... > > always_executed is useful here to avoid the situation where bb->count or > preheader->count is wrong and it would disable wanted transformation. > > always_executed is just a bug I noticed while looking at the code. I > will produce testcase for bugzilla. > > Honza Thanks, so is this OK to commit now? And any additional comments for the "[PATCH 3/3] Fix loop split incorrect count and probability" (https://gcc.gnu.org/pipermail/gcc-patches/2021-December/586374.html)? Updated comments and testcase: [PATCH 1/3] loop-invariant: Don't move cold bb instructions to preheader in RTL From: Xiong Hu Luo <luoxhu@linux.ibm.com> gcc/ChangeLog: * loop-invariant.c (find_invariants_bb): Check profile count before motion. (find_invariants_body): Add argument. gcc/testsuite/ChangeLog: * gcc.dg/loop-invariant-2.c: New. --- gcc/loop-invariant.c | 17 ++++++++++++++--- gcc/testsuite/gcc.dg/loop-invariant-2.c | 20 ++++++++++++++++++++ 2 files changed, 34 insertions(+), 3 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/loop-invariant-2.c diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c index fca0c2b24be..690f7704a0b 100644 --- a/gcc/loop-invariant.c +++ b/gcc/loop-invariant.c @@ -1183,9 +1183,21 @@ find_invariants_insn (rtx_insn *insn, bool always_reached, bool always_executed) call. */ static void -find_invariants_bb (basic_block bb, bool always_reached, bool always_executed) +find_invariants_bb (class loop *loop, basic_block bb, bool always_reached, + bool always_executed) { rtx_insn *insn; + basic_block preheader = loop_preheader_edge (loop)->src; + + /* Don't move insn of cold BB out of loop to preheader to reduce calculations + and register live range in hot loop with cold BB. */ + if (!always_executed && preheader->count > bb->count) + { + if (dump_file) + fprintf (dump_file, "Don't move invariant from bb: %d out of loop %d\n", + bb->index, loop->num); + return; + } FOR_BB_INSNS (bb, insn) { @@ -1214,8 +1226,7 @@ find_invariants_body (class loop *loop, basic_block *body, unsigned i; for (i = 0; i < loop->num_nodes; i++) - find_invariants_bb (body[i], - bitmap_bit_p (always_reached, i), + find_invariants_bb (loop, body[i], bitmap_bit_p (always_reached, i), bitmap_bit_p (always_executed, i)); } diff --git a/gcc/testsuite/gcc.dg/loop-invariant-2.c b/gcc/testsuite/gcc.dg/loop-invariant-2.c new file mode 100644 index 00000000000..df3d8458569 --- /dev/null +++ b/gcc/testsuite/gcc.dg/loop-invariant-2.c @@ -0,0 +1,20 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-rtl-loop2_invariant" } */ + +volatile int x; +void +bar (int, char *, char *); +void +foo (int *a, int n, int k) +{ + int i; + + for (i = 0; i < n; i++) + { + if (__builtin_expect (x, 0)) + bar (k / 5, "one", "two"); + a[i] = k; + } +} + +/* { dg-final { scan-rtl-dump "Don't move invariant from bb: .*out of loop" "loop2_invariant" } } */
On 2021/12/17 09:30, Xionghu Luo via Gcc-patches wrote: > > > On 2021/12/16 19:20, Jan Hubicka wrote: >>> >>> OK. Comments like? >>> >>> /* Don't move insn of cold BB out of loop to preheader to reduce calculations >>> and register live range in hot loop with cold BB. */ >> >> Looks good. >>> >>> >>> And maybe some dump log will help tracking in xxx.c.271r.loop2_invariant. >>> >>> --- a/gcc/loop-invariant.c >>> +++ b/gcc/loop-invariant.c >>> @@ -1190,7 +1190,12 @@ find_invariants_bb (class loop *loop, basic_block bb, bool always_reached, >>> basic_block preheader = loop_preheader_edge (loop)->src; >>> >>> if (preheader->count > bb->count) >>> - return; >>> + { >>> + if (dump_file) >>> + fprintf (dump_file, "Don't move invariant from bb: %d in loop %d\n", >>> + bb->index, loop->num); >>> + return; >>> + } >> >> This is also a good idea - testcases are quite important for this typo >> of stuff. >>>> >>>> Thinking about this more, you want to test: >>>> if (!always_executed && preheader->count > bb->count) >>>> return; >>>> This will rule out some profile misupates. >>>> >>>> Also the code updating always_reached is worng: >>>> if (always_reached >>>> && CALL_P (insn) >>>> && (RTL_LOOPING_CONST_OR_PURE_CALL_P (insn) >>>> || ! RTL_CONST_OR_PURE_CALL_P (insn))) >>>> always_reached = false; >>>> It misses the case where statement can trhow external exception or >>>> volatile asms that can also terminate execution. >>>> >>>> I bit worry on how often the test will have false positives with guessed >>>> profile where earlier loop optimizations reduced trip count below >>>> realistic estimate. >>> >>> Sorry I don't understand why always_executed and always_reached matters here? >>> the test is based on BB before the FOR_BB_INSNS loop... >> >> always_executed is useful here to avoid the situation where bb->count or >> preheader->count is wrong and it would disable wanted transformation. >> >> always_executed is just a bug I noticed while looking at the code. I >> will produce testcase for bugzilla. >> >> Honza > > > Thanks, so is this OK to commit now? And any additional comments for > the "[PATCH 3/3] Fix loop split incorrect count and probability" > (https://gcc.gnu.org/pipermail/gcc-patches/2021-December/586374.html)? > > > Updated comments and testcase: > > [PATCH 1/3] loop-invariant: Don't move cold bb instructions to preheader in RTL This is the last patch for the "cold bb in hot loop optimization" I'd like commit soon if not other comments, to let it fully tested more broadly before stage4. Thanks. Regression tested pass on powerpc64le-linux-gnu {P10,P9}, powerpc64-linux-gnu {P8, P7} and X86 though. > > From: Xiong Hu Luo <luoxhu@linux.ibm.com> > > gcc/ChangeLog: > > * loop-invariant.c (find_invariants_bb): Check profile count > before motion. > (find_invariants_body): Add argument. > > gcc/testsuite/ChangeLog: > > * gcc.dg/loop-invariant-2.c: New. > --- > gcc/loop-invariant.c | 17 ++++++++++++++--- > gcc/testsuite/gcc.dg/loop-invariant-2.c | 20 ++++++++++++++++++++ > 2 files changed, 34 insertions(+), 3 deletions(-) > create mode 100644 gcc/testsuite/gcc.dg/loop-invariant-2.c > > diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c > index fca0c2b24be..690f7704a0b 100644 > --- a/gcc/loop-invariant.c > +++ b/gcc/loop-invariant.c > @@ -1183,9 +1183,21 @@ find_invariants_insn (rtx_insn *insn, bool always_reached, bool always_executed) > call. */ > > static void > -find_invariants_bb (basic_block bb, bool always_reached, bool always_executed) > +find_invariants_bb (class loop *loop, basic_block bb, bool always_reached, > + bool always_executed) > { > rtx_insn *insn; > + basic_block preheader = loop_preheader_edge (loop)->src; > + > + /* Don't move insn of cold BB out of loop to preheader to reduce calculations > + and register live range in hot loop with cold BB. */ > + if (!always_executed && preheader->count > bb->count) > + { > + if (dump_file) > + fprintf (dump_file, "Don't move invariant from bb: %d out of loop %d\n", > + bb->index, loop->num); > + return; > + } > > FOR_BB_INSNS (bb, insn) > { > @@ -1214,8 +1226,7 @@ find_invariants_body (class loop *loop, basic_block *body, > unsigned i; > > for (i = 0; i < loop->num_nodes; i++) > - find_invariants_bb (body[i], > - bitmap_bit_p (always_reached, i), > + find_invariants_bb (loop, body[i], bitmap_bit_p (always_reached, i), > bitmap_bit_p (always_executed, i)); > } > > diff --git a/gcc/testsuite/gcc.dg/loop-invariant-2.c b/gcc/testsuite/gcc.dg/loop-invariant-2.c > new file mode 100644 > index 00000000000..df3d8458569 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/loop-invariant-2.c > @@ -0,0 +1,20 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -fdump-rtl-loop2_invariant" } */ > + > +volatile int x; > +void > +bar (int, char *, char *); > +void > +foo (int *a, int n, int k) > +{ > + int i; > + > + for (i = 0; i < n; i++) > + { > + if (__builtin_expect (x, 0)) > + bar (k / 5, "one", "two"); > + a[i] = k; > + } > +} > + > +/* { dg-final { scan-rtl-dump "Don't move invariant from bb: .*out of loop" "loop2_invariant" } } */
> > > > From: Xiong Hu Luo <luoxhu@linux.ibm.com> > > > > gcc/ChangeLog: > > > > * loop-invariant.c (find_invariants_bb): Check profile count > > before motion. > > (find_invariants_body): Add argument. > > > > gcc/testsuite/ChangeLog: > > > > * gcc.dg/loop-invariant-2.c: New. OK, thanks! Honza
On 2021/12/29 20:55, Jan Hubicka wrote: >>> >>> From: Xiong Hu Luo <luoxhu@linux.ibm.com> >>> >>> gcc/ChangeLog: >>> >>> * loop-invariant.c (find_invariants_bb): Check profile count >>> before motion. >>> (find_invariants_body): Add argument. >>> >>> gcc/testsuite/ChangeLog: >>> >>> * gcc.dg/loop-invariant-2.c: New. > OK, > thanks! > Honza Thanks, committed to r12-6149.
diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c index 5eee2e5c9f8..c61c8612fae 100644 --- a/gcc/loop-invariant.c +++ b/gcc/loop-invariant.c @@ -1183,9 +1183,14 @@ find_invariants_insn (rtx_insn *insn, bool always_reached, bool always_executed) call. */ static void -find_invariants_bb (basic_block bb, bool always_reached, bool always_executed) +find_invariants_bb (class loop *loop, basic_block bb, bool always_reached, + bool always_executed) { rtx_insn *insn; + basic_block preheader = loop_preheader_edge (loop)->src; + + if (preheader->count > bb->count) + return; FOR_BB_INSNS (bb, insn) { @@ -1214,8 +1219,7 @@ find_invariants_body (class loop *loop, basic_block *body, unsigned i; for (i = 0; i < loop->num_nodes; i++) - find_invariants_bb (body[i], - bitmap_bit_p (always_reached, i), + find_invariants_bb (loop, body[i], bitmap_bit_p (always_reached, i), bitmap_bit_p (always_executed, i)); }