[1/3] loop-invariant: Don't move cold bb instructions to preheader in RTL

Message ID 20211208055416.1415283-2-luoxhu@linux.ibm.com
State New
Headers
Series Dependency patches for hoist LIM code to cold loop |

Commit Message

Xionghu Luo Dec. 8, 2021, 5:54 a.m. UTC
  gcc/ChangeLog:

	* loop-invariant.c (find_invariants_bb): Check profile count
	before motion.
	(find_invariants_body): Add argument.
---
 gcc/loop-invariant.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)
  

Comments

Jeff Law Dec. 8, 2021, 11:26 p.m. UTC | #1
On 12/7/2021 10:54 PM, Xionghu Luo via Gcc-patches wrote:
> gcc/ChangeLog:
>
> 	* loop-invariant.c (find_invariants_bb): Check profile count
> 	before motion.
> 	(find_invariants_body): Add argument.
OK
jeff
  
Jan Hubicka Dec. 13, 2021, 9:14 a.m. UTC | #2
> gcc/ChangeLog:
> 
> 	* loop-invariant.c (find_invariants_bb): Check profile count
> 	before motion.
> 	(find_invariants_body): Add argument.
> ---
>  gcc/loop-invariant.c | 10 +++++++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c
> index 5eee2e5c9f8..c61c8612fae 100644
> --- a/gcc/loop-invariant.c
> +++ b/gcc/loop-invariant.c
> @@ -1183,9 +1183,14 @@ find_invariants_insn (rtx_insn *insn, bool always_reached, bool always_executed)
>     call.  */
>  
>  static void
> -find_invariants_bb (basic_block bb, bool always_reached, bool always_executed)
> +find_invariants_bb (class loop *loop, basic_block bb, bool always_reached,
> +		    bool always_executed)
>  {
>    rtx_insn *insn;
> +  basic_block preheader = loop_preheader_edge (loop)->src;
> +
> +  if (preheader->count > bb->count)
> +    return;

Please add a comment explaining the conditional and if possible also a
testcase.  Since profile updating and use is sensitive topic and it may
trigger regressions later, it is important to keep track of info why
given tests was added.

I wonder why the cost model chose to move any invariatns to preheader
why preheader->count > bb->count?

Honza
>  
>    FOR_BB_INSNS (bb, insn)
>      {
> @@ -1214,8 +1219,7 @@ find_invariants_body (class loop *loop, basic_block *body,
>    unsigned i;
>  
>    for (i = 0; i < loop->num_nodes; i++)
> -    find_invariants_bb (body[i],
> -			bitmap_bit_p (always_reached, i),
> +    find_invariants_bb (loop, body[i], bitmap_bit_p (always_reached, i),
>  			bitmap_bit_p (always_executed, i));
>  }
>  
> -- 
> 2.25.1
>
  
Jan Hubicka Dec. 13, 2021, 10:24 a.m. UTC | #3
> > gcc/ChangeLog:
> > 
> > 	* loop-invariant.c (find_invariants_bb): Check profile count
> > 	before motion.
> > 	(find_invariants_body): Add argument.
> > ---
> >  gcc/loop-invariant.c | 10 +++++++---
> >  1 file changed, 7 insertions(+), 3 deletions(-)
> > 
> > diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c
> > index 5eee2e5c9f8..c61c8612fae 100644
> > --- a/gcc/loop-invariant.c
> > +++ b/gcc/loop-invariant.c
> > @@ -1183,9 +1183,14 @@ find_invariants_insn (rtx_insn *insn, bool always_reached, bool always_executed)
> >     call.  */
> >  
> >  static void
> > -find_invariants_bb (basic_block bb, bool always_reached, bool always_executed)
> > +find_invariants_bb (class loop *loop, basic_block bb, bool always_reached,
> > +		    bool always_executed)
> >  {
> >    rtx_insn *insn;
> > +  basic_block preheader = loop_preheader_edge (loop)->src;
> > +
> > +  if (preheader->count > bb->count)
> > +    return;
> 
> Please add a comment explaining the conditional and if possible also a
> testcase.  Since profile updating and use is sensitive topic and it may
> trigger regressions later, it is important to keep track of info why
> given tests was added.
> 
> I wonder why the cost model chose to move any invariatns to preheader
> why preheader->count > bb->count?

Thinking about this more, you want to test:
  if (!always_executed && preheader->count > bb->count)
    return;
This will rule out some profile misupates.

Also the code updating always_reached is worng:
      if (always_reached
          && CALL_P (insn)
          && (RTL_LOOPING_CONST_OR_PURE_CALL_P (insn)
              || ! RTL_CONST_OR_PURE_CALL_P (insn)))
  always_reached = false;
It misses the case where statement can trhow external exception or
volatile asms that can also terminate execution.

I bit worry on how often the test will have false positives with guessed
profile where earlier loop optimizations reduced trip count below
realistic estimate.

Honza
> 
> Honza
> >  
> >    FOR_BB_INSNS (bb, insn)
> >      {
> > @@ -1214,8 +1219,7 @@ find_invariants_body (class loop *loop, basic_block *body,
> >    unsigned i;
> >  
> >    for (i = 0; i < loop->num_nodes; i++)
> > -    find_invariants_bb (body[i],
> > -			bitmap_bit_p (always_reached, i),
> > +    find_invariants_bb (loop, body[i], bitmap_bit_p (always_reached, i),
> >  			bitmap_bit_p (always_executed, i));
> >  }
> >  
> > -- 
> > 2.25.1
> >
  
Xionghu Luo Dec. 14, 2021, 9:21 a.m. UTC | #4
On 2021/12/13 18:24, Jan Hubicka wrote:
>>> gcc/ChangeLog:
>>>
>>> 	* loop-invariant.c (find_invariants_bb): Check profile count
>>> 	before motion.
>>> 	(find_invariants_body): Add argument.
>>> ---
>>>  gcc/loop-invariant.c | 10 +++++++---
>>>  1 file changed, 7 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c
>>> index 5eee2e5c9f8..c61c8612fae 100644
>>> --- a/gcc/loop-invariant.c
>>> +++ b/gcc/loop-invariant.c
>>> @@ -1183,9 +1183,14 @@ find_invariants_insn (rtx_insn *insn, bool always_reached, bool always_executed)
>>>     call.  */
>>>  
>>>  static void
>>> -find_invariants_bb (basic_block bb, bool always_reached, bool always_executed)
>>> +find_invariants_bb (class loop *loop, basic_block bb, bool always_reached,
>>> +		    bool always_executed)
>>>  {
>>>    rtx_insn *insn;
>>> +  basic_block preheader = loop_preheader_edge (loop)->src;
>>> +
>>> +  if (preheader->count > bb->count)
>>> +    return;
>>
>> Please add a comment explaining the conditional and if possible also a
>> testcase.  Since profile updating and use is sensitive topic and it may
>> trigger regressions later, it is important to keep track of info why
>> given tests was added.
 
OK. Comments like?

/* Don't move insn of cold BB out of loop to preheader to reduce calculations
   and register live range in hot loop with cold BB.  */


And maybe some dump log will help tracking in xxx.c.271r.loop2_invariant.

--- a/gcc/loop-invariant.c
+++ b/gcc/loop-invariant.c
@@ -1190,7 +1190,12 @@ find_invariants_bb (class loop *loop, basic_block bb, bool always_reached,
   basic_block preheader = loop_preheader_edge (loop)->src;

   if (preheader->count > bb->count)
-    return;
+    {
+      if (dump_file)
+       fprintf (dump_file, "Don't move invariant from bb: %d in loop %d\n",
+                bb->index, loop->num);
+      return;
+    }


This case could reflect the patch's effect:


gcc/testsuite/gcc.dg/loop-invariant-2.c
/* { dg-do compile } */
/* { dg-options "-O2 -fdump-rtl-loop2_invariant" } */

volatile int x;
void
bar (int, char *, char *);
void
foo (int *a, int n, int k)
{
  int i;

  for (i = 0; i < n; i++)
    {
      if (__builtin_expect (x, 0))
        bar (k / 5, "one", "two");
      a[i] = k;
    }
}

/* { dg-final { scan-rtl-dump-not "Decided to move invariant" "loop2_invariant" } } */


insn 27,28,29 was hoisted out of loop, with the count test patch, they are kept in
loop body.

 diff -U15 base/ssa-lim-23.c.271r.loop2_invariant patched/ssa-lim-23.c.271r.loop2_invariant

 *****ending processing of loop 1 ******
-Set in insn 27 is invariant (0), cost 16, depends on
-Set in insn 28 is invariant (1), cost 16, depends on
-Set in insn 29 is invariant (2), cost 8, depends on
-Set in insn 30 is invariant (3), cost 0, depends on 0
-Set in insn 31 is invariant (4), cost 0, depends on 1
-Set in insn 32 is invariant (5), cost 0, depends on 2
-Decided to move invariant 0 -- gain 16
-Decided to move invariant 1 -- gain 16
-Decided to move invariant 2 -- gain 8
-deferring rescan insn with uid = 27.
-deferring rescan insn with uid = 30.
-deferring rescan insn with uid = 61.
-changing bb of uid 27
-  from 5 to 3
-deferring rescan insn with uid = 28.
-deferring rescan insn with uid = 31.
-deferring rescan insn with uid = 62.
-changing bb of uid 28
-  from 5 to 3
-deferring rescan insn with uid = 29.
-deferring rescan insn with uid = 32.
-deferring rescan insn with uid = 63.
-changing bb of uid 29
-  from 5 to 3
 starting the processing of deferred insns
-rescanning insn with uid = 27.
-rescanning insn with uid = 28.
-rescanning insn with uid = 29.
-rescanning insn with uid = 30.
-rescanning insn with uid = 31.
-rescanning insn with uid = 32.
-rescanning insn with uid = 61.
-rescanning insn with uid = 62.
-rescanning insn with uid = 63.
 ending the processing of deferred insns
 starting the processing of deferred insns
 ending the processing of deferred insns

...

    55: r138:DI=unspec[`*.LANCHOR0',%2:DI] 47
       REG_EQUAL `*.LANCHOR0'
-   27: r139:DI=unspec[`*.LC0',%2:DI] 47
-   28: r140:DI=unspec[`*.LC1',%2:DI] 47
-   29: r141:DI=sign_extend(r118:SI)
    39: L39:
    21: NOTE_INSN_BASIC_BLOCK 4
    23: r117:SI=[r138:DI]
    24: r133:CC=cmp(r117:SI,0)
       REG_DEAD r117:SI
    25: pc={(r133:CC==0)?L34:pc}
       REG_DEAD r133:CC
       REG_BR_PROB 966367644
    26: NOTE_INSN_BASIC_BLOCK 5
-   61: r134:DI=r139:DI
-   62: r135:DI=r140:DI
-   63: r136:DI=r141:DI
-   30: %5:DI=r139:DI
+   27: r134:DI=unspec[`*.LC0',%2:DI] 47
+      REG_EQUAL `*.LC0'
+   28: r135:DI=unspec[`*.LC1',%2:DI] 47
+      REG_EQUAL `*.LC1'
+   29: r136:DI=sign_extend(r118:SI)
+   30: %5:DI=r134:DI
       REG_DEAD r134:DI
       REG_EQUAL `*.LC0'
-   31: %4:DI=r140:DI
+   31: %4:DI=r135:DI
       REG_DEAD r135:DI
       REG_EQUAL `*.LC1'
-   32: %3:DI=r141:DI
+   32: %3:DI=r136:DI
       REG_DEAD r136:DI
    33: call [`bar'] argc:0
       REG_DEAD %5:DI
       REG_DEAD %4:DI
       REG_DEAD %3:DI
       REG_CALL_DECL `bar'
    34: L34:
    35: NOTE_INSN_BASIC_BLOCK 6

>>
>> I wonder why the cost model chose to move any invariatns to preheader
>> why preheader->count > bb->count?
> 
> Thinking about this more, you want to test:
>   if (!always_executed && preheader->count > bb->count)
>     return;
> This will rule out some profile misupates.
> 
> Also the code updating always_reached is worng:
>       if (always_reached
>           && CALL_P (insn)
>           && (RTL_LOOPING_CONST_OR_PURE_CALL_P (insn)
>               || ! RTL_CONST_OR_PURE_CALL_P (insn)))
>   always_reached = false;
> It misses the case where statement can trhow external exception or
> volatile asms that can also terminate execution.
> 
> I bit worry on how often the test will have false positives with guessed
> profile where earlier loop optimizations reduced trip count below
> realistic estimate.

Sorry I don't understand why always_executed and always_reached matters here?
the test is based on BB before the FOR_BB_INSNS loop...

> 
> Honza
>>
>> Honza
>>>  
>>>    FOR_BB_INSNS (bb, insn)
>>>      {
>>> @@ -1214,8 +1219,7 @@ find_invariants_body (class loop *loop, basic_block *body,
>>>    unsigned i;
>>>  
>>>    for (i = 0; i < loop->num_nodes; i++)
>>> -    find_invariants_bb (body[i],
>>> -			bitmap_bit_p (always_reached, i),
>>> +    find_invariants_bb (loop, body[i], bitmap_bit_p (always_reached, i),
>>>  			bitmap_bit_p (always_executed, i));
>>>  }
>>>  
>>> -- 
>>> 2.25.1
>>>
  
Jan Hubicka Dec. 16, 2021, 11:20 a.m. UTC | #5
>  
> OK. Comments like?
> 
> /* Don't move insn of cold BB out of loop to preheader to reduce calculations
>    and register live range in hot loop with cold BB.  */

Looks good.
> 
> 
> And maybe some dump log will help tracking in xxx.c.271r.loop2_invariant.
> 
> --- a/gcc/loop-invariant.c
> +++ b/gcc/loop-invariant.c
> @@ -1190,7 +1190,12 @@ find_invariants_bb (class loop *loop, basic_block bb, bool always_reached,
>    basic_block preheader = loop_preheader_edge (loop)->src;
> 
>    if (preheader->count > bb->count)
> -    return;
> +    {
> +      if (dump_file)
> +       fprintf (dump_file, "Don't move invariant from bb: %d in loop %d\n",
> +                bb->index, loop->num);
> +      return;
> +    }

This is also a good idea - testcases are quite important for this typo
of stuff.
> > 
> > Thinking about this more, you want to test:
> >   if (!always_executed && preheader->count > bb->count)
> >     return;
> > This will rule out some profile misupates.
> > 
> > Also the code updating always_reached is worng:
> >       if (always_reached
> >           && CALL_P (insn)
> >           && (RTL_LOOPING_CONST_OR_PURE_CALL_P (insn)
> >               || ! RTL_CONST_OR_PURE_CALL_P (insn)))
> >   always_reached = false;
> > It misses the case where statement can trhow external exception or
> > volatile asms that can also terminate execution.
> > 
> > I bit worry on how often the test will have false positives with guessed
> > profile where earlier loop optimizations reduced trip count below
> > realistic estimate.
> 
> Sorry I don't understand why always_executed and always_reached matters here?
> the test is based on BB before the FOR_BB_INSNS loop...

always_executed is useful here to avoid the situation where bb->count or
preheader->count is wrong and it would disable wanted transformation.

always_executed is just a bug I noticed while looking at the code.  I
will produce testcase for bugzilla.

Honza
  
Xionghu Luo Dec. 17, 2021, 1:30 a.m. UTC | #6
On 2021/12/16 19:20, Jan Hubicka wrote:
>>  
>> OK. Comments like?
>>
>> /* Don't move insn of cold BB out of loop to preheader to reduce calculations
>>    and register live range in hot loop with cold BB.  */
> 
> Looks good.
>>
>>
>> And maybe some dump log will help tracking in xxx.c.271r.loop2_invariant.
>>
>> --- a/gcc/loop-invariant.c
>> +++ b/gcc/loop-invariant.c
>> @@ -1190,7 +1190,12 @@ find_invariants_bb (class loop *loop, basic_block bb, bool always_reached,
>>    basic_block preheader = loop_preheader_edge (loop)->src;
>>
>>    if (preheader->count > bb->count)
>> -    return;
>> +    {
>> +      if (dump_file)
>> +       fprintf (dump_file, "Don't move invariant from bb: %d in loop %d\n",
>> +                bb->index, loop->num);
>> +      return;
>> +    }
> 
> This is also a good idea - testcases are quite important for this typo
> of stuff.
>>>
>>> Thinking about this more, you want to test:
>>>   if (!always_executed && preheader->count > bb->count)
>>>     return;
>>> This will rule out some profile misupates.
>>>
>>> Also the code updating always_reached is worng:
>>>       if (always_reached
>>>           && CALL_P (insn)
>>>           && (RTL_LOOPING_CONST_OR_PURE_CALL_P (insn)
>>>               || ! RTL_CONST_OR_PURE_CALL_P (insn)))
>>>   always_reached = false;
>>> It misses the case where statement can trhow external exception or
>>> volatile asms that can also terminate execution.
>>>
>>> I bit worry on how often the test will have false positives with guessed
>>> profile where earlier loop optimizations reduced trip count below
>>> realistic estimate.
>>
>> Sorry I don't understand why always_executed and always_reached matters here?
>> the test is based on BB before the FOR_BB_INSNS loop...
> 
> always_executed is useful here to avoid the situation where bb->count or
> preheader->count is wrong and it would disable wanted transformation.
> 
> always_executed is just a bug I noticed while looking at the code.  I
> will produce testcase for bugzilla.
> 
> Honza


Thanks, so is this OK to commit now?  And any additional comments for
the "[PATCH 3/3] Fix loop split incorrect count and probability"
(https://gcc.gnu.org/pipermail/gcc-patches/2021-December/586374.html)?
 

Updated comments and testcase:

[PATCH 1/3] loop-invariant: Don't move cold bb instructions to preheader in RTL

From: Xiong Hu Luo <luoxhu@linux.ibm.com>

gcc/ChangeLog:

	* loop-invariant.c (find_invariants_bb): Check profile count
	before motion.
	(find_invariants_body): Add argument.

gcc/testsuite/ChangeLog:

	* gcc.dg/loop-invariant-2.c: New.
---
 gcc/loop-invariant.c                    | 17 ++++++++++++++---
 gcc/testsuite/gcc.dg/loop-invariant-2.c | 20 ++++++++++++++++++++
 2 files changed, 34 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/loop-invariant-2.c

diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c
index fca0c2b24be..690f7704a0b 100644
--- a/gcc/loop-invariant.c
+++ b/gcc/loop-invariant.c
@@ -1183,9 +1183,21 @@ find_invariants_insn (rtx_insn *insn, bool always_reached, bool always_executed)
    call.  */
 
 static void
-find_invariants_bb (basic_block bb, bool always_reached, bool always_executed)
+find_invariants_bb (class loop *loop, basic_block bb, bool always_reached,
+		    bool always_executed)
 {
   rtx_insn *insn;
+  basic_block preheader = loop_preheader_edge (loop)->src;
+
+  /* Don't move insn of cold BB out of loop to preheader to reduce calculations
+     and register live range in hot loop with cold BB.  */
+  if (!always_executed && preheader->count > bb->count)
+    {
+      if (dump_file)
+	fprintf (dump_file, "Don't move invariant from bb: %d out of loop %d\n",
+		 bb->index, loop->num);
+      return;
+    }
 
   FOR_BB_INSNS (bb, insn)
     {
@@ -1214,8 +1226,7 @@ find_invariants_body (class loop *loop, basic_block *body,
   unsigned i;
 
   for (i = 0; i < loop->num_nodes; i++)
-    find_invariants_bb (body[i],
-			bitmap_bit_p (always_reached, i),
+    find_invariants_bb (loop, body[i], bitmap_bit_p (always_reached, i),
 			bitmap_bit_p (always_executed, i));
 }
 
diff --git a/gcc/testsuite/gcc.dg/loop-invariant-2.c b/gcc/testsuite/gcc.dg/loop-invariant-2.c
new file mode 100644
index 00000000000..df3d8458569
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/loop-invariant-2.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-rtl-loop2_invariant" } */
+
+volatile int x;
+void
+bar (int, char *, char *);
+void
+foo (int *a, int n, int k)
+{
+  int i;
+
+  for (i = 0; i < n; i++)
+    {
+      if (__builtin_expect (x, 0))
+	bar (k / 5, "one", "two");
+      a[i] = k;
+    }
+}
+
+/* { dg-final { scan-rtl-dump "Don't move invariant from bb: .*out of loop" "loop2_invariant" } } */
  
Xionghu Luo Dec. 29, 2021, 1:43 a.m. UTC | #7
On 2021/12/17 09:30, Xionghu Luo via Gcc-patches wrote:
> 
> 
> On 2021/12/16 19:20, Jan Hubicka wrote:
>>>  
>>> OK. Comments like?
>>>
>>> /* Don't move insn of cold BB out of loop to preheader to reduce calculations
>>>    and register live range in hot loop with cold BB.  */
>>
>> Looks good.
>>>
>>>
>>> And maybe some dump log will help tracking in xxx.c.271r.loop2_invariant.
>>>
>>> --- a/gcc/loop-invariant.c
>>> +++ b/gcc/loop-invariant.c
>>> @@ -1190,7 +1190,12 @@ find_invariants_bb (class loop *loop, basic_block bb, bool always_reached,
>>>    basic_block preheader = loop_preheader_edge (loop)->src;
>>>
>>>    if (preheader->count > bb->count)
>>> -    return;
>>> +    {
>>> +      if (dump_file)
>>> +       fprintf (dump_file, "Don't move invariant from bb: %d in loop %d\n",
>>> +                bb->index, loop->num);
>>> +      return;
>>> +    }
>>
>> This is also a good idea - testcases are quite important for this typo
>> of stuff.
>>>>
>>>> Thinking about this more, you want to test:
>>>>   if (!always_executed && preheader->count > bb->count)
>>>>     return;
>>>> This will rule out some profile misupates.
>>>>
>>>> Also the code updating always_reached is worng:
>>>>       if (always_reached
>>>>           && CALL_P (insn)
>>>>           && (RTL_LOOPING_CONST_OR_PURE_CALL_P (insn)
>>>>               || ! RTL_CONST_OR_PURE_CALL_P (insn)))
>>>>   always_reached = false;
>>>> It misses the case where statement can trhow external exception or
>>>> volatile asms that can also terminate execution.
>>>>
>>>> I bit worry on how often the test will have false positives with guessed
>>>> profile where earlier loop optimizations reduced trip count below
>>>> realistic estimate.
>>>
>>> Sorry I don't understand why always_executed and always_reached matters here?
>>> the test is based on BB before the FOR_BB_INSNS loop...
>>
>> always_executed is useful here to avoid the situation where bb->count or
>> preheader->count is wrong and it would disable wanted transformation.
>>
>> always_executed is just a bug I noticed while looking at the code.  I
>> will produce testcase for bugzilla.
>>
>> Honza
> 
> 
> Thanks, so is this OK to commit now?  And any additional comments for
> the "[PATCH 3/3] Fix loop split incorrect count and probability"
> (https://gcc.gnu.org/pipermail/gcc-patches/2021-December/586374.html)?
> 
> 
> Updated comments and testcase:
> 
> [PATCH 1/3] loop-invariant: Don't move cold bb instructions to preheader in RTL

This is the last patch for the "cold bb in hot loop optimization" I'd
like commit soon if not other comments,  to let it fully tested more
broadly before stage4.  Thanks.

Regression tested pass on powerpc64le-linux-gnu {P10,P9},
powerpc64-linux-gnu {P8, P7} and X86 though.

> 
> From: Xiong Hu Luo <luoxhu@linux.ibm.com>
> 
> gcc/ChangeLog:
> 
> 	* loop-invariant.c (find_invariants_bb): Check profile count
> 	before motion.
> 	(find_invariants_body): Add argument.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.dg/loop-invariant-2.c: New.
> ---
>  gcc/loop-invariant.c                    | 17 ++++++++++++++---
>  gcc/testsuite/gcc.dg/loop-invariant-2.c | 20 ++++++++++++++++++++
>  2 files changed, 34 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/loop-invariant-2.c
> 
> diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c
> index fca0c2b24be..690f7704a0b 100644
> --- a/gcc/loop-invariant.c
> +++ b/gcc/loop-invariant.c
> @@ -1183,9 +1183,21 @@ find_invariants_insn (rtx_insn *insn, bool always_reached, bool always_executed)
>     call.  */
> 
>  static void
> -find_invariants_bb (basic_block bb, bool always_reached, bool always_executed)
> +find_invariants_bb (class loop *loop, basic_block bb, bool always_reached,
> +		    bool always_executed)
>  {
>    rtx_insn *insn;
> +  basic_block preheader = loop_preheader_edge (loop)->src;
> +
> +  /* Don't move insn of cold BB out of loop to preheader to reduce calculations
> +     and register live range in hot loop with cold BB.  */
> +  if (!always_executed && preheader->count > bb->count)
> +    {
> +      if (dump_file)
> +	fprintf (dump_file, "Don't move invariant from bb: %d out of loop %d\n",
> +		 bb->index, loop->num);
> +      return;
> +    }
> 
>    FOR_BB_INSNS (bb, insn)
>      {
> @@ -1214,8 +1226,7 @@ find_invariants_body (class loop *loop, basic_block *body,
>    unsigned i;
> 
>    for (i = 0; i < loop->num_nodes; i++)
> -    find_invariants_bb (body[i],
> -			bitmap_bit_p (always_reached, i),
> +    find_invariants_bb (loop, body[i], bitmap_bit_p (always_reached, i),
>  			bitmap_bit_p (always_executed, i));
>  }
> 
> diff --git a/gcc/testsuite/gcc.dg/loop-invariant-2.c b/gcc/testsuite/gcc.dg/loop-invariant-2.c
> new file mode 100644
> index 00000000000..df3d8458569
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/loop-invariant-2.c
> @@ -0,0 +1,20 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-rtl-loop2_invariant" } */
> +
> +volatile int x;
> +void
> +bar (int, char *, char *);
> +void
> +foo (int *a, int n, int k)
> +{
> +  int i;
> +
> +  for (i = 0; i < n; i++)
> +    {
> +      if (__builtin_expect (x, 0))
> +	bar (k / 5, "one", "two");
> +      a[i] = k;
> +    }
> +}
> +
> +/* { dg-final { scan-rtl-dump "Don't move invariant from bb: .*out of loop" "loop2_invariant" } } */
  
Jan Hubicka Dec. 29, 2021, 12:55 p.m. UTC | #8
> > 
> > From: Xiong Hu Luo <luoxhu@linux.ibm.com>
> > 
> > gcc/ChangeLog:
> > 
> > 	* loop-invariant.c (find_invariants_bb): Check profile count
> > 	before motion.
> > 	(find_invariants_body): Add argument.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > 	* gcc.dg/loop-invariant-2.c: New.
OK,
thanks!
Honza
  
Xionghu Luo Dec. 30, 2021, 6:08 a.m. UTC | #9
On 2021/12/29 20:55, Jan Hubicka wrote:
>>>
>>> From: Xiong Hu Luo <luoxhu@linux.ibm.com>
>>>
>>> gcc/ChangeLog:
>>>
>>> 	* loop-invariant.c (find_invariants_bb): Check profile count
>>> 	before motion.
>>> 	(find_invariants_body): Add argument.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> 	* gcc.dg/loop-invariant-2.c: New.
> OK,
> thanks!
> Honza

Thanks, committed to r12-6149.
  

Patch

diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c
index 5eee2e5c9f8..c61c8612fae 100644
--- a/gcc/loop-invariant.c
+++ b/gcc/loop-invariant.c
@@ -1183,9 +1183,14 @@  find_invariants_insn (rtx_insn *insn, bool always_reached, bool always_executed)
    call.  */
 
 static void
-find_invariants_bb (basic_block bb, bool always_reached, bool always_executed)
+find_invariants_bb (class loop *loop, basic_block bb, bool always_reached,
+		    bool always_executed)
 {
   rtx_insn *insn;
+  basic_block preheader = loop_preheader_edge (loop)->src;
+
+  if (preheader->count > bb->count)
+    return;
 
   FOR_BB_INSNS (bb, insn)
     {
@@ -1214,8 +1219,7 @@  find_invariants_body (class loop *loop, basic_block *body,
   unsigned i;
 
   for (i = 0; i < loop->num_nodes; i++)
-    find_invariants_bb (body[i],
-			bitmap_bit_p (always_reached, i),
+    find_invariants_bb (loop, body[i], bitmap_bit_p (always_reached, i),
 			bitmap_bit_p (always_executed, i));
 }