Improve AutoFDO count propagation algorithm

Message ID MW2PR2101MB110052C7BCF974D4C1A4DB8C916A9@MW2PR2101MB1100.namprd21.prod.outlook.com
State New
Headers
Series Improve AutoFDO count propagation algorithm |

Commit Message

Eugene Rozenfeld Dec. 3, 2021, 2:53 a.m. UTC
  When a basic block A has been annotated with a count and it has only one
successor (or predecessor) B, we can propagate the A's count to B.
The algorithm without this change could leave B without an annotation if B had
other unannotated predecessors (or successors). For example, in the test case I added,
the loop header block was left unannotated, which prevented loop unrolling.

gcc/ChangeLog:
        * auto-profile.c (afdo_propagate_edge): Improve count propagation algorithm.

gcc/testsuite/ChangeLog:
        * gcc.dg/tree-prof/init-array.c: New test for unrolling inner loops.
---
 gcc/auto-profile.c                          | 20 +++++++++-
 gcc/testsuite/gcc.dg/tree-prof/init-array.c | 43 +++++++++++++++++++++
 2 files changed, 61 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-prof/init-array.c

--
2.25.1
  

Comments

Jeff Law Dec. 3, 2021, 3:41 p.m. UTC | #1
On 12/2/2021 7:53 PM, Eugene Rozenfeld via Gcc-patches wrote:
> When a basic block A has been annotated with a count and it has only one
> successor (or predecessor) B, we can propagate the A's count to B.
> The algorithm without this change could leave B without an annotation if B had
> other unannotated predecessors (or successors). For example, in the test case I added,
> the loop header block was left unannotated, which prevented loop unrolling.
>
> gcc/ChangeLog:
>          * auto-profile.c (afdo_propagate_edge): Improve count propagation algorithm.
>
> gcc/testsuite/ChangeLog:
>          * gcc.dg/tree-prof/init-array.c: New test for unrolling inner loops.
Seems quite sensible.   I can also easily argue this is a bugfix, even 
though there isn't a BZ associated with  this issue that I'm aware of.

OK for the trunk.

Thanks,
Jeff
  

Patch

diff --git a/gcc/auto-profile.c b/gcc/auto-profile.c
index 4c1fc6b536b..dfcd68113aa 100644
--- a/gcc/auto-profile.c
+++ b/gcc/auto-profile.c
@@ -1192,7 +1192,8 @@  afdo_find_equiv_class (bb_set *annotated_bb)
 /* If a basic block's count is known, and only one of its in/out edges' count
    is unknown, its count can be calculated. Meanwhile, if all of the in/out
    edges' counts are known, then the basic block's unknown count can also be
-   calculated.
+   calculated. Also, if a block has a single predecessor or successor, the block's
+   count can be propagated to that predecessor or successor.
    IS_SUCC is true if out edges of a basic blocks are examined.
    Update ANNOTATED_BB accordingly.
    Return TRUE if any basic block/edge count is changed.  */
@@ -1208,6 +1209,7 @@  afdo_propagate_edge (bool is_succ, bb_set *annotated_bb)
     edge e, unknown_edge = NULL;
     edge_iterator ei;
     int num_unknown_edge = 0;
+    int num_edge = 0;
     profile_count total_known_count = profile_count::zero ().afdo ();

     FOR_EACH_EDGE (e, ei, is_succ ? bb->succs : bb->preds)
@@ -1217,6 +1219,7 @@  afdo_propagate_edge (bool is_succ, bb_set *annotated_bb)
          num_unknown_edge++, unknown_edge = e;
        else
          total_known_count += AFDO_EINFO (e)->get_count ();
+       num_edge++;
       }

     /* Be careful not to annotate block with no successor in special cases.  */
@@ -1230,7 +1233,20 @@  afdo_propagate_edge (bool is_succ, bb_set *annotated_bb)
     else if (num_unknown_edge == 1 && is_bb_annotated (bb, *annotated_bb))
       {
        if (bb->count > total_known_count)
-         AFDO_EINFO (unknown_edge)->set_count (bb->count - total_known_count);
+         {
+             profile_count new_count = bb->count - total_known_count;
+             AFDO_EINFO(unknown_edge)->set_count(new_count);
+             if (num_edge == 1)
+               {
+                 basic_block succ_or_pred_bb = is_succ ? unknown_edge->dest : unknown_edge->src;
+                 if (new_count > succ_or_pred_bb->count)
+                   {
+                     succ_or_pred_bb->count = new_count;
+                     if (!is_bb_annotated (succ_or_pred_bb, *annotated_bb))
+                       set_bb_annotated (succ_or_pred_bb, annotated_bb);
+                   }
+               }
+          }
        else
          AFDO_EINFO (unknown_edge)->set_count (profile_count::zero().afdo ());
        AFDO_EINFO (unknown_edge)->set_annotated ();
diff --git a/gcc/testsuite/gcc.dg/tree-prof/init-array.c b/gcc/testsuite/gcc.dg/tree-prof/init-array.c
new file mode 100644
index 00000000000..0f7a5c84481
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-prof/init-array.c
@@ -0,0 +1,43 @@ 
+/* { dg-options "-O3 -fdump-tree-cunrolli-details" } */
+
+static int s[10][10][10];
+static int d[10][10][10];
+
+__attribute__((noipa))
+int array()
+{
+       int i;
+       register int j, k;
+       for (i = 0; i < 10; i++)
+               for (j = 0; j < 10; j++)
+                       for (k = 0; k < 10; k++)
+                               d[i][j][k] = s[i][j][k];
+
+       return(0);
+}
+
+__attribute__((noipa))
+void TestBench()
+{
+       for (int i = 0; i < 150000; ++i)
+       {
+          array();
+       }
+}
+
+int main(int argc, char *argv[])
+{
+
+       TestBench();
+
+       if (d[9][9][9] == 0 && s[9][9][9] == 0)
+       {
+               return 0;
+       }
+       else
+       {
+               return -1;
+       }
+}
+
+/* { dg-final-use { scan-tree-dump-times "loop with 10 iterations completely unrolled" 2 "cunrolli"} } */