[v9] Improve code sinking pass
Checks
Context |
Check |
Description |
linaro-tcwg-bot/tcwg_gcc_check--master-aarch64 |
success
|
Testing passed
|
linaro-tcwg-bot/tcwg_gcc_build--master-aarch64 |
success
|
Testing passed
|
linaro-tcwg-bot/tcwg_gcc_build--master-arm |
success
|
Testing passed
|
Commit Message
This patch improves code sinking pass to sink statements before call to reduce
register pressure.
Review comments are incorporated. Synced with latest sources and modify the code changes
accordingly.
For example :
void bar();
int j;
void foo(int a, int b, int c, int d, int e, int f)
{
int l;
l = a + b + c + d +e + f;
if (a != 5)
{
bar();
j = l;
}
}
Code Sinking does the following:
void bar();
int j;
void foo(int a, int b, int c, int d, int e, int f)
{
int l;
if (a != 5)
{
l = a + b + c + d +e + f;
bar();
j = l;
}
}
Bootstrapped regtested on powerpc64-linux-gnu.
Thanks & Regards
Ajit
tree-ssa-sink: Improve code sinking pass
Currently, code sinking will sink code after function calls. This increases
register pressure for callee-saved registers. The following patch improves
code sinking by placing the sunk code before calls in the use block or in
the immediate dominator of the use blocks.
2023-10-12 Ajit Kumar Agarwal <aagarwa1@linux.ibm.com>
gcc/ChangeLog:
PR tree-optimization/81953
* tree-ssa-sink.cc (statement_sink_location): Move statements before
calls.
(select_best_block): Add heuristics to select the best blocks in the
immediate post dominator.
gcc/testsuite/ChangeLog:
PR tree-optimization/81953
* gcc.dg/tree-ssa/ssa-sink-20.c: New test.
* gcc.dg/tree-ssa/ssa-sink-21.c: New test.
---
gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c | 15 ++++++++
gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c | 19 ++++++++++
gcc/tree-ssa-sink.cc | 39 ++++++++++++---------
3 files changed, 56 insertions(+), 17 deletions(-)
create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c
new file mode 100644
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-sink-stats" } */
+void bar();
+int j;
+void foo(int a, int b, int c, int d, int e, int f)
+{
+ int l;
+ l = a + b + c + d +e + f;
+ if (a != 5)
+ {
+ bar();
+ j = l;
+ }
+}
+/* { dg-final { scan-tree-dump {l_12\s+=\s+_4\s+\+\s+f_11\(D\);\n\s+bar\s+\(\)} sink1 } } */
new file mode 100644
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-sink-stats" } */
+void bar();
+int j, x;
+void foo(int a, int b, int c, int d, int e, int f)
+{
+ int l;
+ l = a + b + c + d +e + f;
+ if (a != 5)
+ {
+ bar();
+ if (b != 3)
+ x = 3;
+ else
+ x = 5;
+ j = l;
+ }
+}
+/* { dg-final { scan-tree-dump {l_13\s+=\s+_4\s+\+\s+f_12\(D\);\n\s+bar\s+\(\)} sink1 } } */
@@ -174,7 +174,8 @@ nearest_common_dominator_of_uses (def_operand_p def_p, bool *debug_stmts)
/* Given EARLY_BB and LATE_BB, two blocks in a path through the dominator
tree, return the best basic block between them (inclusive) to place
- statements.
+ statements. The best basic block should be an immediate dominator of
+ best basic block if the use stmt is after the call.
We want the most control dependent block in the shallowest loop nest.
@@ -196,6 +197,16 @@ select_best_block (basic_block early_bb,
basic_block best_bb = late_bb;
basic_block temp_bb = late_bb;
int threshold;
+ /* Get the sinking threshold. If the statement to be moved has memory
+ operands, then increase the threshold by 7% as those are even more
+ profitable to avoid, clamping at 100%. */
+ threshold = param_sink_frequency_threshold;
+ if (gimple_vuse (stmt) || gimple_vdef (stmt))
+ {
+ threshold += 7;
+ if (threshold > 100)
+ threshold = 100;
+ }
while (temp_bb != early_bb)
{
@@ -204,6 +215,14 @@ select_best_block (basic_block early_bb,
if (bb_loop_depth (temp_bb) < bb_loop_depth (best_bb))
best_bb = temp_bb;
+ /* if we have temp_bb post dominated by use block block then immediate
+ * dominator would be our best block. */
+ if (!gimple_vuse (stmt)
+ && bb_loop_depth (temp_bb) == bb_loop_depth (early_bb)
+ && !(temp_bb->count * 100 >= early_bb->count * threshold)
+ && dominated_by_p (CDI_DOMINATORS, late_bb, temp_bb))
+ best_bb = temp_bb;
+
/* Walk up the dominator tree, hopefully we'll find a shallower
loop nest. */
temp_bb = get_immediate_dominator (CDI_DOMINATORS, temp_bb);
@@ -233,17 +252,6 @@ select_best_block (basic_block early_bb,
&& !dominated_by_p (CDI_DOMINATORS, best_bb->loop_father->latch, best_bb))
return early_bb;
- /* Get the sinking threshold. If the statement to be moved has memory
- operands, then increase the threshold by 7% as those are even more
- profitable to avoid, clamping at 100%. */
- threshold = param_sink_frequency_threshold;
- if (gimple_vuse (stmt) || gimple_vdef (stmt))
- {
- threshold += 7;
- if (threshold > 100)
- threshold = 100;
- }
-
/* If BEST_BB is at the same nesting level, then require it to have
significantly lower execution frequency to avoid gratuitous movement. */
if (bb_loop_depth (best_bb) == bb_loop_depth (early_bb)
@@ -430,6 +438,7 @@ statement_sink_location (gimple *stmt, basic_block frombb,
continue;
break;
}
+
use = USE_STMT (one_use);
if (gimple_code (use) != GIMPLE_PHI)
@@ -439,10 +448,7 @@ statement_sink_location (gimple *stmt, basic_block frombb,
if (sinkbb == frombb)
return false;
- if (sinkbb == gimple_bb (use))
- *togsi = gsi_for_stmt (use);
- else
- *togsi = gsi_after_labels (sinkbb);
+ *togsi = gsi_after_labels (sinkbb);
return true;
}
@@ -825,7 +831,6 @@ pass_sink_code::execute (function *fun)
mark_dfs_back_edges (fun);
memset (&sink_stats, 0, sizeof (sink_stats));
calculate_dominance_info (CDI_DOMINATORS);
-
virtual_operand_live vop_live;
int *rpo = XNEWVEC (int, n_basic_blocks_for_fn (cfun));