Implement intraprocedural dataflow for ipa-modref EAF analyser

Message ID 20211104141216.GC60321@kam.mff.cuni.cz
State New
Headers
Series Implement intraprocedural dataflow for ipa-modref EAF analyser |

Commit Message

Jan Hubicka Nov. 4, 2021, 2:12 p.m. UTC
  Hi,
this patch implements the (long promised) intraprocedural dataflow for
propagating eaf flags, so we can handle parameters that participate
in loops in SSA graphs. Typical example are acessors that walk linked
lists, for example.

I implemented dataflow using the standard iteration over BBs in RPO some time
ago, but did not like it becuase it had measurable compile time impact with
very small code quality effect. This is why I kept mainline to do the DFS walk
instead. The reason is that we care about flags of SSA names that corresponds
to parameters and those can be often determined from a small fraction of the
SSA graph so solving dataflow for all SSA names in a function is a waste.

This patch implements dataflow more carefully.  The DFS walk is kept in place to
solve acyclic cases and discover the relevat part of SSA graph into new graph
(which is similar to one used for inter-procedrual dataflow - we only need to
know the edges and if the access is direct or derefernced).  The RPO iterative
dataflow then works on this simplified graph.

This seems to be fast in practice. For GCC linktime we do dataflow for 4881
functions. Out of that 4726 finishes in one iteration, 144 in two and 10 in 3.

Overall 31979 functions are analysed, so we do dataflow only for bit over of
10% of cases.  131123 edges are visited by the solver.  I measured no compile
time impact of this.

The disambiguation statis for cc1plus goes from:

Alias oracle query stats:
  refs_may_alias_p: 78335822 disambiguations, 98877143 queries
  ref_maybe_used_by_call_p: 631969 disambiguations, 79338862 queries
  call_may_clobber_ref_p: 382561 disambiguations, 385621 queries
  nonoverlapping_component_refs_p: 0 disambiguations, 26303 queries
  nonoverlapping_refs_since_match_p: 30050 disambiguations, 64969 must overlaps, 95964 queries
  aliasing_component_refs_p: 57690 disambiguations, 11337240 queries
  TBAA oracle: 27859092 disambiguations 91586199 queries
               14867046 are in alias set 0
               8935455 queries asked about the same object
               123 queries asked about the same alias set
               0 access volatile
               38018474 are dependent in the DAG
               1906009 are aritificially in conflict with void *

Modref stats:
  modref use: 25148 disambiguations, 684073 queries
  modref clobber: 2333227 disambiguations, 21994356 queries
  5351481 tbaa queries (0.243312 per modref query)
  756477 base compares (0.034394 per modref query)

PTA query stats:
  pt_solution_includes: 13257914 disambiguations, 36298965 queries
  pt_solutions_intersect: 1541411 disambiguations, 13636921 queries

To:

Alias oracle query stats:
  refs_may_alias_p: 78490846 disambiguations, 99011860 queries
  ref_maybe_used_by_call_p: 641474 disambiguations, 79490686 queries
  call_may_clobber_ref_p: 386604 disambiguations, 389611 queries
  nonoverlapping_component_refs_p: 0 disambiguations, 26209 queries
  nonoverlapping_refs_since_match_p: 30139 disambiguations, 65090 must overlaps, 96188 queries
  aliasing_component_refs_p: 57733 disambiguations, 11335088 queries
  TBAA oracle: 27889044 disambiguations 91654026 queries
               14889813 are in alias set 0
               8943185 queries asked about the same object
               117 queries asked about the same alias set
               0 access volatile
               38023282 are dependent in the DAG
               1908585 are aritificially in conflict with void *

Modref stats:
  modref use: 25348 disambiguations, 697524 queries
  modref clobber: 2335849 disambiguations, 22347941 queries
  5361162 tbaa queries (0.239895 per modref query)
  759200 base compares (0.033972 per modref query)

PTA query stats:
  pt_solution_includes: 13340954 disambiguations, 36365454 queries
  pt_solutions_intersect: 1582530 disambiguations, 13666763 queries

So there is a small improvement around 1% in PTA.
However there are funtions that seems to be quite benefical to analyse like htab_hash_string,
for example, which previously got stuck on the loop in SSA graph while walking the string.

Bootstrapped/regtested x86_64-linux, plan to commit after bit more testing.

gcc/ChangeLog:

	* ipa-modref.c (modref_lattice): Add do_dataflow,
	changed and propagate_to fields.
	(modref_lattice::release): Free propagate_to
	(modref_lattice::merge): Do not give up early on unknown
	lattice values.
	(modref_lattice::merge_deref): Likewise.
	(modref_eaf_analysis): Update toplevel comment.
	(modref_eaf_analysis::analyze_ssa_name): Record postponned ssa names;
	do optimistic dataflow initialization.
	(modref_eaf_analysis::merge_with_ssa_name): Build dataflow graph.
	(modref_eaf_analysis::propagate): New member function.
	(analyze_parms): Update to new API of modref_eaf_analysis.

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-ssa/modref-11.c: New test.
  

Comments

Martin Liška Nov. 4, 2021, 4:12 p.m. UTC | #1
On 11/4/21 15:12, Jan Hubicka via Gcc-patches wrote:
> |Bootstrapped/regtested x86_64-linux, plan to commit after bit more testing.|

Can you please install the patch after the current MOD REF crashes are fixed?
It will help us with the future bisection.

Thanks,
Martin
  
Jan Hubicka Nov. 4, 2021, 4:14 p.m. UTC | #2
> On 11/4/21 15:12, Jan Hubicka via Gcc-patches wrote:
> > |Bootstrapped/regtested x86_64-linux, plan to commit after bit more testing.|
> 
> Can you please install the patch after the current MOD REF crashes are fixed?
> It will help us with the future bisection.

Sure, this is what I am doing today :)

Honza
> 
> Thanks,
> Martin
  
Jan Hubicka Nov. 7, 2021, 8:40 a.m. UTC | #3
Hi,
I have commited the patch now.  On the current tree the patch causes new
failure
./gcc/testsuite/gfortran/gfortran.sum:FAIL: gfortran.dg/vector_subscript_1.f90   -O1  execution test
./gcc/testsuite/gfortran/gfortran.sum:FAIL: gfortran.dg/vector_subscript_1.f90   -O2  execution test
./gcc/testsuite/gfortran/gfortran.sum:FAIL: gfortran.dg/vector_subscript_1.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
./gcc/testsuite/gfortran/gfortran.sum:FAIL: gfortran.dg/vector_subscript_1.f90   -O3 -g  execution test

This is however just semi-latent bug that got triggered and same problem
reproduces with -fipa-pta and is tracked in PR86389
The problem is that Fortran builds a trailling array that is not
discovered as trailling by the middle-end and dom pass does invalid
propagation.

Same problem is triggered by modref because in order to make this happen
we need to discover that nested function test is not writting into
arrays a and b.

Honza
  

Patch

diff --git a/gcc/ipa-modref.c b/gcc/ipa-modref.c
index db071d2212f..188c41aeb50 100644
--- a/gcc/ipa-modref.c
+++ b/gcc/ipa-modref.c
@@ -1456,14 +1456,32 @@  class modref_lattice
 public:
   /* EAF flags of the SSA name.  */
   eaf_flags_t flags;
-  /* DFS bookkkeeping: we don't do real dataflow yet.  */
+  /* Used during DFS walk to mark names where final value was determined
+     without need for dataflow.  */
   bool known;
+  /* Used during DFS walk to mark open vertices (for cycle detection).  */
   bool open;
+  /* Set during DFS walk for names that needs dataflow propagation.  */
+  bool do_dataflow;
+  /* Used during the iterative dataflow.  */
+  bool changed;
 
   /* When doing IPA analysis we can not merge in callee escape points;
      Only remember them and do the merging at IPA propagation time.  */
   vec <escape_point, va_heap, vl_ptr> escape_points;
 
+  /* Representation of a graph for dataaflow.  This graph is built on-demand
+     using modref_eaf_analysis::analyze_ssa and later solved by
+     modref_eaf_analysis::propagate.
+     Each edge represents the fact that flags of current lattice should be
+     propagated to lattice of SSA_NAME.  */
+  struct propagate_edge
+  {
+    int ssa_name;
+    bool deref;
+  };
+  vec <propagate_edge, va_heap, vl_ptr> propagate_to;
+
   void init ();
   void release ();
   bool merge (const modref_lattice &with);
@@ -1495,6 +1513,7 @@  void
 modref_lattice::release ()
 {
   escape_points.release ();
+  propagate_to.release ();
 }
 
 /* Dump lattice to OUT; indent with INDENT spaces.  */
@@ -1587,7 +1606,7 @@  bool
 modref_lattice::merge (const modref_lattice &with)
 {
   if (!with.known)
-    return merge (0);
+    do_dataflow = true;
 
   bool changed = merge (with.flags);
 
@@ -1608,7 +1627,7 @@  bool
 modref_lattice::merge_deref (const modref_lattice &with, bool ignore_stores)
 {
   if (!with.known)
-    return merge (0);
+    do_dataflow = true;
 
   bool changed = merge (deref_flags (with.flags, ignore_stores));
 
@@ -1646,13 +1665,24 @@  modref_lattice::merge_direct_store ()
   return merge (~(EAF_UNUSED | EAF_NOCLOBBER));
 }
 
-/* Analyzer of EAF flags.  */
+/* Analyzer of EAF flags.
+   This is genrally dataflow problem over the SSA graph, however we only
+   care about flags of few selected ssa names (arguments, return slot and
+   static chain).  So we first call analyze_ssa_name on all relevant names
+   and perform a DFS walk to discover SSA names where flags needs to be
+   determined.  For acyclic graphs we try to determine final flags during
+   this walk.  Once cycles or recursin depth is met we enlist SSA names
+   for dataflow which is done by propagate call.
+
+   After propagation the flags can be obtained using get_ssa_name_flags.  */
 
 class modref_eaf_analysis
 {
 public:
-  /* Compute flags for NAME.  */
+  /* Mark NAME as relevant for analysis.  */
   void analyze_ssa_name (tree name);
+  /* Dataflow slover.  */
+  void propagate ();
   /* Return flags computed earlier for NAME.  */
   int get_ssa_name_flags (tree name)
   {
@@ -1673,7 +1703,7 @@  public:
   ~modref_eaf_analysis ()
   {
     gcc_checking_assert (!m_depth);
-    if (m_ipa)
+    if (m_ipa || m_names_to_propagate.length ())
       for (unsigned int i = 0; i < num_ssa_names; i++)
 	m_lattice[i].release ();
   }
@@ -1685,6 +1715,8 @@  private:
   int m_depth;
   /* Propagation lattice for individual ssa names.  */
   auto_vec<modref_lattice> m_lattice;
+  auto_vec<tree> m_deferred_names;
+  auto_vec<int> m_names_to_propagate;
 
   void merge_with_ssa_name (tree dest, tree src, bool deref);
   void merge_call_lhs_flags (gcall *call, int arg, tree name, bool deref);
@@ -1773,26 +1805,27 @@  modref_eaf_analysis::analyze_ssa_name (tree name)
   int index = SSA_NAME_VERSION (name);
 
   /* See if value is already computed.  */
-  if (m_lattice[index].known)
+  if (m_lattice[index].known || m_lattice[index].do_dataflow)
    return;
   if (m_lattice[index].open)
     {
       if (dump_file)
 	fprintf (dump_file,
-		 "%*sGiving up on a cycle in SSA graph\n",
+		 "%*sCycle in SSA graph\n",
 		 m_depth * 4, "");
       return;
     }
+  /* Recursion guard.  */
+  m_lattice[index].init ();
   if (m_depth == param_modref_max_depth)
     {
       if (dump_file)
 	fprintf (dump_file,
-		 "%*sGiving up on max depth\n",
+		 "%*sMax recursion depth reached; postponing\n",
 		 m_depth * 4, "");
+      m_deferred_names.safe_push (name);
       return;
     }
-  /* Recursion guard.  */
-  m_lattice[index].init ();
 
   if (dump_file)
     {
@@ -2072,7 +2105,8 @@  modref_eaf_analysis::analyze_ssa_name (tree name)
       m_lattice[index].dump (dump_file, m_depth * 4 + 2);
     }
   m_lattice[index].open = false;
-  m_lattice[index].known = true;
+  if (!m_lattice[index].do_dataflow)
+    m_lattice[index].known = true;
 }
 
 /* Propagate info from SRC to DEST.  If DEREF it true, assume that SRC
@@ -2084,6 +2118,10 @@  modref_eaf_analysis::merge_with_ssa_name (tree dest, tree src, bool deref)
   int index = SSA_NAME_VERSION (dest);
   int src_index = SSA_NAME_VERSION (src);
 
+  /* Merging lattice with itself is a no-op.  */
+  if (!deref && src == dest)
+    return;
+
   m_depth++;
   analyze_ssa_name (src);
   m_depth--;
@@ -2091,6 +2129,157 @@  modref_eaf_analysis::merge_with_ssa_name (tree dest, tree src, bool deref)
     m_lattice[index].merge_deref (m_lattice[src_index], false);
   else
     m_lattice[index].merge (m_lattice[src_index]);
+
+  /* If we failed to produce final solution add an edge to the dataflow
+     graph.  */
+  if (!m_lattice[src_index].known)
+    {
+      modref_lattice::propagate_edge e = {index, deref};
+
+      if (!m_lattice[src_index].propagate_to.length ())
+	m_names_to_propagate.safe_push (src_index);
+      m_lattice[src_index].propagate_to.safe_push (e);
+      m_lattice[src_index].changed = true;
+      m_lattice[src_index].do_dataflow = true;
+      if (dump_file)
+	fprintf (dump_file,
+		 "%*sWill propgate from ssa_name %i to %i%s\n",
+		 m_depth * 4 + 4,
+		 "", src_index, index, deref ? " (deref)" : "");
+    }
+}
+
+/* In the case we deferred some SSA names, reprocess them.  In the case some
+   dataflow edges were introduced, do the actual iterative dataflow.  */
+
+void
+modref_eaf_analysis::propagate ()
+{
+  int iterations = 0;
+  size_t i;
+  int index;
+  bool changed = true;
+
+  while (m_deferred_names.length ())
+    {
+      tree name = m_deferred_names.pop ();
+      m_lattice[SSA_NAME_VERSION (name)].open = false;
+      if (dump_file)
+	fprintf (dump_file, "Analyzing deferred SSA name\n");
+      analyze_ssa_name (name);
+    }
+
+  if (!m_names_to_propagate.length ())
+    return;
+  if (dump_file)
+    fprintf (dump_file, "Propagating EAF flags\n");
+
+  /* Compute reverse postorder.  */
+  auto_vec <int> rpo;
+  struct stack_entry
+  {
+    int name;
+    unsigned pos;
+  };
+  auto_vec <struct stack_entry> stack;
+  int pos = m_names_to_propagate.length () - 1;
+
+  rpo.safe_grow (m_names_to_propagate.length (), true);
+  stack.reserve_exact (m_names_to_propagate.length ());
+
+  /* We reuse known flag for RPO DFS walk bookeeping.  */
+  if (flag_checking)
+    FOR_EACH_VEC_ELT (m_names_to_propagate, i, index)
+      gcc_assert (!m_lattice[index].known && m_lattice[index].changed);
+
+  FOR_EACH_VEC_ELT (m_names_to_propagate, i, index)
+    {
+      if (!m_lattice[index].known)
+	{
+	  stack_entry e = {index, 0};
+
+	  stack.quick_push (e);
+	  m_lattice[index].known = true;
+	}
+      while (stack.length ())
+ 	{
+	  bool found = false;
+	  int index1 = stack.last ().name;
+
+	  while (stack.last ().pos < m_lattice[index1].propagate_to.length ())
+	    {
+	      int index2 = m_lattice[index1]
+		      .propagate_to[stack.last ().pos].ssa_name;
+
+	      stack.last ().pos++;
+	      if (!m_lattice[index2].known
+		  && m_lattice[index2].propagate_to.length ())
+		{
+		  stack_entry e = {index2, 0};
+
+		  stack.quick_push (e);
+		  m_lattice[index2].known = true;
+		  found = true;
+		  break;
+		}
+	    }
+	  if (!found
+	      && stack.last ().pos == m_lattice[index1].propagate_to.length ())
+	    {
+	      rpo[pos--] = index1;
+	      stack.pop ();
+	    }
+	}
+    }
+
+  /* Perform itrative dataflow.  */
+  while (changed)
+    {
+      changed = false;
+      iterations++;
+      if (dump_file)
+	fprintf (dump_file, " iteration %i\n", iterations);
+      FOR_EACH_VEC_ELT (rpo, i, index)
+	{
+	  if (m_lattice[index].changed)
+	    {
+	      size_t j;
+
+	      m_lattice[index].changed = false;
+	      if (dump_file)
+		fprintf (dump_file, "  Visiting ssa name %i\n", index);
+	      for (j = 0; j < m_lattice[index].propagate_to.length (); j++)
+		{
+		  bool ch;
+		  int target = m_lattice[index].propagate_to[j].ssa_name;
+		  bool deref = m_lattice[index].propagate_to[j].deref;
+
+		  if (dump_file)
+		    fprintf (dump_file, "   Propagating flags of ssa name"
+			     " %i to %i%s\n",
+			     index, target, deref ? " (deref)" : "");
+		  m_lattice[target].known = true;
+		  if (!m_lattice[index].propagate_to[j].deref)
+		    ch = m_lattice[target].merge (m_lattice[index]);
+		  else
+		    ch = m_lattice[target].merge_deref (m_lattice[index],
+							false);
+		  if (!ch)
+		    continue;
+		  if (dump_file)
+		    {
+		      fprintf (dump_file, "   New lattice: ");
+		      m_lattice[target].dump (dump_file);
+		    }
+		  if (target <= (int)i)
+		    changed = true;
+		  m_lattice[target].changed = true;
+		}
+	    }
+	}
+    }
+  if (dump_file)
+    fprintf (dump_file, "EAF flags propagated in %i iterations\n", iterations);
 }
 
 /* Record escape points of PARM_INDEX according to LATTICE.  */
@@ -2151,6 +2340,23 @@  analyze_parms (modref_summary *summary, modref_summary_lto *summary_lto,
 
   modref_eaf_analysis eaf_analysis (ipa);
 
+  /* Determine all SSA names we need to know flags for.  */
+  for (tree parm = DECL_ARGUMENTS (current_function_decl); parm;
+       parm = TREE_CHAIN (parm))
+    {
+      tree name = ssa_default_def (cfun, parm);
+      if (name)
+	eaf_analysis.analyze_ssa_name (name);
+    }
+  if (retslot)
+    eaf_analysis.analyze_ssa_name (retslot);
+  if (static_chain)
+    eaf_analysis.analyze_ssa_name (static_chain);
+
+  /* Do the dataflow.  */
+  eaf_analysis.propagate ();
+
+  /* Store results to summaries.  */
   for (tree parm = DECL_ARGUMENTS (current_function_decl); parm; parm_index++,
        parm = TREE_CHAIN (parm))
     {
@@ -2175,7 +2381,6 @@  analyze_parms (modref_summary *summary, modref_summary_lto *summary_lto,
 	    }
 	  continue;
 	}
-      eaf_analysis.analyze_ssa_name (name);
       int flags = eaf_analysis.get_ssa_name_flags (name);
 
       /* Eliminate useless flags so we do not end up storing unnecessary
@@ -2204,7 +2409,6 @@  analyze_parms (modref_summary *summary, modref_summary_lto *summary_lto,
     }
   if (retslot)
     {
-      eaf_analysis.analyze_ssa_name (retslot);
       int flags = eaf_analysis.get_ssa_name_flags (retslot);
 
       flags = remove_useless_eaf_flags (flags, ecf_flags, false);
@@ -2220,7 +2424,6 @@  analyze_parms (modref_summary *summary, modref_summary_lto *summary_lto,
     }
   if (static_chain)
     {
-      eaf_analysis.analyze_ssa_name (static_chain);
       int flags = eaf_analysis.get_ssa_name_flags (static_chain);
 
       flags = remove_useless_eaf_flags (flags, ecf_flags, false);
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/modref-11.c b/gcc/testsuite/gcc.dg/tree-ssa/modref-11.c
new file mode 100644
index 00000000000..de9ad16879f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/modref-11.c
@@ -0,0 +1,13 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-modref1"  } */
+struct linkedlist {
+  struct linkedlist *next;
+};
+struct linkedlist *
+find_last (struct linkedlist *l)
+{
+  while (l->next)
+   l = l->next;
+  return l;
+}
+/* { dg-final { scan-tree-dump "noclobber noescape nodirectescape" "modref1"} } */