tree-optimization/103168 - Improve VN of pure function calls

Message ID 1o6qo129-2591-o7po-p67q-3ns82312sp7q@fhfr.qr
State Committed
Commit 6180f5c8d6d1dc7b6634c41a46f0f8f5ca2e5b9d
Series tree-optimization/103168 - Improve VN of pure function calls

Commit Message

Richard Biener Nov. 24, 2021, 11:40 a.m. UTC
  This improves value-numbering of calls that read memory: calls
to const functions with aggregate arguments and calls to
pure functions, where the latter include const functions we
demoted to pure for fear of interposition with a less
optimized version.  Note that for pure functions we do not
handle functions that access global memory.
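
As a minimal illustration (not part of the patch, and simplified from
the testcase below), FRE can now value-number the second call here to
the first one, provided modref describes the callee's load and the
intervening store provably does not overlap it:

  struct v { int a; int b; };

  __attribute__ ((pure, noinline))
  static int get_a (const struct v *p) { return p->a; }

  int
  test (void)
  {
    struct v x;
    x.a = 1;
    x.b = 2;
    int r = get_a (&x);
    x.b = 3;                 /* Does not overlap the load of x.a.  */
    return r + get_a (&x);   /* Can be value-numbered to r.  */
  }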

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

Richard.

2021-11-24  Richard Biener  <rguenther@suse.de>
	Jan Hubicka  <jh@suse.cz>

	PR tree-optimization/103168
	* ipa-modref.h (struct modref_summary): Add load_accesses.
	* ipa-modref.c (modref_summary::finalize): Initialize load_accesses.
	* tree-ssa-sccvn.c (visit_reference_op_call): Use modref
	info to walk the virtual use->def chain to CSE const/pure
	function calls possibly reading from memory.

	* g++.dg/tree-ssa/pr103168.C: New testcase.
---
 gcc/ipa-modref.c                         |  17 +++
 gcc/ipa-modref.h                         |   2 +
 gcc/testsuite/g++.dg/tree-ssa/pr103168.C |  24 +++++
 gcc/tree-ssa-sccvn.c                     | 126 +++++++++++++++++++++++
 4 files changed, 169 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/tree-ssa/pr103168.C
  

Comments

Jan Hubicka Nov. 24, 2021, 11:49 a.m. UTC | #1
> This improves value-numbering of calls that read memory: calls
> to const functions with aggregate arguments and calls to
> pure functions, where the latter include const functions we
> demoted to pure for fear of interposition with a less
> optimized version.  Note that for pure functions we do not
> handle functions that access global memory.

Thank you! I am happy we finally undid some of the pessimization caused
by the interposition panic.  I was wondering if I should, come next
stage1, start tracking eliminated reads in functions, but that is tricky
to do since things like if (*global_var == *global_var) are folded
already in the frontend.

I was thinking a bit about what to do about global accesses, and I think
we can still do something (also next stage1).

Currently we disambiguate using
  if (stmt_may_clobber_ref_p_1 (def, &ref, true))
where REF is an ao_ref we built from the summary of STMT and
DEF is another statement.  This is fine, but we ignore info we have from
PTA on STMT (the statement we try to optimize).

If we could look at DEF and try to disambiguate all memory it
writes against STMT using the PTA oracle, that would let us handle
global memory well (we don't need REF for that), because we would work
out that some accesses do not escape to STMT since they are not in
CALLUSED.  Somewhat annoying is that we don't have a predicate in
tree-ssa-alias for that (stmt_clobber_stmt_p :)
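
A minimal sketch of what such a predicate might look like (the name and
the restriction to plain stores are made up here; it also assumes IPA
PTA computed the call-used sets):

  /* Does the store DEF write memory that the pure call STMT may read
     according to STMT's call-used points-to solution?  */
  static bool
  store_clobbers_call_use_p (gimple *def, gcall *stmt)
  {
    if (!is_gimple_assign (def))
      return true;		/* Calls etc.: be conservative.  */
    tree lhs = gimple_assign_lhs (def);
    if (TREE_CODE (lhs) == SSA_NAME)
      return false;		/* Not a store to memory.  */
    ao_ref ref;
    ao_ref_init (&ref, lhs);
    tree base = ao_ref_base (&ref);
    struct pt_solution *uses = gimple_call_use_set (stmt);
    if (DECL_P (base))
      /* A store to a decl matters only if the decl is call-used.  */
      return pt_solution_includes (uses, base);
    if (TREE_CODE (base) == MEM_REF
	&& TREE_CODE (TREE_OPERAND (base, 0)) == SSA_NAME)
      {
	struct ptr_info_def *pi
	  = SSA_NAME_PTR_INFO (TREE_OPERAND (base, 0));
	if (pi)
	  return pt_solutions_intersect (uses, &pi->pt);
      }
    return true;		/* Unknown base: be conservative.  */
  }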

Honza
  
Richard Biener Nov. 24, 2021, 12:22 p.m. UTC | #2
On Wed, 24 Nov 2021, Jan Hubicka wrote:

> > This improves value-numbering of calls that read memory: calls
> > to const functions with aggregate arguments and calls to
> > pure functions, where the latter include const functions we
> > demoted to pure for fear of interposition with a less
> > optimized version.  Note that for pure functions we do not
> > handle functions that access global memory.
> 
> Thank you! I am happy we finally undid some of the pessimization caused
> by the interposition panic.  I was wondering if I should, come next
> stage1, start tracking eliminated reads in functions, but that is tricky
> to do since things like if (*global_var == *global_var) are folded
> already in the frontend.
> 
> I was thinking a bit about what to do about global accesses, and I think
> we can still do something (also next stage1).
> 
> Currently we disambiguate using
>   if (stmt_may_clobber_ref_p_1 (def, &ref, true))
> where REF is an ao_ref we built from the summary of STMT and
> DEF is another statement.  This is fine, but we ignore info we have from
> PTA on STMT (the statement we try to optimize).
> 
> If we could look at DEF and try to disambiguate all memory it
> writes against STMT using the PTA oracle, that would let us handle
> global memory well (we don't need REF for that), because we would work
> out that some accesses do not escape to STMT since they are not in
> CALLUSED.  Somewhat annoying is that we don't have a predicate in
> tree-ssa-alias for that (stmt_clobber_stmt_p :)

Yes, note that we don't have callused unless IPA PTA is enabled,
but it might be salvageable from IPA reference info?  What we're
missing is a stmt_clobbers_pt_solution_p, or rather a reasonably
cheap way to construct an ao_ref covering all of a points-to
solution.  The not-so-cheap way to do that is

  ao_ref r;
  tree tem = make_ssa_name (ptr_type_node);
  ptr_info_def *pi = get_ptr_info (tem);
  pi->pt = *gimple_call_use_set (call_stmt);
  tree ref = build2 (MEM_REF, void_type_node /* ?? */, tem,
		     build_zero_cst (ptr_type_node
				     /* that effectively is ref-all */));
  ao_ref_init (&r, ref);
  r.base = ref;
  r.ref = NULL_TREE;
  r.offset = 0;
  r.ref_alias_set = 0;
  r.base_alias_set = 0;

and if we come from IPA reference we first have to build a
points-to bitmap as well.

What would be a bit more convenient is probably adding
a pt_solution * member to ao_ref.  Maybe also avoiding
the MEM_REF build we already do in some cases and overload
the base field using a union and a designator ...
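
Sketched (hypothetical, the field name is invented), that extension
could look like:

  /* In tree-ssa-alias.h, next to the existing ao_ref fields
     (ref, base, offset, size, max_size, the alias sets, ...).  */
  struct ao_ref
  {
    /* ... existing fields ...  */

    /* If non-NULL, the reference conservatively covers everything
       this points-to solution reaches and the tree-based fields
       above are not consulted.  */
    struct pt_solution *pt;
  };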

But yes, sth for next stage1.

Richard.
  
Jan Hubicka Nov. 24, 2021, 3:06 p.m. UTC | #3
> 
> Yes, note that we don't have callused unless IPA PTA is enabled,
> but it might be salvageable from IPA reference info?  What we're
> missing is a stmt_clobbers_pt_solution_p, or rather a reasonably
> cheap way to construct an ao_ref covering all of a points-to
> solution.  The not-so-cheap way to do that is
> 
>   ao_ref r;
>   tree tem = make_ssa_name (ptr_type_node);
>   ptr_info_def *pi = get_ptr_info (tem);
>   pi->pt = *gimple_call_use_set (call_stmt);
>   tree ref = build2 (MEM_REF, void_type_node /* ?? */, tem,
> 		     build_zero_cst (ptr_type_node
> 				     /* that effectively is ref-all */));
>   ao_ref_init (&r, ref);
>   r.base = ref;
>   r.ref = NULL_TREE;
>   r.offset = 0;
>   r.ref_alias_set = 0;
>   r.base_alias_set = 0;
> 
> and if we come from IPA reference we first have to build a
> points-to bitmap as well.
> 
> What would be a bit more convenient is probably adding
> a pt_solution * member to ao_ref.  Maybe also avoiding

Having a way to add an additional pt_solution to the ref looks like a
good idea, since then it could go through the common path in
tree-ssa-alias.c.  I will look into that next year :)
Honza
> the MEM_REF build we already do in some cases and overload
> the base field using a union and a designator ...
> 
> But yes, sth for next stage1.
> 
> Richard.
  

Patch

diff --git a/gcc/ipa-modref.c b/gcc/ipa-modref.c
index 79d7d774715..923ae6c1dd3 100644
--- a/gcc/ipa-modref.c
+++ b/gcc/ipa-modref.c
@@ -721,6 +721,23 @@  modref_summary::finalize (tree fun)
 	    break;
 	}
     }
+  if (loads->every_base)
+    load_accesses = 1;
+  else
+    {
+      load_accesses = 0;
+      for (auto base_node : loads->bases)
+	{
+	  if (base_node->every_ref)
+	    load_accesses++;
+	  else
+	    for (auto ref_node : base_node->refs)
+	      if (ref_node->every_access)
+		load_accesses++;
+	      else
+		load_accesses += ref_node->accesses->length ();
+	}
+    }
 }
 
 /* Get function summary for FUNC if it exists, return NULL otherwise.  */
diff --git a/gcc/ipa-modref.h b/gcc/ipa-modref.h
index f868eb6de07..a0247f5449f 100644
--- a/gcc/ipa-modref.h
+++ b/gcc/ipa-modref.h
@@ -53,6 +53,8 @@  struct GTY(()) modref_summary
 
   /* Flags computed by the finalize method.  */
 
+  /* Total number of accesses in loads tree.  */
+  unsigned int load_accesses;
   /* global_memory_read is not set for functions calling functions
      with !binds_to_current_def which, after interposition, may read global
      memory but do nothing useful with it (except for crashing if some
diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr103168.C b/gcc/testsuite/g++.dg/tree-ssa/pr103168.C
new file mode 100644
index 00000000000..82924a3e3ce
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tree-ssa/pr103168.C
@@ -0,0 +1,24 @@ 
+// { dg-do compile }
+// { dg-options "-O2 -fdump-tree-fre1-details" }
+
+struct a
+{
+  int a;
+  static __attribute__ ((noinline))
+      int ret (int v) {return v;}
+
+  __attribute__ ((noinline))
+      int inca () {return a++;}
+};
+
+int
+test()
+{
+  struct a av;
+  av.a=1;
+  int val = av.ret (0) + av.inca();
+  av.a=2;
+  return val + av.ret(0) + av.inca();
+}
+
+/* { dg-final { scan-tree-dump-times "Replaced a::ret" 1 "fre1" } } */
diff --git a/gcc/tree-ssa-sccvn.c b/gcc/tree-ssa-sccvn.c
index 149674e6a16..d31bf329d2e 100644
--- a/gcc/tree-ssa-sccvn.c
+++ b/gcc/tree-ssa-sccvn.c
@@ -71,6 +71,8 @@  along with GCC; see the file COPYING3.  If not see
 #include "tree-ssa-loop-niter.h"
 #include "builtins.h"
 #include "fold-const-call.h"
+#include "ipa-modref-tree.h"
+#include "ipa-modref.h"
 #include "tree-ssa-sccvn.h"
 
 /* This algorithm is based on the SCC algorithm presented by Keith
@@ -5084,12 +5086,136 @@  visit_reference_op_call (tree lhs, gcall *stmt)
   struct vn_reference_s vr1;
   vn_reference_t vnresult = NULL;
   tree vdef = gimple_vdef (stmt);
+  modref_summary *summary;
 
   /* Non-ssa lhs is handled in copy_reference_ops_from_call.  */
   if (lhs && TREE_CODE (lhs) != SSA_NAME)
     lhs = NULL_TREE;
 
   vn_reference_lookup_call (stmt, &vnresult, &vr1);
+
+  /* If the lookup did not succeed for pure functions try to use
+     modref info to find a candidate to CSE to.  */
+  const unsigned accesses_limit = 8;
+  if (!vnresult
+      && !vdef
+      && lhs
+      && gimple_vuse (stmt)
+      && (((summary = get_modref_function_summary (stmt, NULL))
+	   && !summary->global_memory_read
+	   && summary->load_accesses < accesses_limit)
+	  || gimple_call_flags (stmt) & ECF_CONST))
+    {
+      /* First search if we can do something useful and build a
+	 vector of all loads we have to check.  */
+      bool unknown_memory_access = false;
+      auto_vec<ao_ref, accesses_limit> accesses;
+      unsigned load_accesses = summary ? summary->load_accesses : 0;
+      if (!unknown_memory_access)
+	/* Add loads done as part of setting up the call arguments.
+	   That's also necessary for CONST functions which will
+	   not have a modref summary.  */
+	for (unsigned i = 0; i < gimple_call_num_args (stmt); ++i)
+	  {
+	    tree arg = gimple_call_arg (stmt, i);
+	    if (TREE_CODE (arg) != SSA_NAME
+		&& !is_gimple_min_invariant (arg))
+	      {
+		if (accesses.length () >= accesses_limit - load_accesses)
+		  {
+		    unknown_memory_access = true;
+		    break;
+		  }
+		accesses.quick_grow (accesses.length () + 1);
+		ao_ref_init (&accesses.last (), arg);
+	      }
+	  }
+      if (summary && !unknown_memory_access)
+	{
+	  /* Add loads as analyzed by IPA modref.  */
+	  for (auto base_node : summary->loads->bases)
+	    if (unknown_memory_access)
+	      break;
+	    else for (auto ref_node : base_node->refs)
+	      if (unknown_memory_access)
+		break;
+	      else for (auto access_node : ref_node->accesses)
+		{
+		  accesses.quick_grow (accesses.length () + 1);
+		  ao_ref *r = &accesses.last ();
+		  if (!access_node.get_ao_ref (stmt, r))
+		    {
+		      /* Initialize a ref based on the argument and
+			 unknown offset if possible.  */
+		      tree arg = access_node.get_call_arg (stmt);
+		      if (arg && TREE_CODE (arg) == SSA_NAME)
+			arg = SSA_VAL (arg);
+		      if (arg
+			  && TREE_CODE (arg) == ADDR_EXPR
+			  && (arg = get_base_address (arg))
+			  && DECL_P (arg))
+			{
+			  ao_ref_init (r, arg);
+			  r->ref = NULL_TREE;
+			  r->base = arg;
+			}
+		      else
+			{
+			  unknown_memory_access = true;
+			  break;
+			}
+		    }
+		  r->base_alias_set = base_node->base;
+		  r->ref_alias_set = ref_node->ref;
+		}
+	}
+
+      /* Walk the VUSE->VDEF chain optimistically trying to find an entry
+	 for the call in the hashtable.  */
+      unsigned limit = (unknown_memory_access
+			? 0
+			: (param_sccvn_max_alias_queries_per_access
+			   / (accesses.length () + 1)));
+      tree saved_vuse = vr1.vuse;
+      hashval_t saved_hashcode = vr1.hashcode;
+      while (limit > 0 && !vnresult && !SSA_NAME_IS_DEFAULT_DEF (vr1.vuse))
+	{
+	  vr1.hashcode = vr1.hashcode - SSA_NAME_VERSION (vr1.vuse);
+	  gimple *def = SSA_NAME_DEF_STMT (vr1.vuse);
+	  /* ???  We could use fancy stuff like in walk_non_aliased_vuses, but
+	     do not bother for now.  */
+	  if (is_a <gphi *> (def))
+	    break;
+	  vr1.vuse = vuse_ssa_val (gimple_vuse (def));
+	  vr1.hashcode = vr1.hashcode + SSA_NAME_VERSION (vr1.vuse);
+	  vn_reference_lookup_1 (&vr1, &vnresult);
+	  limit--;
+	}
+
+      /* If we found a candidate to CSE to, verify it is valid.  */
+      if (vnresult && !accesses.is_empty ())
+	{
+	  tree vuse = vuse_ssa_val (gimple_vuse (stmt));
+	  while (vnresult && vuse != vr1.vuse)
+	    {
+	      gimple *def = SSA_NAME_DEF_STMT (vuse);
+	      for (auto &ref : accesses)
+		{
+		  /* ???  stmt_may_clobber_ref_p_1 does per stmt constant
+		     analysis overhead that we might be able to cache.  */
+		  if (stmt_may_clobber_ref_p_1 (def, &ref, true))
+		    {
+		      vnresult = NULL;
+		      break;
+		    }
+		}
+	      vuse = vuse_ssa_val (gimple_vuse (def));
+	    }
+	}
+      vr1.vuse = saved_vuse;
+      vr1.hashcode = saved_hashcode;
+    }
+
   if (vnresult)
     {
       if (vnresult->result_vdef && vdef)