
Use modref summary to DSE calls to non-pure functions

Message ID 20211110124316.GF97553@kam.mff.cuni.cz

Commit Message

Jan Hubicka Nov. 10, 2021, 12:43 p.m. UTC
Hi,
this patch implements DSE using modref summaries: if a function has no side
effects besides storing to memory pointed to by its arguments, and we can
prove those stores to be dead, we can optimize the call out. So we handle,
for example:

volatile int *ptr;
struct a {
	int a,b,c;
} a;
__attribute__((noinline))
static int init (struct a*a)
{
	a->a=0;
	a->b=1;
}
__attribute__((noinline))
static int use (struct a*a)
{
	if (a->c != 3)
		*ptr=5;
}

void
main(void)
{
	struct a a;
	init (&a);
	a.c=3;
	use (&a);
}

And optimize out the call to init (&a).

We work quite hard to inline such constructors, and this patch is only
effective if inlining did not happen (for whatever reason).  Still, we
optimize out about 26 calls building tramp3d and about 70 calls during
bootstrap (mostly ctors of poly_int). During bootstrap most of the removal
happens early, and we would inline the ctors unless we decide to optimize
for size. One call per cc1* binary is removed late during the LTO build.

This is more frequent in codebases with a higher abstraction penalty, with
-Os, or with profile feedback in sections optimized for size. I also hope
we will be able to CSE such calls, which would make this DSE more
important.

Bootstrapped/regtested x86_64-linux, OK?

gcc/ChangeLog:

	* tree-ssa-alias.c (ao_ref_init_from_ptr_and_range): Export.
	* tree-ssa-alias.h (ao_ref_init_from_ptr_and_range): Declare.
	* tree-ssa-dse.c (dse_optimize_stmt): Rename to ...
	(dse_optimize_store): ... this;
	(dse_optimize_call): New function.
	(pass_dse::execute): Use dse_optimize_call and update
	call to dse_optimize_store.

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-ssa/modref-dse-1.c: New test.
	* gcc.dg/tree-ssa/modref-dse-2.c: New test.

Comments

Richard Biener Nov. 11, 2021, 11:14 a.m. UTC | #1
On Wed, Nov 10, 2021 at 1:43 PM Jan Hubicka via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Hi,
> this patch implements DSE using modref summaries: if function has no side effects
> besides storing to memory pointed to by its argument and if we can prove those stores
> to be dead, we can optimize out. So we handle for example:
>
> volatile int *ptr;
> struct a {
>         int a,b,c;
> } a;
> __attribute__((noinline))
> static int init (struct a*a)
> {
>         a->a=0;
>         a->b=1;
> }
> __attribute__((noinline))
> static int use (struct a*a)
> {
>         if (a->c != 3)
>                 *ptr=5;
> }
>
> void
> main(void)
> {
>         struct a a;
>         init (&a);
>         a.c=3;
>         use (&a);
> }
>
> And optimize out call to init (&a).
>
> We work quite hard to inline such constructors and this patch is only
> effective if inlining did not happen (for whatever reason).  Still, we
> optimize about 26 calls building tramp3d and about 70 calls during
> bootstrap (mostly ctors of poly_int). During bootstrap most removal
> happens early and we would inline the ctors unless we decide to optimize
> for size. 1 call per cc1* binary is removed late during LTO build.
>
> This is more frequent in codebases with higher abstraction penalty, with
> -Os or with profile feedback in sections optimized for size. I also hope
> we will be able to CSE such calls and that would make DSE more
> important.
>
> Bootstrapped/regtested x86_64-linux, OK?
>
> gcc/ChangeLog:
>
>         * tree-ssa-alias.c (ao_ref_alias_ptr_type): Export.

ao_ref_init_from_ptr_and_range it is

>         * tree-ssa-alias.h (ao_ref_init_from_ptr_and_range): Declare.
>         * tree-ssa-dse.c (dse_optimize_stmt): Rename to ...
>         (dse_optimize_store): ... this;
>         (dse_optimize_call): New function.
>         (pass_dse::execute): Use dse_optimize_call and update
>         call to dse_optimize_store.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.dg/tree-ssa/modref-dse-1.c: New test.
>         * gcc.dg/tree-ssa/modref-dse-2.c: New test.
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/modref-dse-1.c b/gcc/testsuite/gcc.dg/tree-ssa/modref-dse-1.c
> new file mode 100644
> index 00000000000..e78693b349a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/modref-dse-1.c
> @@ -0,0 +1,28 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-dse1"  } */
> +volatile int *ptr;
> +struct a {
> +       int a,b,c;
> +} a;
> +__attribute__((noinline))
> +static int init (struct a*a)
> +{
> +       a->a=0;
> +       a->b=1;
> +}
> +__attribute__((noinline))
> +static int use (struct a*a)
> +{
> +       if (a->c != 3)
> +               *ptr=5;
> +}
> +
> +void
> +main(void)
> +{
> +       struct a a;
> +       init (&a);
> +       a.c=3;
> +       use (&a);
> +}
> +/* { dg-final { scan-tree-dump "Deleted dead store: init" "dse1" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/modref-dse-2.c b/gcc/testsuite/gcc.dg/tree-ssa/modref-dse-2.c
> new file mode 100644
> index 00000000000..99c8ceb8127
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/modref-dse-2.c
> @@ -0,0 +1,31 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-dse2 -fno-ipa-sra -fno-ipa-cp"  } */
> +volatile int *ptr;
> +struct a {
> +       int a,b,c;
> +} a;
> +__attribute__((noinline))
> +static int init (struct a*a)
> +{
> +       a->a=0;
> +       a->b=1;
> +       a->c=1;
> +}
> +__attribute__((noinline))
> +static int use (struct a*a)
> +{
> +       if (a->c != 3)
> +               *ptr=5;
> +}
> +
> +void
> +main(void)
> +{
> +       struct a a;
> +       init (&a);
> +       a.c=3;
> +       use (&a);
> +}
> +/* Only DSE2 is tracking live bytes needed to figure out that store to c is
> +   also dead above.  */
> +/* { dg-final { scan-tree-dump "Deleted dead store: init" "dse2" } } */
> diff --git a/gcc/tree-ssa-alias.c b/gcc/tree-ssa-alias.c
> index eabf6805f2b..affb5d40d4b 100644
> --- a/gcc/tree-ssa-alias.c
> +++ b/gcc/tree-ssa-alias.c
> @@ -782,7 +782,7 @@ ao_ref_alias_ptr_type (ao_ref *ref)
>     The access is assumed to be only to or after of the pointer target adjusted
>     by the offset, not before it (even in the case RANGE_KNOWN is false).  */
>
> -static void
> +void
>  ao_ref_init_from_ptr_and_range (ao_ref *ref, tree ptr,
>                                 bool range_known,
>                                 poly_int64 offset,
> diff --git a/gcc/tree-ssa-alias.h b/gcc/tree-ssa-alias.h
> index 275dea10397..c2e28a74999 100644
> --- a/gcc/tree-ssa-alias.h
> +++ b/gcc/tree-ssa-alias.h
> @@ -111,6 +111,8 @@ ao_ref::max_size_known_p () const
>  /* In tree-ssa-alias.c  */
>  extern void ao_ref_init (ao_ref *, tree);
>  extern void ao_ref_init_from_ptr_and_size (ao_ref *, tree, tree);
> +void ao_ref_init_from_ptr_and_range (ao_ref *, tree, bool,
> +                                    poly_int64, poly_int64, poly_int64);
>  extern tree ao_ref_base (ao_ref *);
>  extern alias_set_type ao_ref_alias_set (ao_ref *);
>  extern alias_set_type ao_ref_base_alias_set (ao_ref *);
> diff --git a/gcc/tree-ssa-dse.c b/gcc/tree-ssa-dse.c
> index 27287fe88ee..1fec9100011 100644
> --- a/gcc/tree-ssa-dse.c
> +++ b/gcc/tree-ssa-dse.c
> @@ -40,6 +40,9 @@ along with GCC; see the file COPYING3.  If not see
>  #include "gimplify.h"
>  #include "tree-eh.h"
>  #include "cfganal.h"
> +#include "cgraph.h"
> +#include "ipa-modref-tree.h"
> +#include "ipa-modref.h"
>
>  /* This file implements dead store elimination.
>
> @@ -1039,7 +1042,8 @@ delete_dead_or_redundant_assignment (gimple_stmt_iterator *gsi, const char *type
>     post dominates the first store, then the first store is dead.  */
>
>  static void
> -dse_optimize_stmt (function *fun, gimple_stmt_iterator *gsi, sbitmap live_bytes)
> +dse_optimize_store (function *fun, gimple_stmt_iterator *gsi,
> +                   sbitmap live_bytes)
>  {
>    gimple *stmt = gsi_stmt (*gsi);
>
> @@ -1182,6 +1186,128 @@ dse_optimize_stmt (function *fun, gimple_stmt_iterator *gsi, sbitmap live_bytes)
>      delete_dead_or_redundant_assignment (gsi, "dead", need_eh_cleanup);
>  }
>
> +/* Try to prove, using modref summary, that all memory written to by a call is
> +   dead and remove it.  */
> +
> +static bool
> +dse_optimize_call (gimple_stmt_iterator *gsi, sbitmap live_bytes)
> +{
> +  gcall *stmt = as_a <gcall *> (gsi_stmt (*gsi));
> +  tree callee = gimple_call_fndecl (stmt);
> +
> +  if (!callee)
> +    return false;
> +
> +  /* Pure/const functions are optimized by normal DCE
> +     or handled as store above.  */
> +  int flags = gimple_call_flags (stmt);
> +  if ((flags & (ECF_PURE|ECF_CONST|ECF_NOVOPS))
> +      && !(flags & (ECF_LOOPING_CONST_OR_PURE)))
> +    return false;
>
> +  cgraph_node *node = cgraph_node::get (callee);
> +  if (!node)
> +    return false;
> +
> +  if (stmt_could_throw_p (cfun, stmt)
> +      && !cfun->can_delete_dead_exceptions)
> +    return false;
> +
> +  /* If return value is used the call is not dead.  */
> +  tree lhs = gimple_call_lhs (stmt);
> +  if (lhs && TREE_CODE (lhs) == SSA_NAME)
> +    {
> +      imm_use_iterator ui;
> +      gimple *use_stmt;
> +      FOR_EACH_IMM_USE_STMT (use_stmt, ui, lhs)
> +       if (!is_gimple_debug (use_stmt))
> +         return false;
> +    }
> +
> +  /* Verify that there are no side-effects except for return value
> +     and memory writes tracked by modref.  */
> +  modref_summary *summary = get_modref_function_summary (node);
> +  if (!summary || summary->writes_errno
> +      || summary->side_effects
> +      || summary->global_memory_written_p ())
> +    return false;
> +
> +  modref_base_node <alias_set_type> *base_node;
> +  modref_ref_node <alias_set_type> *ref_node;
> +  modref_access_node *access_node;
> +  size_t i, j, k;
> +  bool by_clobber_p = false;
> +  int num_tests = 0, max_tests = param_modref_max_tests;
> +
> +  /* Unlike alias oracle we can not skip subtrees based on TBAA check.
> +     Count the size of the whole tree to verify that we will not need too many
> +     tests.  */
> +  FOR_EACH_VEC_SAFE_ELT (summary->stores->bases, i, base_node)
> +    FOR_EACH_VEC_SAFE_ELT (base_node->refs, j, ref_node)
> +      FOR_EACH_VEC_SAFE_ELT (ref_node->accesses, k, access_node)
> +       if (num_tests++ > max_tests)
> +         return false;

at least the innermost loop can be done as

          if (num_tests += ref_node->accesses.length () > max_tests)

no?

> +
> +  /* Walk all memory writes and verify that they are dead.  */
> +  FOR_EACH_VEC_SAFE_ELT (summary->stores->bases, i, base_node)
> +    FOR_EACH_VEC_SAFE_ELT (base_node->refs, j, ref_node)
> +      FOR_EACH_VEC_SAFE_ELT (ref_node->accesses, k, access_node)
> +       {
> +         /* ??? if offset is unknown it may be negative.  Not sure
> +            how to construct ref here.  */

I think you can't, you could use -poly_int64_max or so.

> +         if (!access_node->parm_offset_known)
> +           return false;

But you could do this check in the loop computing num_tests ...
(we could also cache the count and whether any of the refs have unknown offset
in the summary?)

> +         tree arg;
> +         if (access_node->parm_index == MODREF_STATIC_CHAIN_PARM)
> +           arg = gimple_call_chain (stmt);
> +         else
> +           arg = gimple_call_arg (stmt, access_node->parm_index);
> +
> +         ao_ref ref;
> +         poly_offset_int off = (poly_offset_int)access_node->offset
> +               + ((poly_offset_int)access_node->parm_offset
> +                  << LOG2_BITS_PER_UNIT);
> +         poly_int64 off2;
> +         if (!off.to_shwi (&off2))
> +           return false;
> +         ao_ref_init_from_ptr_and_range
> +                (&ref, arg, true, off2, access_node->size,
> +                 access_node->max_size);
> +         ref.ref_alias_set = ref_node->ref;
> +         ref.base_alias_set = base_node->base;
> +
> +         bool byte_tracking_enabled
> +             = setup_live_bytes_from_ref (&ref, live_bytes);
> +         enum dse_store_status store_status;
> +
> +         store_status = dse_classify_store (&ref, stmt,
> +                                            byte_tracking_enabled,
> +                                            live_bytes, &by_clobber_p);
> +         if (store_status != DSE_STORE_DEAD)
> +           return false;
> +       }
> +  /* Check also value stored by the call.  */
> +  if (gimple_store_p (stmt))
> +    {
> +      ao_ref ref;
> +
> +      if (!initialize_ao_ref_for_dse (stmt, &ref))
> +       gcc_unreachable ();
> +      bool byte_tracking_enabled
> +         = setup_live_bytes_from_ref (&ref, live_bytes);
> +      enum dse_store_status store_status;
> +
> +      store_status = dse_classify_store (&ref, stmt,
> +                                        byte_tracking_enabled,
> +                                        live_bytes, &by_clobber_p);
> +      if (store_status != DSE_STORE_DEAD)
> +       return false;
> +    }
> +  delete_dead_or_redundant_assignment (gsi, "dead", need_eh_cleanup);
> +  return true;
> +}
> +
>  namespace {
>
>  const pass_data pass_data_dse =
> @@ -1235,7 +1363,14 @@ pass_dse::execute (function *fun)
>           gimple *stmt = gsi_stmt (gsi);
>
>           if (gimple_vdef (stmt))
> -           dse_optimize_stmt (fun, &gsi, live_bytes);
> +           {
> +             gcall *call = dyn_cast <gcall *> (stmt);
> +
> +             if (call && dse_optimize_call (&gsi, live_bytes))
> +               /* We removed a dead call.  */;
> +             else
> +               dse_optimize_store (fun, &gsi, live_bytes);

I think we want to refactor both functions: dse_optimize_stmt has some
early outs that apply generally, and it handles some builtin calls
that we don't want to re-handle with dse_optimize_call.

So I wonder whether it is possible to call the new function from
inside dse_optimize_stmt instead (after we have handled the return
value of the call, for example), or whether different refactoring can
make the flow more obvious.

Thanks,
Richard.

> +           }
>           else if (def_operand_p
>                      def_p = single_ssa_def_operand (stmt, SSA_OP_DEF))
>             {
Jan Hubicka Nov. 11, 2021, 12:07 p.m. UTC | #2
> > +  /* Unlike alias oracle we can not skip subtrees based on TBAA check.
> > +     Count the size of the whole tree to verify that we will not need too many
> > +     tests.  */
> > +  FOR_EACH_VEC_SAFE_ELT (summary->stores->bases, i, base_node)
> > +    FOR_EACH_VEC_SAFE_ELT (base_node->refs, j, ref_node)
> > +      FOR_EACH_VEC_SAFE_ELT (ref_node->accesses, k, access_node)
> > +       if (num_tests++ > max_tests)
> > +         return false;
> 
> at least the innermost loop can be done as
> 
>           if (num_tests += ref_node->accesses.length () > max_tests)
> 
> no?

Yep that was stupid, sorry for that ;))
> 
> > +
> > +  /* Walk all memory writes and verify that they are dead.  */
> > +  FOR_EACH_VEC_SAFE_ELT (summary->stores->bases, i, base_node)
> > +    FOR_EACH_VEC_SAFE_ELT (base_node->refs, j, ref_node)
> > +      FOR_EACH_VEC_SAFE_ELT (ref_node->accesses, k, access_node)
> > +       {
> > +         /* ??? if offset is unknown it may be negative.  Not sure
> > +            how to construct ref here.  */
> 
> I think you can't, you could use -poly_int64_max or so.

I need a ref to give to dse_classify_store.  It needs a base to track live
bytes etc., which is not very useful if I do not know the range.  However,
DSE is still useful since I can hit a free or the end of lifetime of the
decl.  I was wondering if I should simply implement a lightweight version
of dse_classify_store that handles this case?
> 
> > +         if (!access_node->parm_offset_known)
> > +           return false;
> 
> But you could do this check in the loop computing num_tests ...
> (we could also cache the count and whether any of the refs have unknown offset
> in the summary?)

Yep, I plan to add a cache for bits like this (and the check for accessing
global memory).  I just want to push a bit more of the cleanups I have in
my local tree first.
> 
> > +         tree arg;
> > +         if (access_node->parm_index == MODREF_STATIC_CHAIN_PARM)
> > +           arg = gimple_call_chain (stmt);
> > +         else
> > +           arg = gimple_call_arg (stmt, access_node->parm_index);
> > +
> > +         ao_ref ref;
> > +         poly_offset_int off = (poly_offset_int)access_node->offset
> > +               + ((poly_offset_int)access_node->parm_offset
> > +                  << LOG2_BITS_PER_UNIT);
> > +         poly_int64 off2;
> > +         if (!off.to_shwi (&off2))
> > +           return false;
> > +         ao_ref_init_from_ptr_and_range
> > +                (&ref, arg, true, off2, access_node->size,
> > +                 access_node->max_size);
> > +         ref.ref_alias_set = ref_node->ref;
> > +         ref.base_alias_set = base_node->base;
> > +
> > +         bool byte_tracking_enabled
> > +             = setup_live_bytes_from_ref (&ref, live_bytes);
> > +         enum dse_store_status store_status;
> > +
> > +         store_status = dse_classify_store (&ref, stmt,
> > +                                            byte_tracking_enabled,
> > +                                            live_bytes, &by_clobber_p);
> > +         if (store_status != DSE_STORE_DEAD)
> > +           return false;
> > +       }
> > +  /* Check also value stored by the call.  */
> > +  if (gimple_store_p (stmt))
> > +    {
> > +      ao_ref ref;
> > +
> > +      if (!initialize_ao_ref_for_dse (stmt, &ref))
> > +       gcc_unreachable ();
> > +      bool byte_tracking_enabled
> > +         = setup_live_bytes_from_ref (&ref, live_bytes);
> > +      enum dse_store_status store_status;
> > +
> > +      store_status = dse_classify_store (&ref, stmt,
> > +                                        byte_tracking_enabled,
> > +                                        live_bytes, &by_clobber_p);
> > +      if (store_status != DSE_STORE_DEAD)
> > +       return false;
> > +    }
> > +  delete_dead_or_redundant_assignment (gsi, "dead", need_eh_cleanup);
> > +  return true;
> > +}
> > +
> >  namespace {
> >
> >  const pass_data pass_data_dse =
> > @@ -1235,7 +1363,14 @@ pass_dse::execute (function *fun)
> >           gimple *stmt = gsi_stmt (gsi);
> >
> >           if (gimple_vdef (stmt))
> > -           dse_optimize_stmt (fun, &gsi, live_bytes);
> > +           {
> > +             gcall *call = dyn_cast <gcall *> (stmt);
> > +
> > +             if (call && dse_optimize_call (&gsi, live_bytes))
> > +               /* We removed a dead call.  */;
> > +             else
> > +               dse_optimize_store (fun, &gsi, live_bytes);
> 
> I think we want to refactor both functions, dse_optimize_stmt has some
> early outs that apply generally, and it handles some builtin calls
> that we don't want to re-handle with dse_optimize_call.
> 
> So I wonder if it is either possible to call the new function from
> inside dse_optimize_stmt instead, after we handled the return
> value of call for example or different refactoring can make the flow
> more obvious.

It was my initial plan. However I was not sure how much I would get from
that.

The function starts with:

  /* Don't return early on *this_2(D) ={v} {CLOBBER}.  */
  if (gimple_has_volatile_ops (stmt)
      && (!gimple_clobber_p (stmt)
          || TREE_CODE (gimple_assign_lhs (stmt)) != MEM_REF))
    return;

  ao_ref ref;
  if (!initialize_ao_ref_for_dse (stmt, &ref))
    return;

The check about clobber does not apply to calls and then it gives up on
functions not returning aggregates (that is a common case).

For functions returning aggregates it tries to prove that the retval is
dead and replace it.

I guess I can simply call my analysis from the second return above and
from the code removing dead LHS call instead of doing it from the main
walker and drop the LHS handling?

Thank you,
Honza
> 
> Thanks,
> Richard.
> 
> > +           }
> >           else if (def_operand_p
> >                      def_p = single_ssa_def_operand (stmt, SSA_OP_DEF))
> >             {
Richard Biener Nov. 11, 2021, 12:32 p.m. UTC | #3
On Thu, Nov 11, 2021 at 1:07 PM Jan Hubicka <hubicka@kam.mff.cuni.cz> wrote:
>
> > > +  /* Unlike alias oracle we can not skip subtrees based on TBAA check.
> > > +     Count the size of the whole tree to verify that we will not need too many
> > > +     tests.  */
> > > +  FOR_EACH_VEC_SAFE_ELT (summary->stores->bases, i, base_node)
> > > +    FOR_EACH_VEC_SAFE_ELT (base_node->refs, j, ref_node)
> > > +      FOR_EACH_VEC_SAFE_ELT (ref_node->accesses, k, access_node)
> > > +       if (num_tests++ > max_tests)
> > > +         return false;
> >
> > at least the innermost loop can be done as
> >
> >           if (num_tests += ref_node->accesses.length () > max_tests)
> >
> > no?
>
> Yep that was stupid, sorry for that ;))
> >
> > > +
> > > +  /* Walk all memory writes and verify that they are dead.  */
> > > +  FOR_EACH_VEC_SAFE_ELT (summary->stores->bases, i, base_node)
> > > +    FOR_EACH_VEC_SAFE_ELT (base_node->refs, j, ref_node)
> > > +      FOR_EACH_VEC_SAFE_ELT (ref_node->accesses, k, access_node)
> > > +       {
> > > +         /* ??? if offset is unknown it may be negative.  Not sure
> > > +            how to construct ref here.  */
> >
> > I think you can't, you could use -poly_int64_max or so.
>
> I need a ref to give to dse_classify_store. It needs base to track live
> bytes etc which is not very useful if I do not know the range.  However
> DSE is still useful since I can hit free or end of lifetime of the decl.
> I was wondering if I should simply implement a lightweight version of
> dse_clasify_store that handles this case?

No, I think if it turns out useful then we want a way to have such a ref
represented by an ao_ref.  Note that when we come from a ref tree we know
handled components will only increase the offset; only the base MEM_REF
can contain a pointer subtraction (but the result of that is the base
then).

In what cases does parm_offset_known end up false?  Is that
when seeing a POINTER_PLUS_EXPR with unknown offset?
So yes, that's a case we cannot capture right now - the only
thing that remains is a pointer with a known points-to-set - a
similar problem as with the pure call PRE.  You could in theory
allocate a scratch SSA name and attach points-to-info
to it.  And when the call argument is &decl based then you could set
offset to zero.

> >
> > > +         if (!access_node->parm_offset_known)
> > > +           return false;
> >
> > But you could do this check in the loop computing num_tests ...
> > (we could also cache the count and whether any of the refs have unknown offset
> > in the summary?)
>
> Yep, I plan to add cache for bits like this (and the check for accessing
> global memory).  Just want to push bit more of the cleanups I have in my
> local tree.
> >
> > > +         tree arg;
> > > +         if (access_node->parm_index == MODREF_STATIC_CHAIN_PARM)
> > > +           arg = gimple_call_chain (stmt);
> > > +         else
> > > +           arg = gimple_call_arg (stmt, access_node->parm_index);
> > > +
> > > +         ao_ref ref;
> > > +         poly_offset_int off = (poly_offset_int)access_node->offset
> > > +               + ((poly_offset_int)access_node->parm_offset
> > > +                  << LOG2_BITS_PER_UNIT);
> > > +         poly_int64 off2;
> > > +         if (!off.to_shwi (&off2))
> > > +           return false;
> > > +         ao_ref_init_from_ptr_and_range
> > > +                (&ref, arg, true, off2, access_node->size,
> > > +                 access_node->max_size);
> > > +         ref.ref_alias_set = ref_node->ref;
> > > +         ref.base_alias_set = base_node->base;
> > > +
> > > +         bool byte_tracking_enabled
> > > +             = setup_live_bytes_from_ref (&ref, live_bytes);
> > > +         enum dse_store_status store_status;
> > > +
> > > +         store_status = dse_classify_store (&ref, stmt,
> > > +                                            byte_tracking_enabled,
> > > +                                            live_bytes, &by_clobber_p);
> > > +         if (store_status != DSE_STORE_DEAD)
> > > +           return false;
> > > +       }
> > > +  /* Check also value stored by the call.  */
> > > +  if (gimple_store_p (stmt))
> > > +    {
> > > +      ao_ref ref;
> > > +
> > > +      if (!initialize_ao_ref_for_dse (stmt, &ref))
> > > +       gcc_unreachable ();
> > > +      bool byte_tracking_enabled
> > > +         = setup_live_bytes_from_ref (&ref, live_bytes);
> > > +      enum dse_store_status store_status;
> > > +
> > > +      store_status = dse_classify_store (&ref, stmt,
> > > +                                        byte_tracking_enabled,
> > > +                                        live_bytes, &by_clobber_p);
> > > +      if (store_status != DSE_STORE_DEAD)
> > > +       return false;
> > > +    }
> > > +  delete_dead_or_redundant_assignment (gsi, "dead", need_eh_cleanup);
> > > +  return true;
> > > +}
> > > +
> > >  namespace {
> > >
> > >  const pass_data pass_data_dse =
> > > @@ -1235,7 +1363,14 @@ pass_dse::execute (function *fun)
> > >           gimple *stmt = gsi_stmt (gsi);
> > >
> > >           if (gimple_vdef (stmt))
> > > -           dse_optimize_stmt (fun, &gsi, live_bytes);
> > > +           {
> > > +             gcall *call = dyn_cast <gcall *> (stmt);
> > > +
> > > +             if (call && dse_optimize_call (&gsi, live_bytes))
> > > +               /* We removed a dead call.  */;
> > > +             else
> > > +               dse_optimize_store (fun, &gsi, live_bytes);
> >
> > I think we want to refactor both functions, dse_optimize_stmt has some
> > early outs that apply generally, and it handles some builtin calls
> > that we don't want to re-handle with dse_optimize_call.
> >
> > So I wonder if it is either possible to call the new function from
> > inside dse_optimize_stmt instead, after we handled the return
> > value of call for example or different refactoring can make the flow
> > more obvious.
>
> It was my initial plan. However I was not sure how much I would get from
> that.
>
> The function starts with:
>
>   /* Don't return early on *this_2(D) ={v} {CLOBBER}.  */
>   if (gimple_has_volatile_ops (stmt)
>       && (!gimple_clobber_p (stmt)
>           || TREE_CODE (gimple_assign_lhs (stmt)) != MEM_REF))
>     return;
>
>   ao_ref ref;
>   if (!initialize_ao_ref_for_dse (stmt, &ref))
>     return;
>
> The check about clobber does not apply to calls and then it gives up on
> functions not returning aggregates (that is a common case).
>
> For functions returning aggregates it tries to prove that retval is dead
> and replace it.
>
> I guess I can simply call my analysis from the second return above and
> from the code removing dead LHS call instead of doing it from the main
> walker and drop the LHS handling?

Yeah, something like that.

Richard.

> Thank you,
> Honza
> >
> > Thanks,
> > Richard.
> >
> > > +           }
> > >           else if (def_operand_p
> > >                      def_p = single_ssa_def_operand (stmt, SSA_OP_DEF))
> > >             {
Jan Hubicka Nov. 11, 2021, 12:42 p.m. UTC | #4
Hi,
> 
> No, I think if it turns out useful then we want a way to have such ref
> represented by an ao_ref.  Note that when we come from a
> ref tree we know handled-components only will increase offset,
> only the base MEM_REF can contain a pointer subtraction (but
> the result of that is the base then).

Yep, that is why I introduced the parm_offset in the first place - it can
be negative or unknown...
> 
> In what cases does parm_offset_known end up false?  Is that
> when seeing a POINTER_PLUS_EXPR with unknown offset?

Yep, a typical example is a loop with a pointer walking an array.

> So yes, that's a case we cannot capture right now - the only
> thing that remains is a pointer with a known points-to-set - a
> similar problem as with the pure call PRE.  You could in theory
> allocate a scratch SSA name and attach points-to-info
> to it.  And when the call argument is &decl based then you could set
> offset to zero.

Hmm, I could try to do this, but possibly incrementally?

Basically I want to have

foo (&decl)
decl = {}

To be matched, since even if I do not know the offset, I know the stores
are dead after the end of the decl's lifetime.  I am not quite sure PTA
will give me that?
> > It was my initial plan. However I was not sure how much I would get from
> > that.
> >
> > The function starts with:
> >
> >   /* Don't return early on *this_2(D) ={v} {CLOBBER}.  */
> >   if (gimple_has_volatile_ops (stmt)
> >       && (!gimple_clobber_p (stmt)
> >           || TREE_CODE (gimple_assign_lhs (stmt)) != MEM_REF))
> >     return;
> >
> >   ao_ref ref;
> >   if (!initialize_ao_ref_for_dse (stmt, &ref))
> >     return;
> >
> > The check about clobber does not apply to calls and then it gives up on
> > functions not returning aggregates (that is a common case).
> >
> > For functions returing aggregates it tries to prove that retval is dead
> > and replace it.
> >
> > I guess I can simply call my analysis from the second return above and
> > from the code removing dead LHS call instead of doing it from the main
> > walker and drop the LHS handling?
> 
> Yeah, something like that.
OK, I will prepare updated patch, thanks!

Honza
> 
> Richard.
> 
> > Thank you,
> > Honza
> > >
> > > Thanks,
> > > Richard.
> > >
> > > > +           }
> > > >           else if (def_operand_p
> > > >                      def_p = single_ssa_def_operand (stmt, SSA_OP_DEF))
> > > >             {
Richard Biener Nov. 11, 2021, 1:20 p.m. UTC | #5
On Thu, Nov 11, 2021 at 1:42 PM Jan Hubicka <hubicka@kam.mff.cuni.cz> wrote:
>
> Hi,
> >
> > No, I think if it turns out useful then we want a way to have such ref
> > represented by an ao_ref.  Note that when we come from a
> > ref tree we know handled-components only will increase offset,
> > only the base MEM_REF can contain a pointer subtraction (but
> > the result of that is the base then).
>
> Yep, that is why I introduced the parm_offset at first place - it can be
> negative or unknown...
> >
> > In what cases does parm_offset_known end up false?  Is that
> > when seeing a POINTER_PLUS_EXPR with unknown offset?
>
> Yep, a typical example is a loop with pointer walking an array .
>
> > So yes, that's a case we cannot capture right now - the only
> > thing that remains is a pointer with a known points-to-set - a
> > similar problem as with the pure call PRE.  You could in theory
> > allocate a scratch SSA name and attach points-to-info
> > to it.  And when the call argument is &decl based then you could set
> > offset to zero.
>
> Hmm, I could try to do this, but possibly incrementally?

You mean handle a &decl argument specially for unknown param offset?
Yeah, I guess so.

> Basically I want to have
>
> foo (&decl)
> decl = {}
>
> To be matched since even if I do not know the offset I know it is dead
> after end of lifetime of the decl.  I am not quite sure PTA will give me
> that?

for this case PTA should tell you the alias is to 'decl' only but then I'm
not sure if stmt_kills_ref_p is up to the task to determine that 'decl = {}',
from a quick look it doesn't.  So indeed the only interesting case will
be a &decl based parameter which we can special-case.

> > > It was my initial plan. However I was not sure how much I would get from
> > > that.
> > >
> > > The function starts with:
> > >
> > >   /* Don't return early on *this_2(D) ={v} {CLOBBER}.  */
> > >   if (gimple_has_volatile_ops (stmt)
> > >       && (!gimple_clobber_p (stmt)
> > >           || TREE_CODE (gimple_assign_lhs (stmt)) != MEM_REF))
> > >     return;
> > >
> > >   ao_ref ref;
> > >   if (!initialize_ao_ref_for_dse (stmt, &ref))
> > >     return;
> > >
> > > The check about clobber does not apply to calls and then it gives up on
> > > functions not returning aggregates (that is a common case).
> > >
> > > For functions returning aggregates it tries to prove that retval is dead
> > > and replace it.
> > >
> > > I guess I can simply call my analysis from the second return above and
> > > from the code removing dead LHS call instead of doing it from the main
> > > walker and drop the LHS handling?
> >
> > Yeah, something like that.
> OK, I will prepare updated patch, thanks!
>
> Honza
> >
> > Richard.
> >
> > > Thank you,
> > > Honza
> > > >
> > > > Thanks,
> > > > Richard.
> > > >
> > > > > +           }
> > > > >           else if (def_operand_p
> > > > >                      def_p = single_ssa_def_operand (stmt, SSA_OP_DEF))
> > > > >             {
Jan Hubicka Nov. 11, 2021, 1:25 p.m. UTC | #6
> > Hmm, I could try to do this, but possibly incrementally?
> 
> You mean handle a &decl argument specially for unknown param offset?
> Yeah, I guess so.

I think it also applies to a pointer that was allocated and is going to be
freed...
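For that case, a minimal hypothetical example (init/g are made-up names): the pointed-to memory is freed right after the call, so the stores are dead much like stores into a local past the end of its lifetime:

```c
#include <stdlib.h>

__attribute__((noinline))
static void init (int *p)
{
  p[0] = 1;			/* both stores are dead: the memory is */
  p[1] = 2;			/* freed without ever being read        */
}

int
g (void)
{
  int *p = malloc (2 * sizeof (int));
  if (!p)
    return 0;
  init (p);			/* removable call */
  free (p);			/* end of lifetime of *p */
  return 1;
}
```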
> 
> > Basically I want to have
> >
> > foo (&decl)
> > decl = {}
> >
> > To be matched since even if I do not know the offset I know it is dead
> > after end of lifetime of the decl.  I am not quite sure PTA will give me
> > that?
> 
> for this case PTA should tell you the alias is to 'decl' only but then I'm
> not sure if stmt_kills_ref_p is up to the task to determine that 'decl = {}',
> from a quick look it doesn't.  So indeed the only interesting case will
> be a &decl based parameter which we can special-case.

Yep, I do not think it understands this.  I will look into it - I guess
it is common enough to care about.
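A minimal source-level example (made-up names) of the kill in question, where a whole-object assignment plays the role of 'decl = {}':

```c
struct A { int x, y; };

__attribute__((noinline))
static void set (struct A *a)
{
  a->x = 1;			/* both stores are killed below */
  a->y = 2;
}

int
f (void)
{
  struct A a;
  set (&a);
  a = (struct A) { 0, 0 };	/* whole-object assignment: the source-level
				   analogue of the 'decl = {}' clobber */
  return a.x + a.y;
}
```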

Honza
Jan Hubicka Nov. 12, 2021, 11:39 a.m. UTC | #7
Hi,
this is the updated patch.  It moves the summary walk that checks whether we
can possibly succeed with DSE into the summary->finalize member function, so
it is done once per summary, and refactors dse_optimize_call to be called from
dse_optimize_stmt after the early checks.

I did not try to handle the special case of unknown parm_offset, but we can
do it incrementally.  I think initializing the range with the offset being the
minimum poly_int64 value and max_size unknown, as suggested, is going to work.
I am a bit worried that this is in bits, so we have a 2^61 range instead of
2^64, but I guess one cannot offset a pointer back and forth that far in a
valid program?

Bootstrapped/regtested x86_64-linux, LTO bootstrap in progress; OK if it
succeeds?

gcc/ChangeLog:

	* ipa-modref.c (modref_summary::modref_summary): Clear new flags.
	(modref_summary::dump): Dump try_dse.
	(modref_summary::finalize): Add FUN attribute; compute try-dse.
	(analyze_function): Update.
	(read_section): Update.
	(update_signature): Update.
	(pass_ipa_modref::execute): Update.
	* ipa-modref.h (struct modref_summary):
	* tree-ssa-alias.c (ao_ref_init_from_ptr_and_range): Export.
	* tree-ssa-alias.h (ao_ref_init_from_ptr_and_range): Declare.
	* tree-ssa-dse.c: Include cgraph.h, ipa-modref-tree.h and
	ipa-modref.h
	(dse_optimize_call): New function.
	(dse_optimize_stmt): Use it.

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-ssa/modref-dse-1.c: New test.
	* gcc.dg/tree-ssa/modref-dse-2.c: New test.

diff --git a/gcc/ipa-modref.c b/gcc/ipa-modref.c
index c6efacb0e20..ea6a27ae767 100644
--- a/gcc/ipa-modref.c
+++ b/gcc/ipa-modref.c
@@ -276,7 +276,8 @@ static GTY(()) fast_function_summary <modref_summary_lto *, va_gc>
 
 modref_summary::modref_summary ()
   : loads (NULL), stores (NULL), retslot_flags (0), static_chain_flags (0),
-    writes_errno (false), side_effects (false)
+    writes_errno (false), side_effects (false), global_memory_read (false),
+    global_memory_written (false), try_dse (false)
 {
 }
 
@@ -605,6 +606,8 @@ modref_summary::dump (FILE *out)
     fprintf (out, "  Global memory read\n");
   if (global_memory_written)
     fprintf (out, "  Global memory written\n");
+  if (try_dse)
+    fprintf (out, "  Try dse\n");
   if (arg_flags.length ())
     {
       for (unsigned int i = 0; i < arg_flags.length (); i++)
@@ -661,12 +664,56 @@ modref_summary_lto::dump (FILE *out)
 }
 
 /* Called after summary is produced and before it is used by local analysis.
-   Can be called multiple times in case summary needs to update signature.  */
+   Can be called multiple times in case summary needs to update signature.
+   FUN is decl of function summary is attached to.  */
 void
-modref_summary::finalize ()
+modref_summary::finalize (tree fun)
 {
   global_memory_read = !loads || loads->global_access_p ();
   global_memory_written = !stores || stores->global_access_p ();
+
+  /* We can do DSE if we know function has no side effects and
+     we can analyse all stores.  Disable dse if there are too many
+     stores to try.  */
+  if (side_effects || global_memory_written || writes_errno)
+    try_dse = false;
+  else
+    {
+      try_dse = true;
+      size_t i, j, k;
+      int num_tests = 0, max_tests
+       	= opt_for_fn (fun, param_modref_max_tests);
+      modref_base_node <alias_set_type> *base_node;
+      modref_ref_node <alias_set_type> *ref_node;
+      modref_access_node *access_node;
+      FOR_EACH_VEC_SAFE_ELT (stores->bases, i, base_node)
+	{
+	  if (base_node->every_ref)
+	    {
+	      try_dse = false;
+	      break;
+	    }
+	  FOR_EACH_VEC_SAFE_ELT (base_node->refs, j, ref_node)
+	    {
+	      if (base_node->every_ref)
+		{
+		  try_dse = false;
+		  break;
+		}
+	      FOR_EACH_VEC_SAFE_ELT (ref_node->accesses, k, access_node)
+		if (num_tests++ > max_tests
+		    || !access_node->parm_offset_known)
+		  {
+		    try_dse = false;
+		    break;
+		  }
+	      if (!try_dse)
+		break;
+	    }
+	  if (!try_dse)
+	    break;
+	}
+    }
 }
 
 /* Get function summary for FUNC if it exists, return NULL otherwise.  */
@@ -2803,7 +2850,7 @@ analyze_function (function *f, bool ipa)
       summary = NULL;
     }
   else if (summary)
-    summary->finalize ();
+    summary->finalize (current_function_decl);
   if (summary_lto && !summary_lto->useful_p (ecf_flags))
     {
       summaries_lto->remove (fnode);
@@ -3524,7 +3571,7 @@ read_section (struct lto_file_decl_data *file_data, const char *data,
 	    }
 	}
       if (flag_ltrans)
-	modref_sum->finalize ();
+	modref_sum->finalize (node->decl);
       if (dump_file)
 	{
 	  fprintf (dump_file, "Read modref for %s\n",
@@ -3682,7 +3729,7 @@ update_signature (struct cgraph_node *node)
 	r_lto->dump (dump_file);
     }
   if (r)
-    r->finalize ();
+    r->finalize (node->decl);
   return;
 }
 
@@ -4907,7 +4954,7 @@ pass_ipa_modref::execute (function *)
 	for (struct cgraph_node *cur = component_node; cur;
 	     cur = ((struct ipa_dfs_info *) cur->aux)->next_cycle)
 	  if (modref_summary *sum = optimization_summaries->get (cur))
-	    sum->finalize ();
+	    sum->finalize (cur->decl);
       if (dump_file)
 	modref_propagate_dump_scc (component_node);
     }
diff --git a/gcc/ipa-modref.h b/gcc/ipa-modref.h
index 9a8d565d770..43353ad2ef4 100644
--- a/gcc/ipa-modref.h
+++ b/gcc/ipa-modref.h
@@ -38,13 +38,14 @@ struct GTY(()) modref_summary
   /* Flags coputed by finalize method.  */
   unsigned global_memory_read : 1;
   unsigned global_memory_written : 1;
+  unsigned try_dse : 1;
 
 
   modref_summary ();
   ~modref_summary ();
   void dump (FILE *);
   bool useful_p (int ecf_flags, bool check_flags = true);
-  void finalize ();
+  void finalize (tree);
 };
 
 modref_summary *get_modref_function_summary (cgraph_node *func);
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/modref-dse-1.c b/gcc/testsuite/gcc.dg/tree-ssa/modref-dse-1.c
new file mode 100644
index 00000000000..1f80cc57c57
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/modref-dse-1.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-dse1-details"  } */
+volatile int *ptr;
+struct a {
+	int a,b,c;
+} a;
+__attribute__((noinline))
+static int init (struct a*a)
+{
+	a->a=0;
+	a->b=1;
+}
+__attribute__((noinline))
+static int use (struct a*a)
+{
+	if (a->c != 3)
+		*ptr=5;
+}
+
+void
+main(void)
+{
+	struct a a;
+	init (&a);
+	a.c=3;
+	use (&a);
+}
+/* { dg-final { scan-tree-dump "Deleted dead store: init" "dse1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/modref-dse-2.c b/gcc/testsuite/gcc.dg/tree-ssa/modref-dse-2.c
new file mode 100644
index 00000000000..e41d065a5e3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/modref-dse-2.c
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-dse2-details"  } */
+volatile int *ptr;
+struct a {
+	int a,b,c;
+} a;
+__attribute__((noinline))
+static int init (struct a*a)
+{
+	a->a=0;
+	a->b=1;
+	a->c=1;
+}
+__attribute__((noinline))
+static int use (struct a*a)
+{
+	if (a->c != 3)
+		*ptr=5;
+}
+
+void
+main(void)
+{
+	struct a a;
+	init (&a);
+	a.c=3;
+	use (&a);
+}
+/* Only DSE2 is tracking live bytes needed to figure out that store to c is
+   also dead above.  */
+/* { dg-final { scan-tree-dump "Deleted dead store: init" "dse2" } } */
diff --git a/gcc/tree-ssa-alias.c b/gcc/tree-ssa-alias.c
index 17ff6bb582c..2965902912f 100644
--- a/gcc/tree-ssa-alias.c
+++ b/gcc/tree-ssa-alias.c
@@ -782,7 +782,7 @@ ao_ref_alias_ptr_type (ao_ref *ref)
    The access is assumed to be only to or after of the pointer target adjusted
    by the offset, not before it (even in the case RANGE_KNOWN is false).  */
 
-static void
+void
 ao_ref_init_from_ptr_and_range (ao_ref *ref, tree ptr,
 				bool range_known,
 				poly_int64 offset,
diff --git a/gcc/tree-ssa-alias.h b/gcc/tree-ssa-alias.h
index 275dea10397..64d4865f58d 100644
--- a/gcc/tree-ssa-alias.h
+++ b/gcc/tree-ssa-alias.h
@@ -111,6 +111,9 @@ ao_ref::max_size_known_p () const
 /* In tree-ssa-alias.c  */
 extern void ao_ref_init (ao_ref *, tree);
 extern void ao_ref_init_from_ptr_and_size (ao_ref *, tree, tree);
+extern void ao_ref_init_from_ptr_and_range (ao_ref *, tree, bool,
+					    poly_int64, poly_int64,
+					    poly_int64);
 extern tree ao_ref_base (ao_ref *);
 extern alias_set_type ao_ref_alias_set (ao_ref *);
 extern alias_set_type ao_ref_base_alias_set (ao_ref *);
diff --git a/gcc/tree-ssa-dse.c b/gcc/tree-ssa-dse.c
index 27287fe88ee..0e8c4ed1435 100644
--- a/gcc/tree-ssa-dse.c
+++ b/gcc/tree-ssa-dse.c
@@ -40,6 +40,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimplify.h"
 #include "tree-eh.h"
 #include "cfganal.h"
+#include "cgraph.h"
+#include "ipa-modref-tree.h"
+#include "ipa-modref.h"
 
 /* This file implements dead store elimination.
 
@@ -1027,6 +1030,101 @@ delete_dead_or_redundant_assignment (gimple_stmt_iterator *gsi, const char *type
   release_defs (stmt);
 }
 
+/* Try to prove, using modref summary, that all memory written to by a call is
+   dead and remove it.  Assume that if return value is written to memory
+   it is already proved to be dead.  */
+
+static bool
+dse_optimize_call (gimple_stmt_iterator *gsi, sbitmap live_bytes)
+{
+  gcall *stmt = dyn_cast <gcall *> (gsi_stmt (*gsi));
+
+  if (!stmt)
+    return false;
+
+  tree callee = gimple_call_fndecl (stmt);
+
+  if (!callee)
+    return false;
+
+  /* Pure/const functions are optimized by normal DCE
+     or handled as store above.  */
+  int flags = gimple_call_flags (stmt);
+  if ((flags & (ECF_PURE|ECF_CONST|ECF_NOVOPS))
+      && !(flags & (ECF_LOOPING_CONST_OR_PURE)))
+    return false;
+
+  cgraph_node *node = cgraph_node::get (callee);
+  if (!node)
+    return false;
+
+  if (stmt_could_throw_p (cfun, stmt)
+      && !cfun->can_delete_dead_exceptions)
+    return false;
+
+  /* If return value is used the call is not dead.  */
+  tree lhs = gimple_call_lhs (stmt);
+  if (lhs && TREE_CODE (lhs) == SSA_NAME)
+    {
+      imm_use_iterator ui;
+      gimple *use_stmt;
+      FOR_EACH_IMM_USE_STMT (use_stmt, ui, lhs)
+	if (!is_gimple_debug (use_stmt))
+	  return false;
+    }
+
+  /* Verify that there are no side-effects except for return value
+     and memory writes tracked by modref.  */
+  modref_summary *summary = get_modref_function_summary (node);
+  if (!summary || !summary->try_dse)
+    return false;
+
+  modref_base_node <alias_set_type> *base_node;
+  modref_ref_node <alias_set_type> *ref_node;
+  modref_access_node *access_node;
+  size_t i, j, k;
+  bool by_clobber_p = false;
+
+  /* Walk all memory writes and verify that they are dead.  */
+  FOR_EACH_VEC_SAFE_ELT (summary->stores->bases, i, base_node)
+    FOR_EACH_VEC_SAFE_ELT (base_node->refs, j, ref_node)
+      FOR_EACH_VEC_SAFE_ELT (ref_node->accesses, k, access_node)
+	{
+	  gcc_checking_assert (access_node->parm_offset_known);
+
+	  tree arg;
+	  if (access_node->parm_index == MODREF_STATIC_CHAIN_PARM)
+	    arg = gimple_call_chain (stmt);
+	  else
+	    arg = gimple_call_arg (stmt, access_node->parm_index);
+
+	  ao_ref ref;
+	  poly_offset_int off = (poly_offset_int)access_node->offset
+		+ ((poly_offset_int)access_node->parm_offset
+		   << LOG2_BITS_PER_UNIT);
+	  poly_int64 off2;
+	  if (!off.to_shwi (&off2))
+	    return false;
+	  ao_ref_init_from_ptr_and_range
+		 (&ref, arg, true, off2, access_node->size,
+		  access_node->max_size);
+	  ref.ref_alias_set = ref_node->ref;
+	  ref.base_alias_set = base_node->base;
+
+	  bool byte_tracking_enabled
+	      = setup_live_bytes_from_ref (&ref, live_bytes);
+	  enum dse_store_status store_status;
+
+	  store_status = dse_classify_store (&ref, stmt,
+					     byte_tracking_enabled,
+					     live_bytes, &by_clobber_p);
+	  if (store_status != DSE_STORE_DEAD)
+	    return false;
+	}
+  delete_dead_or_redundant_assignment (gsi, "dead", need_eh_cleanup);
+  return true;
+}
+
 /* Attempt to eliminate dead stores in the statement referenced by BSI.
 
    A dead store is a store into a memory location which will later be
@@ -1050,8 +1148,13 @@ dse_optimize_stmt (function *fun, gimple_stmt_iterator *gsi, sbitmap live_bytes)
     return;
 
   ao_ref ref;
+  /* If this is not a store we can still remove dead call using
+     modref summary.  */
   if (!initialize_ao_ref_for_dse (stmt, &ref))
-    return;
+    {
+      dse_optimize_call (gsi, live_bytes);
+      return;
+    }
 
   /* We know we have virtual definitions.  We can handle assignments and
      some builtin calls.  */
@@ -1162,6 +1265,9 @@ dse_optimize_stmt (function *fun, gimple_stmt_iterator *gsi, sbitmap live_bytes)
 	  || (stmt_could_throw_p (fun, stmt)
 	      && !fun->can_delete_dead_exceptions)))
     {
+      /* See if we can remove complete call.  */
+      if (dse_optimize_call (gsi, live_bytes))
+	return;
       /* Make sure we do not remove a return slot we cannot reconstruct
 	 later.  */
       if (gimple_call_return_slot_opt_p (as_a <gcall *>(stmt))
Richard Biener Nov. 12, 2021, 12:53 p.m. UTC | #8
On Fri, Nov 12, 2021 at 12:39 PM Jan Hubicka <hubicka@kam.mff.cuni.cz> wrote:
>
> Hi,
> this is the updated patch.  It moves the summary walk that checks whether we
> can possibly succeed with DSE into the summary->finalize member function, so
> it is done once per summary, and refactors dse_optimize_call to be called
> from dse_optimize_stmt after the early checks.
>
> I did not try to handle the special case of unknown parm_offset, but we can
> do it incrementally.  I think initializing the range with the offset being
> the minimum poly_int64 value and max_size unknown, as suggested, is going to
> work.  I am a bit worried that this is in bits, so we have a 2^61 range
> instead of 2^64, but I guess one cannot offset a pointer back and forth that
> far in a valid program?

Not sure indeed.  I'd only special-case when the call argument is
&decl, then the start offset can be simply zero.
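A hypothetical source-level pattern that special case would cover (made-up names): the argument is &s, so every tracked store lands within s, and an ao_ref starting at offset 0 from the decl is conservatively correct even though the individual offsets are unknown:

```c
struct S { int a[8]; };

__attribute__((noinline))
static void clear_some (struct S *s, int n)
{
  /* Stores at offsets modref cannot pin down at compile time.  */
  for (int i = 0; i < n; i++)
    s->a[i] = 0;
}

int
wrapper (void)
{
  struct S s;
  clear_some (&s, 5);	/* &decl argument: every access is within s,
			   so base s with start offset 0 suffices */
  return 7;		/* s is dead here */
}
```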

> Bootstrapped/regtested x86_64-linux, LTO bootstrap in progress; OK if it
> succeeds?

OK.

Thanks,
Richard.


Patch

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/modref-dse-1.c b/gcc/testsuite/gcc.dg/tree-ssa/modref-dse-1.c
new file mode 100644
index 00000000000..e78693b349a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/modref-dse-1.c
@@ -0,0 +1,28 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-dse1"  } */
+volatile int *ptr;
+struct a {
+	int a,b,c;
+} a;
+__attribute__((noinline))
+static int init (struct a*a)
+{
+	a->a=0;
+	a->b=1;
+}
+__attribute__((noinline))
+static int use (struct a*a)
+{
+	if (a->c != 3)
+		*ptr=5;
+}
+
+void
+main(void)
+{
+	struct a a;
+	init (&a);
+	a.c=3;
+	use (&a);
+}
+/* { dg-final { scan-tree-dump "Deleted dead store: init" "dse1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/modref-dse-2.c b/gcc/testsuite/gcc.dg/tree-ssa/modref-dse-2.c
new file mode 100644
index 00000000000..99c8ceb8127
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/modref-dse-2.c
@@ -0,0 +1,31 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-dse2 -fno-ipa-sra -fno-ipa-cp"  } */
+volatile int *ptr;
+struct a {
+	int a,b,c;
+} a;
+__attribute__((noinline))
+static int init (struct a*a)
+{
+	a->a=0;
+	a->b=1;
+	a->c=1;
+}
+__attribute__((noinline))
+static int use (struct a*a)
+{
+	if (a->c != 3)
+		*ptr=5;
+}
+
+void
+main(void)
+{
+	struct a a;
+	init (&a);
+	a.c=3;
+	use (&a);
+}
+/* Only DSE2 is tracking live bytes needed to figure out that store to c is
+   also dead above.  */
+/* { dg-final { scan-tree-dump "Deleted dead store: init" "dse2" } } */
diff --git a/gcc/tree-ssa-alias.c b/gcc/tree-ssa-alias.c
index eabf6805f2b..affb5d40d4b 100644
--- a/gcc/tree-ssa-alias.c
+++ b/gcc/tree-ssa-alias.c
@@ -782,7 +782,7 @@  ao_ref_alias_ptr_type (ao_ref *ref)
    The access is assumed to be only to or after of the pointer target adjusted
    by the offset, not before it (even in the case RANGE_KNOWN is false).  */
 
-static void
+void
 ao_ref_init_from_ptr_and_range (ao_ref *ref, tree ptr,
 				bool range_known,
 				poly_int64 offset,
diff --git a/gcc/tree-ssa-alias.h b/gcc/tree-ssa-alias.h
index 275dea10397..c2e28a74999 100644
--- a/gcc/tree-ssa-alias.h
+++ b/gcc/tree-ssa-alias.h
@@ -111,6 +111,8 @@  ao_ref::max_size_known_p () const
 /* In tree-ssa-alias.c  */
 extern void ao_ref_init (ao_ref *, tree);
 extern void ao_ref_init_from_ptr_and_size (ao_ref *, tree, tree);
+void ao_ref_init_from_ptr_and_range (ao_ref *, tree, bool,
+				     poly_int64, poly_int64, poly_int64);
 extern tree ao_ref_base (ao_ref *);
 extern alias_set_type ao_ref_alias_set (ao_ref *);
 extern alias_set_type ao_ref_base_alias_set (ao_ref *);
diff --git a/gcc/tree-ssa-dse.c b/gcc/tree-ssa-dse.c
index 27287fe88ee..1fec9100011 100644
--- a/gcc/tree-ssa-dse.c
+++ b/gcc/tree-ssa-dse.c
@@ -40,6 +40,9 @@  along with GCC; see the file COPYING3.  If not see
 #include "gimplify.h"
 #include "tree-eh.h"
 #include "cfganal.h"
+#include "cgraph.h"
+#include "ipa-modref-tree.h"
+#include "ipa-modref.h"
 
 /* This file implements dead store elimination.
 
@@ -1039,7 +1042,8 @@  delete_dead_or_redundant_assignment (gimple_stmt_iterator *gsi, const char *type
    post dominates the first store, then the first store is dead.  */
 
 static void
-dse_optimize_stmt (function *fun, gimple_stmt_iterator *gsi, sbitmap live_bytes)
+dse_optimize_store (function *fun, gimple_stmt_iterator *gsi,
+		    sbitmap live_bytes)
 {
   gimple *stmt = gsi_stmt (*gsi);
 
@@ -1182,6 +1186,128 @@  dse_optimize_stmt (function *fun, gimple_stmt_iterator *gsi, sbitmap live_bytes)
     delete_dead_or_redundant_assignment (gsi, "dead", need_eh_cleanup);
 }
 
+/* Try to prove, using the modref summary, that all memory written to by a call
+   is dead, and if so remove the call.  */
+
+static bool
+dse_optimize_call (gimple_stmt_iterator *gsi, sbitmap live_bytes)
+{
+  gcall *stmt = as_a <gcall *> (gsi_stmt (*gsi));
+  tree callee = gimple_call_fndecl (stmt);
+
+  if (!callee)
+    return false;
+
+  /* Pure/const functions are optimized by normal DCE
+     or handled as a store above.  */
+  int flags = gimple_call_flags (stmt);
+  if ((flags & (ECF_PURE|ECF_CONST|ECF_NOVOPS))
+      && !(flags & (ECF_LOOPING_CONST_OR_PURE)))
+    return false;
+
+  cgraph_node *node = cgraph_node::get (callee);
+  if (!node)
+    return false;
+
+  if (stmt_could_throw_p (cfun, stmt)
+      && !cfun->can_delete_dead_exceptions)
+    return false;
+
+  /* If the return value is used, the call is not dead.  */
+  tree lhs = gimple_call_lhs (stmt);
+  if (lhs && TREE_CODE (lhs) == SSA_NAME)
+    {
+      imm_use_iterator ui;
+      gimple *use_stmt;
+      FOR_EACH_IMM_USE_STMT (use_stmt, ui, lhs)
+	if (!is_gimple_debug (use_stmt))
+	  return false;
+    }
+
+  /* Verify that there are no side-effects except for return value
+     and memory writes tracked by modref.  */
+  modref_summary *summary = get_modref_function_summary (node);
+  if (!summary || summary->writes_errno
+      || summary->side_effects
+      || summary->global_memory_written_p ())
+    return false;
+
+  modref_base_node <alias_set_type> *base_node;
+  modref_ref_node <alias_set_type> *ref_node;
+  modref_access_node *access_node;
+  size_t i, j, k;
+  bool by_clobber_p = false;
+  int num_tests = 0, max_tests = param_modref_max_tests;
+
+  /* Unlike the alias oracle, we cannot skip subtrees based on a TBAA check.
+     Count the size of the whole tree to verify that we will not need too many
+     tests.  */
+  FOR_EACH_VEC_SAFE_ELT (summary->stores->bases, i, base_node)
+    FOR_EACH_VEC_SAFE_ELT (base_node->refs, j, ref_node)
+      FOR_EACH_VEC_SAFE_ELT (ref_node->accesses, k, access_node)
+	if (num_tests++ > max_tests)
+	  return false;
+
+  /* Walk all memory writes and verify that they are dead.  */
+  FOR_EACH_VEC_SAFE_ELT (summary->stores->bases, i, base_node)
+    FOR_EACH_VEC_SAFE_ELT (base_node->refs, j, ref_node)
+      FOR_EACH_VEC_SAFE_ELT (ref_node->accesses, k, access_node)
+	{
+	  /* ??? If the offset is unknown it may be negative.  Not sure
+	     how to construct the ref here.  */
+	  if (!access_node->parm_offset_known)
+	    return false;
+
+	  tree arg;
+	  if (access_node->parm_index == MODREF_STATIC_CHAIN_PARM)
+	    arg = gimple_call_chain (stmt);
+	  else
+	    arg = gimple_call_arg (stmt, access_node->parm_index);
+
+	  ao_ref ref;
+	  poly_offset_int off = (poly_offset_int)access_node->offset
+		+ ((poly_offset_int)access_node->parm_offset
+		   << LOG2_BITS_PER_UNIT);
+	  poly_int64 off2;
+	  if (!off.to_shwi (&off2))
+	    return false;
+	  ao_ref_init_from_ptr_and_range
+		 (&ref, arg, true, off2, access_node->size,
+		  access_node->max_size);
+	  ref.ref_alias_set = ref_node->ref;
+	  ref.base_alias_set = base_node->base;
+
+	  bool byte_tracking_enabled
+	      = setup_live_bytes_from_ref (&ref, live_bytes);
+	  enum dse_store_status store_status;
+
+	  store_status = dse_classify_store (&ref, stmt,
+					     byte_tracking_enabled,
+					     live_bytes, &by_clobber_p);
+	  if (store_status != DSE_STORE_DEAD)
+	    return false;
+	}
+  /* Also check the value stored by the call itself.  */
+  if (gimple_store_p (stmt))
+    {
+      ao_ref ref;
+
+      if (!initialize_ao_ref_for_dse (stmt, &ref))
+	gcc_unreachable ();
+      bool byte_tracking_enabled
+	  = setup_live_bytes_from_ref (&ref, live_bytes);
+      enum dse_store_status store_status;
+
+      store_status = dse_classify_store (&ref, stmt,
+					 byte_tracking_enabled,
+					 live_bytes, &by_clobber_p);
+      if (store_status != DSE_STORE_DEAD)
+	return false;
+    }
+  delete_dead_or_redundant_assignment (gsi, "dead", need_eh_cleanup);
+  return true;
+}
+
 namespace {
 
 const pass_data pass_data_dse =
@@ -1235,7 +1363,14 @@  pass_dse::execute (function *fun)
 	  gimple *stmt = gsi_stmt (gsi);
 
 	  if (gimple_vdef (stmt))
-	    dse_optimize_stmt (fun, &gsi, live_bytes);
+	    {
+	      gcall *call = dyn_cast <gcall *> (stmt);
+
+	      if (call && dse_optimize_call (&gsi, live_bytes))
+		/* We removed a dead call.  */;
+	      else
+		dse_optimize_store (fun, &gsi, live_bytes);
+	    }
 	  else if (def_operand_p
 		     def_p = single_ssa_def_operand (stmt, SSA_OP_DEF))
 	    {