[v2] ipa-cp: Speculatively call specialized functions

Message ID: 20221216161054.3663182-1-manolis.tsamis@vrull.eu
State: New
Series: [v2] ipa-cp: Speculatively call specialized functions

Commit Message

Manolis Tsamis Dec. 16, 2022, 4:10 p.m. UTC
The IPA CP pass offers a wide range of optimizations, most of which
lead to specialized functions that are called from a call site.
This can lead to multiple specialized function clones, if more than
one call-site allows such an optimization.
If not all call-sites can be optimized, the program might end
up with call-sites to the original function.

This pass assumes that non-optimized call-sites (i.e. call-sites
that don't call specialized functions) are likely to be executed
with arguments that would allow calling the specialized clones.
Since we cannot guarantee this (for obvious reasons), we can't
replace the existing calls. However, we can introduce dynamic
guards that test the arguments against the collected constants
and call the specialized function if there is a match.

To demonstrate the effect, let's consider the following program part:

  func_1()
    myfunc(1)
  func_2()
    myfunc(2)
  func_i(i)
    myfunc(i)

In this case the transformation would do the following:

  func_1()
    myfunc.constprop.1() // myfunc() with arg0 == 1
  func_2()
    myfunc.constprop.2() // myfunc() with arg0 == 2
  func_i(i)
    if (i == 1)
      myfunc.constprop.1() // myfunc() with arg0 == 1
    else if (i == 2)
      myfunc.constprop.2() // myfunc() with arg0 == 2
    else
      myfunc(i)

The pass consists of two main parts:
* collecting all specialized functions and the argument/constant pair(s)
* insertion of the guards during materialization
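
As a rough illustration of the second part, materializing a single
guarded specialization edge is expected to produce a diamond of the
following shape (a GIMPLE-like sketch only; the block and SSA names
are made up):

  <bb guard>:
    if (i_1 == 1)
      goto <bb spec>;         ;; matched, take the specialized clone
    else
      goto <bb orig>;         ;; no match, keep the original call

  <bb spec>:
    myfunc.constprop.1_1 ();  ;; guarded specialization edge
    goto <bb join>;

  <bb orig>:
    myfunc (i_1);             ;; base edge, original callee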

The patch integrates well into ipa-cp and related IPA functionality.
Given the nature of IPA, the changes touch many IPA-related
files as well as call-graph data structures.

The impact of the dynamic guard is expected to be outweighed by the
speedup gained from the optimizations it enables (e.g. inlining or
constant propagation).

gcc/ChangeLog:

        * cgraph.cc (cgraph_add_edge_to_call_site_hash): Add support for guarded specialized edges.
        (cgraph_edge::set_call_stmt): Likewise.
        (symbol_table::create_edge): Likewise.
        (cgraph_edge::remove): Likewise.
        (cgraph_edge::make_speculative): Likewise.
        (cgraph_edge::make_specialized): Likewise.
        (cgraph_edge::remove_specializations): Likewise.
        (cgraph_edge::redirect_call_stmt_to_callee): Likewise.
        (cgraph_edge::dump_edge_flags): Likewise.
        (verify_speculative_call): Likewise.
        (verify_specialized_call): Likewise.
        (cgraph_node::verify_node): Likewise.
        * cgraph.h (class GTY): Add new class that contains info of specialized edges.
        * cgraphclones.cc (cgraph_edge::clone): Add support for guarded specialized edges.
        (cgraph_node::set_call_stmt_including_clones): Likewise.
        * ipa-cp.cc (want_remove_some_param_p): Likewise.
        (create_specialized_node): Likewise.
        (add_specialized_edges): Likewise.
        (ipcp_driver): Likewise.
        * ipa-fnsummary.cc (redirect_to_unreachable): Likewise.
        (ipa_fn_summary_t::duplicate): Likewise.
        (analyze_function_body): Likewise.
        (estimate_edge_size_and_time): Likewise.
        (remap_edge_summaries): Likewise.
        * ipa-inline-transform.cc (inline_transform): Likewise.
        * ipa-inline.cc (edge_badness): Likewise.
        * lto-cgraph.cc (lto_output_edge): Likewise.
        (input_edge): Likewise.
        * tree-inline.cc (copy_bb): Likewise.
        * value-prof.cc (gimple_sc): Add function to create guarded specializations.
        * value-prof.h (gimple_sc): Likewise.

Signed-off-by: Manolis Tsamis <manolis.tsamis@vrull.eu>

---

Changes in v2:
          - Added params ipa-guarded-specialization-guard-complexity and
            ipa-guarded-specializations-per-edge to control the complexity and number
            of specialized edges that are created.
          - Create separate clones for the guarded specialized calls.
          - Add more validation checks for the invariants of specialized edges.
          - Fix bugs and improve robustness.
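
For reference, a hypothetical invocation that exercises the new knobs
could look as follows; the -fipa-guarded-specialization spelling is
inferred from flag_ipa_guarded_specialization in the patch and, like
the param values, is only an assumption:

  gcc -O2 -fipa-guarded-specialization \
      --param ipa-guarded-specializations-per-edge=2 \
      --param ipa-guarded-specialization-guard-complexity=1 \
      -fdump-ipa-cp-details test.c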

 gcc/cgraph.cc               | 372 ++++++++++++++++++++++++++++++++++--
 gcc/cgraph.h                | 105 ++++++++++
 gcc/cgraphclones.cc         |  42 ++++
 gcc/common.opt              |   4 +
 gcc/ipa-cp.cc               | 171 ++++++++++++++++-
 gcc/ipa-fnsummary.cc        |  42 ++++
 gcc/ipa-inline-transform.cc |  16 ++
 gcc/ipa-inline.cc           |   1 +
 gcc/lto-cgraph.cc           |  46 +++++
 gcc/params.opt              |   8 +
 gcc/tree-inline.cc          |  75 +++++++-
 gcc/value-prof.cc           | 223 +++++++++++++++++++++
 gcc/value-prof.h            |   1 +
 13 files changed, 1087 insertions(+), 19 deletions(-)
  

Comments

Martin Jambor Jan. 13, 2023, 5:49 p.m. UTC | #1
Hello,

sorry for getting to this quite late.  I have only had a quick glance at
ipa-cp.cc hunks so far.

On Fri, Dec 16 2022, Manolis Tsamis wrote:
> The IPA CP pass offers a wide range of optimizations, most of which
> lead to specialized functions that are called from a call site.
> This can lead to multiple specialized function clones, if more than
> one call-site allows such an optimization.
> If not all call-sites can be optimized, the program might end
> up with call-sites to the original function.
>
> This pass assumes that non-optimized call-sites (i.e. call-sites
> that don't call specialized functions) are likely to be executed
> with arguments that would allow calling the specialized clones.
> Since we cannot guarantee this (for obvious reasons), we can't
> replace the existing calls. However, we can introduce dynamic
> guards that test the arguments against the collected constants
> and call the specialized function if there is a match.
>
> To demonstrate the effect, let's consider the following program part:
>
>   func_1()
>     myfunc(1)
>   func_2()
>     myfunc(2)
>   func_i(i)
>     myfunc(i)
>
> In this case the transformation would do the following:
>
>   func_1()
>     myfunc.constprop.1() // myfunc() with arg0 == 1
>   func_2()
>     myfunc.constprop.2() // myfunc() with arg0 == 2
>   func_i(i)
>     if (i == 1)
>       myfunc.constprop.1() // myfunc() with arg0 == 1
>     else if (i == 2)
>       myfunc.constprop.2() // myfunc() with arg0 == 2
>     else
>       myfunc(i)

My understanding of the code, however, is that it rather creates

  func_i(i)
    if (i == 1)
      myfunc.constprop.1_1() // mostly equivalent but separate from myfunc.constprop.1
    else if (i == 2)
      myfunc.constprop.2_1() // mostly equivalent but separate from myfunc.constprop.2
    else
      myfunc(i)

Which I find difficult to justify.  From comments it looked like the
reason is avoiding calling find_more_scalar_values, is that correct?

I'd like to know more about the cases you are targeting and cases where
adding the additional known scalar constants were an issue.  I think it
needs to be tackled differently.

By the way, as IPA-CP works now (it would be nice but difficult to lift
that limitation), all but up to one constant in known_csts are constants
in all call contexts, so without calling find_more_scalar_values you
should need just one run-time condition per speculative call.  So
tracking which constant is which might be better than avoiding
find_more_scalar_values?
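
To make that concrete with a made-up example: suppose myfunc (i, j) is
called with j == 0 at every call site, so the clone for i == 1 also has
j == 0 in its known_csts.  Since j == 0 holds in every call context
anyway, the speculative call only needs to test the one varying
constant:

  /* Made-up sketch: j is 0 at every call site, only i varies.  */
  if (i == 1)
    myfunc.constprop.1 (); /* clone created for i == 1 (and j == 0) */
  else
    myfunc (i, j);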

Also growth limits in ipa-cp are not updated appropriately.

Some more comments inline:

>
> The pass consists of two main parts:
> * collecting all specialized functions and the argument/constant pair(s)
> * insertion of the guards during materialization
>
> The patch integrates well into ipa-cp and related IPA functionality.
> Given the nature of IPA, the changes touch many IPA-related
> files as well as call-graph data structures.
>
> The impact of the dynamic guard is expected to be outweighed by the
> speedup gained from the optimizations it enables (e.g. inlining or
> constant propagation).
>
> gcc/ChangeLog:
>
>         * cgraph.cc (cgraph_add_edge_to_call_site_hash): Add support for guarded specialized edges.
>         (cgraph_edge::set_call_stmt): Likewise.
>         (symbol_table::create_edge): Likewise.
>         (cgraph_edge::remove): Likewise.
>         (cgraph_edge::make_speculative): Likewise.
>         (cgraph_edge::make_specialized): Likewise.
>         (cgraph_edge::remove_specializations): Likewise.
>         (cgraph_edge::redirect_call_stmt_to_callee): Likewise.
>         (cgraph_edge::dump_edge_flags): Likewise.
>         (verify_speculative_call): Likewise.
>         (verify_specialized_call): Likewise.
>         (cgraph_node::verify_node): Likewise.
>         * cgraph.h (class GTY): Add new class that contains info of specialized edges.
>         * cgraphclones.cc (cgraph_edge::clone): Add support for guarded specialized edges.
>         (cgraph_node::set_call_stmt_including_clones): Likewise.
>         * ipa-cp.cc (want_remove_some_param_p): Likewise.
>         (create_specialized_node): Likewise.
>         (add_specialized_edges): Likewise.
>         (ipcp_driver): Likewise.
>         * ipa-fnsummary.cc (redirect_to_unreachable): Likewise.
>         (ipa_fn_summary_t::duplicate): Likewise.
>         (analyze_function_body): Likewise.
>         (estimate_edge_size_and_time): Likewise.
>         (remap_edge_summaries): Likewise.
>         * ipa-inline-transform.cc (inline_transform): Likewise.
>         * ipa-inline.cc (edge_badness): Likewise.
>         * lto-cgraph.cc (lto_output_edge): Likewise.
>         (input_edge): Likewise.
>         * tree-inline.cc (copy_bb): Likewise.
>         * value-prof.cc (gimple_sc): Add function to create guarded specializations.
>         * value-prof.h (gimple_sc): Likewise.

Please also include test-cases.

>
> Signed-off-by: Manolis Tsamis <manolis.tsamis@vrull.eu>
>
> ---
>

[...]

> diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc
> index cc031ebed0f..31d01ada928 100644
> --- a/gcc/ipa-cp.cc
> +++ b/gcc/ipa-cp.cc
> @@ -119,6 +119,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "symbol-summary.h"
>  #include "tree-vrp.h"
>  #include "ipa-prop.h"
> +#include "gimple-pretty-print.h"
>  #include "tree-pretty-print.h"
>  #include "tree-inline.h"
>  #include "ipa-fnsummary.h"
> @@ -5239,16 +5240,20 @@ want_remove_some_param_p (cgraph_node *node, vec<tree> known_csts)
>    return false;
>  }
>  
> +static hash_map<cgraph_node*, vec<cgraph_node*>> *available_specializations;
> +
>  /* Create a specialized version of NODE with known constants in KNOWN_CSTS,
>     known contexts in KNOWN_CONTEXTS and known aggregate values in AGGVALS and
> -   redirect all edges in CALLERS to it.  */
> +   redirect all edges in CALLERS to it.  If IS_SPECULATIVE is true then this
> +   node is created to be part of a guarded specialization edge.  */
>  
>  static struct cgraph_node *
>  create_specialized_node (struct cgraph_node *node,
>  			 vec<tree> known_csts,
>  			 vec<ipa_polymorphic_call_context> known_contexts,
>  			 vec<ipa_argagg_value, va_gc> *aggvals,
> -			 vec<cgraph_edge *> &callers)
> +			 vec<cgraph_edge *> &callers,
> +			 bool is_speculative)
>  {
>    ipa_node_params *new_info, *info = ipa_node_params_sum->get (node);
>    vec<ipa_replace_map *, va_gc> *replace_trees = NULL;
> @@ -5383,7 +5388,7 @@ create_specialized_node (struct cgraph_node *node,
>    for (const ipa_argagg_value &av : aggvals)
>      new_node->maybe_create_reference (av.value, NULL);
>  
> -  if (dump_file && (dump_flags & TDF_DETAILS))
> +  if (dump_file && (dump_flags & TDF_DETAILS) && !is_speculative)
>      {
>        fprintf (dump_file, "     the new node is %s.\n", new_node->dump_name ());
>        if (known_contexts.exists ())
> @@ -5409,6 +5414,13 @@ create_specialized_node (struct cgraph_node *node,
>    new_info->known_csts = known_csts;
>    new_info->known_contexts = known_contexts;
>  
> +  if (is_speculative && !info->ipcp_orig_node)

What is the reason for testing !info->ipcp_orig_node here?


> +    {
> +      vec<cgraph_node*> &spec_nodes
> +	= available_specializations->get_or_insert (node);
> +      spec_nodes.safe_push (new_node);
> +    }
> +
>    ipcp_discover_new_direct_edges (new_node, known_csts, known_contexts,
>  				  aggvals);
>  
> @@ -6104,6 +6116,21 @@ decide_about_value (struct cgraph_node *node, int index, HOST_WIDE_INT offset,
>        known_csts = avals->m_known_vals.copy ();
>        known_contexts = copy_useful_known_contexts (avals->m_known_contexts);
>      }
> +
> +  /* If guarded specialization is enabled then we create an additional
> +     clone with KNOWN_CSTS and no known contexts or aggregates.
> +     We don't want find_more_scalar_values because adding more constants
> +     increases the complexity of the guard and reduces the chance
> +     that it is used.  */
> +  if (flag_ipa_guarded_specialization && !val->self_recursion_generated_p ())
> +    {
> +      vec<cgraph_edge *> no_callers = vNULL;
> +      cgraph_node *guarded_spec_node
> +	= create_specialized_node (node, known_csts.copy (), vNULL,
> +						 NULL, no_callers, true);

It looks like, if the value being considered is an aggregate value
(offset is non-negative) or polymorphic_context (yeah, the whole
function is a template), neither of which is recorded in known_csts,
you'll end up creating a clone with no specialization at all (other than
that for all direct calls).


> +      update_profiling_info (node, guarded_spec_node);

I must say I don't know what is the best way to distribute profiling
counts in the transformation you propose, but this is not going to do
the right thing.  update_profiling_info tries to divide the counts
proportionally depending on sum of counts of calls to the original and
the clone and since the clone has no callers at this point, it will
become quite cold.


> +    }
> +
>    find_more_scalar_values_for_callers_subset (node, known_csts, callers);
>    find_more_contexts_for_caller_subset (node, &known_contexts, callers);
>    vec<ipa_argagg_value, va_gc> *aggvals
> @@ -6111,7 +6138,7 @@ decide_about_value (struct cgraph_node *node, int index, HOST_WIDE_INT offset,
>    gcc_checking_assert (ipcp_val_agg_replacement_ok_p (aggvals, index,
>  						      offset, val->value));
>    val->spec_node = create_specialized_node (node, known_csts, known_contexts,
> -					    aggvals, callers);
> +					    aggvals, callers, false);
>  
>    if (val->self_recursion_generated_p ())
>      self_gen_clones->safe_push (val->spec_node);
> @@ -6270,7 +6297,7 @@ decide_whether_version_node (struct cgraph_node *node)
>  	  known_contexts = vNULL;
>  	}
>        clone = create_specialized_node (node, known_csts, known_contexts,
> -				       aggvals, callers);
> +				       aggvals, callers, false);
>        info->do_clone_for_all_contexts = false;
>        ipa_node_params_sum->get (clone)->is_all_contexts_clone = true;
>        ret = true;
> @@ -6546,6 +6573,135 @@ ipcp_store_vr_results (void)
>      }
>  }
>  
> +/* Add new edges to the call graph to represent the available specializations
> +   of each specialized function.  */
> +static void
> +add_specialized_edges (void)
> +{
> +  cgraph_edge *e;
> +  cgraph_node *n, *spec_n;
> +  tree known_cst;
> +  unsigned i, j;
> +
> +  FOR_EACH_DEFINED_FUNCTION (n)
> +    {
> +      if (dump_file && n->callees)
> +	fprintf (dump_file,
> +		 "Processing function %s for specialization of edges.\n",
> +		 n->dump_name ());
> +
> +      if (n->ipcp_clone)
> +	continue;
> +
> +      bool update = false;
> +      for (e = n->callees; e; e = e->next_callee)
> +	{
> +	  if (!e->callee || e->recursive_p ())
> +	    continue;
> +
> +	  vec<cgraph_node*> *specialization_nodes
> +	    = available_specializations->get (e->callee);
> +
> +	  /* Even if the callee is a specialized node it is still valid to
> +	     further create guarded specializations based on the original node.
> +	     If the existing specialized node doesn't have any known constants
> +	     then it is probably profitable to specialize further.  */

So you are saying that scalar constant specializations are always
better than aggregate or polymorphic_context ones?  IMHO this should be
at least driven by some heuristics like the number that
good_cloning_opportunity_p uses.

> +	  if (e->callee->ipcp_clone && !specialization_nodes)
> +	    {
> +	      ipa_node_params *info
> +		= ipa_node_params_sum->get (e->callee);
> +	      gcc_checking_assert (info->ipcp_orig_node);
> +
> +	      bool has_known_constant = false;
> +	      FOR_EACH_VEC_ELT (info->known_csts, i, known_cst)
> +		if (known_cst != NULL_TREE)
> +		  {
> +		    has_known_constant = true;
> +		    break;
> +		  }
> +
> +	      if (!has_known_constant)
> +		specialization_nodes
> +		  = available_specializations->get (info->ipcp_orig_node);
> +	    }
> +
> +	  if (!specialization_nodes)
> +	    continue;
> +
> +	  unsigned num_of_specializations = 0;
> +	  unsigned max_num_of_specializations = opt_for_fn (n->decl,
> +						  param_ipa_spec_max_per_edge);
> +
> +	  FOR_EACH_VEC_ELT (*specialization_nodes, i, spec_n)
> +	    {
> +	      if (dump_file)
> +		fprintf (dump_file,
> +			 "Edge has available specialization %s.\n",
> +			 spec_n->dump_name ());
> +
> +	      ipa_node_params *spec_params = ipa_node_params_sum->get (spec_n);
> +	      vec<cgraph_specialization_info> replaced_args = vNULL;
> +	      bool failed = false;
> +
> +	      FOR_EACH_VEC_ELT (spec_params->known_csts, j, known_cst)

As I wrote before, I think you are also testing constants which we know
are there.

The idea is interesting, thanks for exploring these options.  As I said,
knowing a bit more about what motivated you might help us to reason
about it.

Martin
  
Manolis Tsamis Jan. 23, 2023, 10:09 a.m. UTC | #2
On Fri, Jan 13, 2023 at 7:49 PM Martin Jambor <mjambor@suse.cz> wrote:
>
> Hello,
>
> sorry for getting to this quite late.  I have only had a quick glance at
> ipa-cp.cc hunks so far.
>

Hi Martin,

Thanks for taking the time to review these.

> On Fri, Dec 16 2022, Manolis Tsamis wrote:
> > The IPA CP pass offers a wide range of optimizations, most of which
> > lead to specialized functions that are called from a call site.
> > This can lead to multiple specialized function clones, if more than
> > one call-site allows such an optimization.
> > If not all call-sites can be optimized, the program might end
> > up with call-sites to the original function.
> >
> > This pass assumes that non-optimized call-sites (i.e. call-sites
> > that don't call specialized functions) are likely to be executed
> > with arguments that would allow calling the specialized clones.
> > Since we cannot guarantee this (for obvious reasons), we can't
> > replace the existing calls. However, we can introduce dynamic
> > guards that test the arguments against the collected constants
> > and call the specialized function if there is a match.
> >
> > To demonstrate the effect, let's consider the following program part:
> >
> >   func_1()
> >     myfunc(1)
> >   func_2()
> >     myfunc(2)
> >   func_i(i)
> >     myfunc(i)
> >
> > In this case the transformation would do the following:
> >
> >   func_1()
> >     myfunc.constprop.1() // myfunc() with arg0 == 1
> >   func_2()
> >     myfunc.constprop.2() // myfunc() with arg0 == 2
> >   func_i(i)
> >     if (i == 1)
> >       myfunc.constprop.1() // myfunc() with arg0 == 1
> >     else if (i == 2)
> >       myfunc.constprop.2() // myfunc() with arg0 == 2
> >     else
> >       myfunc(i)
>
> My understanding of the code, however, is that it rather creates
>
>   func_i(i)
>     if (i == 1)
>       myfunc.constprop.1_1() // mostly equivalent but separate from myfunc.constprop.1
>     else if (i == 2)
>       myfunc.constprop.2_1() // mostly equivalent but separate from myfunc.constprop.2
>     else
>       myfunc(i)
>
> Which I find difficult to justify.  From comments it looked like the
> reason is avoiding calling find_more_scalar_values, is that correct?
>
> I'd like to know more about the cases you are targeting and cases where
> adding the additional known scalar constants were an issue.  I think it
> needs to be tackled differently.
>
> By the way, as IPA-CP works now (it would be nice but difficult to lift
> that limitation), all but up to one constant in known_csts are constants
> in all call contexts, so without calling find_more_scalar_values you
> should need just one run-time condition per speculative call.  So
> tracking which constant is which might be better than avoiding
> find_more_scalar_values?
>

First of all, what you say about the clones being mostly equivalent but
separate is true. I have also noted this in the v2 changes, but the
description is based on v1, where the clones were indeed shared with
ipa-cp. Allow me to provide some context here:

The implementation is based on the assumption that the constant
arguments from an ipa-cp specialization are likely to appear in
non-constant call sites as well, and in that case it is worthwhile to
speculatively specialize for these. In the first implementation the
speculative guards called the same specialized functions that were
created by ipa-cp, but this turned out to be an issue for two reasons.

The first issue was find_more_scalar_values. Whereas the constant
chosen for the ipa-cp clone is a good indicator of the likeliness of a
value at the non-constant call sites, the constants added by
find_more_scalar_values usually are not. Adding more constants to the
specialization is of course an improvement for the call sites that
involve these constants, but for speculative specialization they make
things worse by increasing the complexity of the guard and also
decreasing the probability that the guard will be true (by being more
restrictive). This is especially true for pointer constants added by
find_more_scalar_values, which are useful for the specialized function
but greatly limit the usefulness of the speculative one.
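
As a made-up sketch of the problem: assume the clone was selected
because i == 1 is a likely value, but find_more_scalar_values also
pinned a pointer argument for the known callers.  The guard then has
to test everything in known_csts:

  /* Hypothetical: the extra pointer test rarely holds elsewhere.  */
  if (i == 1 && p == &some_static_buf)
    myfunc.constprop.1 ();
  else
    myfunc (i, p);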

The second reason for creating separate clones is that we realized that
ipa-cp clones are more than just a specialization for a constant. The
specialized clones have a list of known contexts (and, from my
understanding, it is incorrect to call the specialized function from a
context not included in these) and a list of aggregate replacements,
which didn't work well when we tried to use them as speculative
specializations.

These are the main reasons why, although we wanted to avoid excessively
cloning functions, we had to create separate clones from ipa-cp's.

Additionally, it is true that with this approach, as you mention, just
one run-time condition per speculative call is needed. The
implementation is more general and supports any number of them, but
currently only one is used. In case this was not implied by your last
sentence there, I want to clarify that if the specialized function has
more than one specialized value (through find_more_scalar_values) then
testing for just the single constant is not enough to make calling the
specialization legal.

I would be interested in ideas and feedback on how that could be
improved and whether there is a way to avoid creating separate function
clones here.

> Also growth limits in ipa-cp are not updated appropriately.
>

Because I'm not really familiar with how the growth limits work, can
you please point out what I should look for to address this?

> Some more comments inline:
>
> >
> > The pass consists of two main parts:
> > * collecting all specialized functions and the argument/constant pair(s)
> > * insertion of the guards during materialization
> >
> > The patch integrates well into ipa-cp and related IPA functionality.
> > Given the nature of IPA, the changes touch many IPA-related
> > files as well as call-graph data structures.
> >
> > The impact of the dynamic guard is expected to be outweighed by the
> > speedup gained from the optimizations it enables (e.g. inlining or
> > constant propagation).
> >
> > gcc/ChangeLog:
> >
> >         * cgraph.cc (cgraph_add_edge_to_call_site_hash): Add support for guarded specialized edges.
> >         (cgraph_edge::set_call_stmt): Likewise.
> >         (symbol_table::create_edge): Likewise.
> >         (cgraph_edge::remove): Likewise.
> >         (cgraph_edge::make_speculative): Likewise.
> >         (cgraph_edge::make_specialized): Likewise.
> >         (cgraph_edge::remove_specializations): Likewise.
> >         (cgraph_edge::redirect_call_stmt_to_callee): Likewise.
> >         (cgraph_edge::dump_edge_flags): Likewise.
> >         (verify_speculative_call): Likewise.
> >         (verify_specialized_call): Likewise.
> >         (cgraph_node::verify_node): Likewise.
> >         * cgraph.h (class GTY): Add new class that contains info of specialized edges.
> >         * cgraphclones.cc (cgraph_edge::clone): Add support for guarded specialized edges.
> >         (cgraph_node::set_call_stmt_including_clones): Likewise.
> >         * ipa-cp.cc (want_remove_some_param_p): Likewise.
> >         (create_specialized_node): Likewise.
> >         (add_specialized_edges): Likewise.
> >         (ipcp_driver): Likewise.
> >         * ipa-fnsummary.cc (redirect_to_unreachable): Likewise.
> >         (ipa_fn_summary_t::duplicate): Likewise.
> >         (analyze_function_body): Likewise.
> >         (estimate_edge_size_and_time): Likewise.
> >         (remap_edge_summaries): Likewise.
> >         * ipa-inline-transform.cc (inline_transform): Likewise.
> >         * ipa-inline.cc (edge_badness): Likewise.
> >         * lto-cgraph.cc (lto_output_edge): Likewise.
> >         (input_edge): Likewise.
> >         * tree-inline.cc (copy_bb): Likewise.
> >         * value-prof.cc (gimple_sc): Add function to create guarded specializations.
> >         * value-prof.h (gimple_sc): Likewise.
>
> Please also include test-cases.
>

Will do.

> >
> > Signed-off-by: Manolis Tsamis <manolis.tsamis@vrull.eu>
> >
> > ---
> >
>
> [...]
>
> > diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc
> > index cc031ebed0f..31d01ada928 100644
> > --- a/gcc/ipa-cp.cc
> > +++ b/gcc/ipa-cp.cc
> > @@ -119,6 +119,7 @@ along with GCC; see the file COPYING3.  If not see
> >  #include "symbol-summary.h"
> >  #include "tree-vrp.h"
> >  #include "ipa-prop.h"
> > +#include "gimple-pretty-print.h"
> >  #include "tree-pretty-print.h"
> >  #include "tree-inline.h"
> >  #include "ipa-fnsummary.h"
> > @@ -5239,16 +5240,20 @@ want_remove_some_param_p (cgraph_node *node, vec<tree> known_csts)
> >    return false;
> >  }
> >
> > +static hash_map<cgraph_node*, vec<cgraph_node*>> *available_specializations;
> > +
> >  /* Create a specialized version of NODE with known constants in KNOWN_CSTS,
> >     known contexts in KNOWN_CONTEXTS and known aggregate values in AGGVALS and
> > -   redirect all edges in CALLERS to it.  */
> > +   redirect all edges in CALLERS to it.  If IS_SPECULATIVE is true then this
> > +   node is created to be part of a guarded specialization edge.  */
> >
> >  static struct cgraph_node *
> >  create_specialized_node (struct cgraph_node *node,
> >                        vec<tree> known_csts,
> >                        vec<ipa_polymorphic_call_context> known_contexts,
> >                        vec<ipa_argagg_value, va_gc> *aggvals,
> > -                      vec<cgraph_edge *> &callers)
> > +                      vec<cgraph_edge *> &callers,
> > +                      bool is_speculative)
> >  {
> >    ipa_node_params *new_info, *info = ipa_node_params_sum->get (node);
> >    vec<ipa_replace_map *, va_gc> *replace_trees = NULL;
> > @@ -5383,7 +5388,7 @@ create_specialized_node (struct cgraph_node *node,
> >    for (const ipa_argagg_value &av : aggvals)
> >      new_node->maybe_create_reference (av.value, NULL);
> >
> > -  if (dump_file && (dump_flags & TDF_DETAILS))
> > +  if (dump_file && (dump_flags & TDF_DETAILS) && !is_speculative)
> >      {
> >        fprintf (dump_file, "     the new node is %s.\n", new_node->dump_name ());
> >        if (known_contexts.exists ())
> > @@ -5409,6 +5414,13 @@ create_specialized_node (struct cgraph_node *node,
> >    new_info->known_csts = known_csts;
> >    new_info->known_contexts = known_contexts;
> >
> > +  if (is_speculative && !info->ipcp_orig_node)
>
> What is the reason for testing !info->ipcp_orig_node here?
>

The idea is to limit excessive speculation by only doing it when a
non-specialized function is specialized for the first time (i.e. not
when further specializing an existing clone). This does not affect
correctness, so it can be removed if my assumption doesn't hold.

>
> > +    {
> > +      vec<cgraph_node*> &spec_nodes
> > +     = available_specializations->get_or_insert (node);
> > +      spec_nodes.safe_push (new_node);
> > +    }
> > +
> >    ipcp_discover_new_direct_edges (new_node, known_csts, known_contexts,
> >                                 aggvals);
> >
> > @@ -6104,6 +6116,21 @@ decide_about_value (struct cgraph_node *node, int index, HOST_WIDE_INT offset,
> >        known_csts = avals->m_known_vals.copy ();
> >        known_contexts = copy_useful_known_contexts (avals->m_known_contexts);
> >      }
> > +
> > +  /* If guarded specialization is enabled then we create an additional
> > +     clone with KNOWN_CSTS and no known contexts or aggregates.
> > +     We don't want find_more_scalar_values because adding more constants
> > +     increases the complexity of the guard and reduces the chance
> > +     that it is used.  */
> > +  if (flag_ipa_guarded_specialization && !val->self_recursion_generated_p ())
> > +    {
> > +      vec<cgraph_edge *> no_callers = vNULL;
> > +      cgraph_node *guarded_spec_node
> > +     = create_specialized_node (node, known_csts.copy (), vNULL,
> > +                                              NULL, no_callers, true);
>
> It looks like, if the value being considered is an aggregate value
> (offset is non-negative) or polymorphic_context (yeah, the whole
> function is a template), neither of which is recorded in known_csts,
> you'll end up creating a clone with no specialization at all (other than
> that for all direct calls).
>

Thanks for pointing that out, I will add a condition to check whether
there are any known constants.

>
> > +      update_profiling_info (node, guarded_spec_node);
>
> I must say I don't know what is the best way to distribute profiling
> counts in the transformation you propose, but this is not going to do
> the right thing.  update_profiling_info tries to divide the counts
> proportionally depending on sum of counts of calls to the original and
> the clone and since the clone has no callers at this point, it will
> become quite cold.
>

Indeed. Would it be fine to assume that for all non-constant call sites
a small percentage of them (say ~10%) will end up in the speculative
specializations and calculate the profiles based on that?

>
> > +    }
> > +
> >    find_more_scalar_values_for_callers_subset (node, known_csts, callers);
> >    find_more_contexts_for_caller_subset (node, &known_contexts, callers);
> >    vec<ipa_argagg_value, va_gc> *aggvals
> > @@ -6111,7 +6138,7 @@ decide_about_value (struct cgraph_node *node, int index, HOST_WIDE_INT offset,
> >    gcc_checking_assert (ipcp_val_agg_replacement_ok_p (aggvals, index,
> >                                                     offset, val->value));
> >    val->spec_node = create_specialized_node (node, known_csts, known_contexts,
> > -                                         aggvals, callers);
> > +                                         aggvals, callers, false);
> >
> >    if (val->self_recursion_generated_p ())
> >      self_gen_clones->safe_push (val->spec_node);
> > @@ -6270,7 +6297,7 @@ decide_whether_version_node (struct cgraph_node *node)
> >         known_contexts = vNULL;
> >       }
> >        clone = create_specialized_node (node, known_csts, known_contexts,
> > -                                    aggvals, callers);
> > +                                    aggvals, callers, false);
> >        info->do_clone_for_all_contexts = false;
> >        ipa_node_params_sum->get (clone)->is_all_contexts_clone = true;
> >        ret = true;
> > @@ -6546,6 +6573,135 @@ ipcp_store_vr_results (void)
> >      }
> >  }
> >
> > +/* Add new edges to the call graph to represent the available specializations
> > +   of each specialized function.  */
> > +static void
> > +add_specialized_edges (void)
> > +{
> > +  cgraph_edge *e;
> > +  cgraph_node *n, *spec_n;
> > +  tree known_cst;
> > +  unsigned i, j;
> > +
> > +  FOR_EACH_DEFINED_FUNCTION (n)
> > +    {
> > +      if (dump_file && n->callees)
> > +     fprintf (dump_file,
> > +              "Processing function %s for specialization of edges.\n",
> > +              n->dump_name ());
> > +
> > +      if (n->ipcp_clone)
> > +     continue;
> > +
> > +      bool update = false;
> > +      for (e = n->callees; e; e = e->next_callee)
> > +     {
> > +       if (!e->callee || e->recursive_p ())
> > +         continue;
> > +
> > +       vec<cgraph_node*> *specialization_nodes
> > +         = available_specializations->get (e->callee);
> > +
> > +       /* Even if the callee is a specialized node it is still valid to
> > +          further create guarded specializations based on the original node.
> > +          If the existing specialized node doesn't have any known constants
> > +          then it is probably profitable to specialize further.  */
>
> So you are saying that scalar constant specializations are always
> better than aggregate or polymorphic_context ones?  IMHO this should be
> at least driven by some heuristics like the number that
> good_cloning_opportunity_p uses.
>

Ok, I will look into good_cloning_opportunity_p and see if I can create
a heuristic that makes sense for that case. The reason this was made
like that is that in our tests the constants ended up being more
important.

> > +       if (e->callee->ipcp_clone && !specialization_nodes)
> > +         {
> > +           ipa_node_params *info
> > +             = ipa_node_params_sum->get (e->callee);
> > +           gcc_checking_assert (info->ipcp_orig_node);
> > +
> > +           bool has_known_constant = false;
> > +           FOR_EACH_VEC_ELT (info->known_csts, i, known_cst)
> > +             if (known_cst != NULL_TREE)
> > +               {
> > +                 has_known_constant = true;
> > +                 break;
> > +               }
> > +
> > +           if (!has_known_constant)
> > +             specialization_nodes
> > +               = available_specializations->get (info->ipcp_orig_node);
> > +         }
> > +
> > +       if (!specialization_nodes)
> > +         continue;
> > +
> > +       unsigned num_of_specializations = 0;
> > +       unsigned max_num_of_specializations = opt_for_fn (n->decl,
> > +                                               param_ipa_spec_max_per_edge);
> > +
> > +       FOR_EACH_VEC_ELT (*specialization_nodes, i, spec_n)
> > +         {
> > +           if (dump_file)
> > +             fprintf (dump_file,
> > +                      "Edge has available specialization %s.\n",
> > +                      spec_n->dump_name ());
> > +
> > +           ipa_node_params *spec_params = ipa_node_params_sum->get (spec_n);
> > +           vec<cgraph_specialization_info> replaced_args = vNULL;
> > +           bool failed = false;
> > +
> > +           FOR_EACH_VEC_ELT (spec_params->known_csts, j, known_cst)
>
> As I wrote before, I think you are also testing constants which we
> know are there.

If you mean that guards which are trivially false can be created (e.g.
if (4 == 6)), that is true. I thought that letting later optimizations
take care of that is fine, but maybe it's also wasteful. I can improve
it by testing whether the argument is constant.

>
> The idea is interesting, thanks for exploring these options.  As I said,
> knowing a bit more about what motivated you might help us to reason
> about it.
>

This implementation is part of an optimization effort that was inspired
by patterns in SPEC2017's x264 benchmark, which were then turned into
more general optimization approaches. The idea is that these are useful
if they can offer improved performance, as long as they don't cause
important performance regressions. Combined with another (similar in
nature) proposed optimization, our current measurements show a ~2-3%
improvement in that benchmark.

Thanks,
Manolis

> Martin
  

Patch

diff --git a/gcc/cgraph.cc b/gcc/cgraph.cc
index f15cb47c8b8..356b1f64756 100644
--- a/gcc/cgraph.cc
+++ b/gcc/cgraph.cc
@@ -718,18 +718,24 @@  cgraph_add_edge_to_call_site_hash (cgraph_edge *e)
      one indirect); always hash the direct one.  */
   if (e->speculative && e->indirect_unknown_callee)
     return;
+  /* There are potentially multiple specialization edges for every
+     specialized call; always hash the base edge.  */
+  if (e->guarded_specialization_edge_p ())
+    return;
   cgraph_edge **slot = e->caller->call_site_hash->find_slot_with_hash
       (e->call_stmt, cgraph_edge_hasher::hash (e->call_stmt), INSERT);
   if (*slot)
     {
-      gcc_assert (((cgraph_edge *)*slot)->speculative);
+      gcc_assert (((cgraph_edge *)*slot)->speculative
+		  || ((cgraph_edge *)*slot)->specialized);
       if (e->callee && (!e->prev_callee
 			|| !e->prev_callee->speculative
+			|| !e->prev_callee->specialized
 			|| e->prev_callee->call_stmt != e->call_stmt))
 	*slot = e;
       return;
     }
-  gcc_assert (!*slot || e->speculative);
+  gcc_assert (!*slot || e->speculative || e->specialized);
   *slot = e;
 }
 
@@ -743,8 +749,16 @@  cgraph_node::get_edge (gimple *call_stmt)
   int n = 0;
 
   if (call_site_hash)
-    return call_site_hash->find_with_hash
-	(call_stmt, cgraph_edge_hasher::hash (call_stmt));
+    {
+      e = call_site_hash->find_with_hash
+	  (call_stmt, cgraph_edge_hasher::hash (call_stmt));
+
+      /* Always return the base edge of a group of specialized edges.  */
+      if (e && e->guarded_specialization_edge_p ())
+	e = e->specialized_call_base_edge ();
+
+      return e;
+    }
 
   /* This loop may turn out to be performance problem.  In such case adding
      hashtables into call nodes with very many edges is probably best
@@ -775,6 +789,10 @@  cgraph_node::get_edge (gimple *call_stmt)
 	cgraph_add_edge_to_call_site_hash (e2);
     }
 
+  /* Always return the base edge of a group of specialized edges.  */
+  if (e && e->guarded_specialization_edge_p ())
+    e = e->specialized_call_base_edge ();
+
   return e;
 }
 
@@ -800,6 +818,40 @@  cgraph_edge::set_call_stmt (cgraph_edge *e, gcall *new_stmt,
       gcc_checking_assert (new_direct_callee);
     }
 
+  /* Update specialized first and do not return yet in case we're dealing
+     with an edge that is both specialized and speculative.  */
+  if (update_speculative && e->specialized)
+    {
+      /* If this is a guarded specialization edge then delegate the needed
+	 work to the base specialization edge.  This is needed to correctly
+	 update all call statements, including the case where this is a
+	 group of both speculative and specialized edges.  */
+      if (e->guarded_specialization_edge_p ())
+	{
+	  set_call_stmt (e->specialized_call_base_edge (), new_stmt, true);
+	  return e;
+	}
+      else
+	{
+	  cgraph_edge *next;
+	  for (cgraph_edge *d = e->first_specialized_call_target ();
+	       d; d = next)
+	    {
+	      next = d->next_specialized_call_target ();
+	      cgraph_edge *d2 = set_call_stmt (d, new_stmt, false);
+	      gcc_assert (d2 == d);
+	    }
+
+	  /* Don't update base for speculative edges.
+	     The code below that handles speculative edges will.  */
+	  if (!(e->speculative && !new_direct_callee))
+	    {
+	      set_call_stmt (e, new_stmt, false);
+	      return e;
+	    }
+	}
+    }
+
   /* Speculative edges has three component, update all of them
      when asked to.  */
   if (update_speculative && e->speculative
@@ -841,6 +893,7 @@  cgraph_edge::set_call_stmt (cgraph_edge *e, gcall *new_stmt,
   /* Only direct speculative edges go to call_site_hash.  */
   if (e->caller->call_site_hash
       && (!e->speculative || !e->indirect_unknown_callee)
+      && (!e->specialized || e->spec_args == NULL)
       /* It is possible that edge was previously speculative.  In this case
 	 we have different value in call stmt hash which needs preserving.  */
       && e->caller->get_edge (e->call_stmt) == e)
@@ -854,11 +907,12 @@  cgraph_edge::set_call_stmt (cgraph_edge *e, gcall *new_stmt,
   /* Update call stite hash.  For speculative calls we only record the first
      direct edge.  */
   if (e->caller->call_site_hash
-      && (!e->speculative
-	  || (e->callee
+      && ((!e->speculative && !e->specialized)
+	  || (e->speculative && e->callee
 	      && (!e->prev_callee || !e->prev_callee->speculative
 		  || e->prev_callee->call_stmt != e->call_stmt))
-	  || (e->speculative && !e->callee)))
+	  || (e->speculative && !e->callee)
+	  || e->base_specialization_edge_p ()))
     cgraph_add_edge_to_call_site_hash (e);
   return e;
 }
@@ -883,7 +937,8 @@  symbol_table::create_edge (cgraph_node *caller, cgraph_node *callee,
 	 construction of call stmt hashtable.  */
       cgraph_edge *e;
       gcc_checking_assert (!(e = caller->get_edge (call_stmt))
-			   || e->speculative);
+			   || e->speculative
+			   || e->specialized);
 
       gcc_assert (is_gimple_call (call_stmt));
     }
@@ -909,6 +964,8 @@  symbol_table::create_edge (cgraph_node *caller, cgraph_node *callee,
   edge->indirect_info = NULL;
   edge->indirect_inlining_edge = 0;
   edge->speculative = false;
+  edge->specialized = false;
+  edge->spec_args = NULL;
   edge->indirect_unknown_callee = indir_unknown_callee;
   if (call_stmt && caller->call_site_hash)
     cgraph_add_edge_to_call_site_hash (edge);
@@ -1066,6 +1123,11 @@  symbol_table::free_edge (cgraph_edge *e)
 void
 cgraph_edge::remove (cgraph_edge *edge)
 {
+  /* If we remove the base edge of a group of specialized
+     edges then we must also remove all of its specializations.  */
+  if (edge->base_specialization_edge_p ())
+    cgraph_edge::remove_specializations (edge);
+
   /* Call all edge removal hooks.  */
   symtab->call_edge_removal_hooks (edge);
 
@@ -1109,6 +1171,8 @@  cgraph_edge::make_speculative (cgraph_node *n2, profile_count direct_count,
   ipa_ref *ref = NULL;
   cgraph_edge *e2;
 
+  gcc_checking_assert (!specialized);
+
   if (dump_file)
     fprintf (dump_file, "Indirect call -> speculative call %s => %s\n",
 	     n->dump_name (), n2->dump_name ());
@@ -1134,6 +1198,60 @@  cgraph_edge::make_speculative (cgraph_node *n2, profile_count direct_count,
   return e2;
 }
 
+/* Mark this edge as specialized and add a new edge representing that N2
+   is a specialized version of the CALLEE of this edge, with the specialized
+   arguments found in SPEC_ARGS.  */
+cgraph_edge *
+cgraph_edge::make_specialized (cgraph_node *n2,
+				vec<cgraph_specialization_info>* spec_args,
+				profile_count spec_count)
+{
+  if (speculative)
+    {
+      /* Because both speculative and specialized edges use CALL_STMT and
+	 LTO_STMT_UID to link edges together there is a limitation in
+	 specializing speculative edges.  Only one group of specialized
+	 edges can exist for a given group of speculative edges.  */
+      for (cgraph_edge *direct = first_speculative_call_target ();
+	   direct; direct = direct->next_speculative_call_target ())
+	if (direct != this && direct->specialized)
+	  return NULL;
+    }
+
+  cgraph_node *n = caller;
+  cgraph_edge *e2;
+
+  if (dump_file)
+    fprintf (dump_file, "Creating guarded specialized edge %s -> %s "
+			"from%s callee %s\n",
+			caller->dump_name (), n2->dump_name (),
+			(speculative? " speculative" : ""),
+			callee->dump_name ());
+  specialized = true;
+  e2 = n->create_edge (n2, call_stmt, spec_count);
+
+  e2->inline_failed = CIF_UNSPECIFIED;
+  if (TREE_NOTHROW (n2->decl))
+    e2->can_throw_external = false;
+  else
+    e2->can_throw_external = can_throw_external;
+
+  e2->specialized = true;
+
+  unsigned i;
+  cgraph_specialization_info* spec_info;
+  vec_alloc (e2->spec_args, spec_args->length ());
+
+  FOR_EACH_VEC_ELT (*spec_args, i, spec_info)
+    e2->spec_args->quick_push (*spec_info);
+
+  e2->lto_stmt_uid = lto_stmt_uid;
+  e2->in_polymorphic_cdtor = in_polymorphic_cdtor;
+  count -= e2->count;
+  symtab->call_edge_duplication_hooks (this, e2);
+  return e2;
+}
+
 /* Speculative call consists of an indirect edge and one or more
    direct edge+ref pairs.
 
@@ -1364,6 +1482,39 @@  cgraph_edge::make_direct (cgraph_edge *edge, cgraph_node *callee)
   return edge;
 }
 
+/* Given the base edge of a group of specialized edges remove all its
+   specialized edges.  Essentially this can be used to undo the decision
+   to specialize EDGE.  */
+
+void
+cgraph_edge::remove_specializations (cgraph_edge *edge)
+{
+  if (!edge->specialized)
+    return;
+
+  if (edge->base_specialization_edge_p ())
+    {
+      cgraph_edge *next;
+      for (cgraph_edge *e2 = edge->caller->callees; e2; e2 = next)
+	{
+	  next = e2->next_callee;
+
+	  if (e2->guarded_specialization_edge_p ()
+	      && edge->call_stmt == e2->call_stmt
+	      && edge->lto_stmt_uid == e2->lto_stmt_uid)
+	    {
+	      edge->count += e2->count;
+	      if (e2->inline_failed)
+		remove (e2);
+	      else
+		e2->callee->remove_symbol_and_inline_clones ();
+	    }
+	}
+    }
+  else
+    gcc_checking_assert (false);
+}
+
 /* Redirect callee of the edge to N.  The function does not update underlying
    call expression.  */
 
@@ -1408,9 +1559,38 @@  cgraph_edge::redirect_callee (cgraph_node *n)
 
 gimple *
 cgraph_edge::redirect_call_stmt_to_callee (cgraph_edge *e)
+{
+  cgraph_edge *specs = NULL;
+  gcall *old_call_stmt = e->call_stmt;
+  /* If we're materializing a speculative and base specialized edge
+     then we want to keep the specializations alive.  This amounts
+     to changing the call statements of the guarded
+     specializations.  */
+  if (e->speculative && e->base_specialization_edge_p ())
+    specs = e->first_specialized_call_target ();
+
+  gcall *new_call_stmt = redirect_call_stmt_to_callee_1 (e);
+
+  if (new_call_stmt != old_call_stmt)
+    {
+      cgraph_edge *next;
+      for (; specs; specs = next)
+	{
+	  next = specs->next_specialized_call_target ();
+	  cgraph_edge *d = set_call_stmt (specs, new_call_stmt, false);
+	  gcc_assert (d == specs);
+	}
+    }
+
+  return new_call_stmt;
+}
+
+gcall *
+cgraph_edge::redirect_call_stmt_to_callee_1 (cgraph_edge *e)
 {
   tree decl = gimple_call_fndecl (e->call_stmt);
   gcall *new_stmt;
+  bool remove_specializations_if_base = true;
 
   if (e->speculative)
     {
@@ -1467,6 +1647,8 @@  cgraph_edge::redirect_call_stmt_to_callee (cgraph_edge *e)
 	  /* Indirect edges are not both in the call site hash.
 	     get it updated.  */
 	  update_call_stmt_hash_for_removing_direct_edge (e, indirect);
+
+	  remove_specializations_if_base = false;
 	  cgraph_edge::set_call_stmt (e, new_stmt, false);
 	  e->count = gimple_bb (e->call_stmt)->count;
 
@@ -1482,6 +1664,58 @@  cgraph_edge::redirect_call_stmt_to_callee (cgraph_edge *e)
 	}
     }
 
+  if (e->specialized)
+    {
+      if (e->spec_args != NULL)
+	{
+	  /* Be sure we redirect all specialized targets before poking
+	     about base edge.  */
+	  cgraph_edge *base = e->specialized_call_base_edge ();
+	  gcall *new_stmt;
+
+	  /* Materialization of a guarded specialization that has a
+	     speculative base is unsound because the guard will be outside
+	     the speculation guard.  */
+	  gcc_assert (!base->speculative);
+
+	  /* Expand specialization into GIMPLE code.  */
+	  if (dump_file)
+	    fprintf (dump_file,
+		     "Expanding specialized call of %s -> %s\n",
+		     e->caller->dump_name (), e->callee->dump_name ());
+
+	  push_cfun (DECL_STRUCT_FUNCTION (e->caller->decl));
+
+	  profile_count all = base->count;
+	  for (cgraph_edge *e2 = e->first_specialized_call_target ();
+	       e2; e2 = e2->next_specialized_call_target ())
+	    all = all + e2->count;
+
+	  profile_probability prob = e->count.probability_in (all);
+	  if (!prob.initialized_p ())
+	    prob = profile_probability::even ();
+
+	  new_stmt = gimple_sc (e, prob);
+	  e->specialized = false;
+	  if (!base->first_specialized_call_target ())
+	    base->specialized = false;
+
+	  cgraph_edge::set_call_stmt (e, new_stmt, false);
+	  e->spec_args = NULL;
+	  e->count = gimple_bb (e->call_stmt)->count;
+	  /* Once we are done with expanding the sequence, update also base
+	     call probability.  Until then the basic block accounts for the
+	     sum of specialized edges and all non-expanded specializations.  */
+	  if (!base->specialized)
+	    base->count = gimple_bb (base->call_stmt)->count;
+
+	  pop_cfun ();
+	}
+      else if (remove_specializations_if_base)
+	/* The specialized edges are in part connected by CALL_STMT so if
+	   we change it for the base edge then remove all specializations.  */
+	cgraph_edge::remove_specializations (e);
+    }
 
   if (e->indirect_unknown_callee
       || decl == e->callee->decl)
@@ -2069,6 +2303,10 @@  cgraph_edge::dump_edge_flags (FILE *f)
 {
   if (speculative)
     fprintf (f, "(speculative) ");
+  if (base_specialization_edge_p ())
+    fprintf (f, "(specialized base) ");
+  if (guarded_specialization_edge_p ())
+    fprintf (f, "(guarded specialization) ");
   if (!inline_failed)
     fprintf (f, "(inlined) ");
   if (call_stmt_cannot_inline_p)
@@ -3312,6 +3550,10 @@  verify_speculative_call (struct cgraph_node *node, gimple *stmt,
        direct = direct->next_callee)
     if (direct->call_stmt == stmt && direct->lto_stmt_uid == lto_stmt_uid)
       {
+	/* Guarded specialized edges share the same CALL_STMT and LTO_STMT_UID
+	   but are handled separately.  */
+	if (direct->guarded_specialization_edge_p ())
+	  continue;
 	if (!first_call)
 	  first_call = direct;
 	if (prev_call && direct != prev_call->next_callee)
@@ -3343,7 +3585,7 @@  verify_speculative_call (struct cgraph_node *node, gimple *stmt,
 	direct_calls[direct->speculative_id] = direct;
       }
 
-  if (first_call->call_stmt
+  if (first_call->call_stmt && node->call_site_hash
       && first_call != node->get_edge (first_call->call_stmt))
     {
       error ("call stmt hash does not point to first direct edge of "
@@ -3401,6 +3643,103 @@  verify_speculative_call (struct cgraph_node *node, gimple *stmt,
   return false;
 }
 
+/* Verify consistency of specialized call in NODE corresponding to STMT
+   and LTO_STMT_UID.  If BASE is set, assume that it is the base
+   edge of call sequence.  Return true if error is found.
+
+   This function is called for every component of a specialized call (base
+   edge and specialized edges).  To save duplicated work, do full testing
+   only when testing the base edge.  */
+static bool
+verify_specialized_call (struct cgraph_node *node, gimple *stmt,
+			 unsigned int lto_stmt_uid,
+			 struct cgraph_edge *base,
+			 struct cgraph_edge *edge)
+{
+  if (base == NULL)
+    {
+      cgraph_edge *base, *iter;
+      for (base = node->callees; base;
+	   base = base->next_callee)
+	if (base->call_stmt == stmt
+	    && base->lto_stmt_uid == lto_stmt_uid
+	    && base->spec_args == NULL)
+	  break;
+      if (!base)
+	{
+	  error ("missing base call in specialized call sequence");
+	  return true;
+	}
+      if (!base->specialized)
+	{
+	  error ("base call in specialized call sequence has no "
+		 "specialized flag");
+	  return true;
+	}
+      for (iter = node->callees; iter != base;
+	   iter = iter->next_callee)
+	if (iter == edge)
+	  break;
+      if (iter == base)
+	{
+	  error ("specialized edges must precede the base specialized edge");
+	  return true;
+	}
+      for (base = base->next_callee; base;
+	   base = base->next_callee)
+	if (base->call_stmt == stmt
+	    && base->lto_stmt_uid == lto_stmt_uid
+	    && base->spec_args == NULL)
+	  {
+	    error ("cannot have more than one base edge in specialized "
+		   "call sequence");
+	    return true;
+	  }
+      return false;
+    }
+
+  cgraph_edge *prev_call = NULL;
+
+  cgraph_node *origin_base = base->callee;
+  while (origin_base->clone_of)
+    origin_base = origin_base->clone_of;
+
+  for (cgraph_edge *spec = node->callees; spec;
+       spec = spec->next_callee)
+    if (spec->call_stmt == stmt
+	&& spec->lto_stmt_uid == lto_stmt_uid
+	&& spec->spec_args != NULL)
+      {
+	cgraph_node *origin_spec = spec->callee;
+	while (origin_spec->clone_of)
+	  origin_spec = origin_spec->clone_of;
+
+	if (spec->callee->clone_of && origin_base != origin_spec)
+	  {
+	    error ("specialized call to %s in specialized call sequence has "
+		   "different origin than base %s %s %s",
+		   origin_spec->dump_name (), origin_base->dump_name (),
+		   base->callee->dump_name (), spec->callee->dump_name ());
+	    return true;
+	  }
+
+	if (prev_call && spec != prev_call->next_callee)
+	  {
+	    error ("specialized edges are not adjacent");
+	    return true;
+	  }
+	prev_call = spec;
+	if (!spec->specialized)
+	  {
+	    error ("call to %s in specialized call sequence has no "
+		   "specialized flag", spec->callee->dump_name ());
+	    return true;
+	  }
+      }
+
+  return false;
+}
+
 /* Verify cgraph nodes of given cgraph node.  */
 DEBUG_FUNCTION void
 cgraph_node::verify_node (void)
@@ -3577,6 +3916,7 @@  cgraph_node::verify_node (void)
       if (gimple_has_body_p (e->caller->decl)
 	  && !e->caller->inlined_to
 	  && !e->speculative
+	  && !e->specialized
 	  /* Optimized out calls are redirected to __builtin_unreachable.  */
 	  && (e->count.nonzero_p ()
 	      || ! e->callee->decl
@@ -3603,6 +3943,10 @@  cgraph_node::verify_node (void)
 	  && verify_speculative_call (e->caller, e->call_stmt, e->lto_stmt_uid,
 				      NULL))
 	error_found = true;
+      if (e->specialized
+	  && verify_specialized_call (e->caller, e->call_stmt, e->lto_stmt_uid,
+				      (e->spec_args == NULL? e : NULL), e))
+	error_found = true;
     }
   for (e = indirect_calls; e; e = e->next_callee)
     {
@@ -3611,6 +3955,7 @@  cgraph_node::verify_node (void)
       if (gimple_has_body_p (e->caller->decl)
 	  && !e->caller->inlined_to
 	  && !e->speculative
+	  && !e->specialized
 	  && e->count.ipa_p ()
 	  && count
 	      == ENTRY_BLOCK_PTR_FOR_FN (DECL_STRUCT_FUNCTION (decl))->count
@@ -3629,6 +3974,11 @@  cgraph_node::verify_node (void)
 	  && verify_speculative_call (e->caller, e->call_stmt, e->lto_stmt_uid,
 				      e))
 	error_found = true;
+      if (e->specialized || e->spec_args != NULL)
+	{
+	  error ("cannot have specialized edges in indirect call");
+	  error_found = true;
+	}
     }
   for (i = 0; iterate_reference (i, ref); i++)
     {
@@ -3823,7 +4173,7 @@  cgraph_node::verify_node (void)
 
       for (e = callees; e; e = e->next_callee)
 	{
-	  if (!e->aux && !e->speculative)
+	  if (!e->aux && !e->speculative && !e->specialized)
 	    {
 	      error ("edge %s->%s has no corresponding call_stmt",
 		     identifier_to_locale (e->caller->name ()),
@@ -3835,7 +4185,7 @@  cgraph_node::verify_node (void)
 	}
       for (e = indirect_calls; e; e = e->next_callee)
 	{
-	  if (!e->aux && !e->speculative)
+	  if (!e->aux && !e->speculative && !e->specialized)
 	    {
 	      error ("an indirect edge from %s has no corresponding call_stmt",
 		     identifier_to_locale (e->caller->name ()));
diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index 4be67e3cea9..6ae84ce01dd 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -1683,6 +1683,19 @@  public:
   unsigned vptr_changed : 1;
 };
 
+class GTY (()) cgraph_specialization_info
+{
+public:
+  unsigned arg_idx;
+  int is_unsigned; /* Whether the specialization constant is unsigned.  */
+  union
+    {
+      HOST_WIDE_INT GTY ((tag ("0"))) sval;
+      unsigned HOST_WIDE_INT GTY ((tag ("1"))) uval;
+    }
+  GTY ((desc ("%1.is_unsigned"))) cst;
+};
+
 class GTY((chain_next ("%h.next_caller"), chain_prev ("%h.prev_caller"),
 	   for_user)) cgraph_edge
 {
@@ -1723,6 +1736,12 @@  public:
    */
   cgraph_edge *make_speculative (cgraph_node *n2, profile_count direct_count,
 				 unsigned int speculative_id = 0);
+  /* Mark that this edge represents a specialized call to N2.
+     SPEC_ARGS represent the position and values of the CALL_STMT of this edge
+     that are specialized in N2.  */
+  cgraph_edge *make_specialized (cgraph_node *n2,
+				 vec<cgraph_specialization_info> *spec_args,
+				 profile_count spec_count);
 
   /* Speculative call consists of an indirect edge and one or more
      direct edge+ref pairs.  Speculative will expand to the following sequence:
@@ -1802,6 +1821,66 @@  public:
     gcc_unreachable ();
   }
 
+  /* Return the first edge that represents a specialization of the CALL_STMT
+     of this edge if one exists or NULL otherwise.  */
+  cgraph_edge *first_specialized_call_target ()
+  {
+    gcc_checking_assert (specialized && callee);
+    for (cgraph_edge *e2 = caller->callees;
+	 e2; e2 = e2->next_callee)
+      if (e2->guarded_specialization_edge_p ()
+	  && call_stmt == e2->call_stmt
+	  && lto_stmt_uid == e2->lto_stmt_uid)
+	return e2;
+
+    return NULL;
+  }
+
+  /* Return the next edge that represents a specialization of the CALL_STMT
+     of this edge if one exists or NULL otherwise.  */
+  cgraph_edge *next_specialized_call_target ()
+  {
+    cgraph_edge *e = this;
+    gcc_checking_assert (specialized && callee);
+
+    if (e->next_callee
+	&& e->next_callee->guarded_specialization_edge_p ()
+	&& e->next_callee->call_stmt == e->call_stmt
+	&& e->next_callee->lto_stmt_uid == e->lto_stmt_uid)
+      return e->next_callee;
+    return NULL;
+  }
+
+  /* When called on any edge of a specialized call, return the (unique)
+     edge that points to the non-specialized function.  */
+  cgraph_edge *specialized_call_base_edge ()
+  {
+    gcc_checking_assert (specialized && callee);
+    for (cgraph_edge *e2 = caller->callees;
+	 e2; e2 = e2->next_callee)
+      if (e2->base_specialization_edge_p ()
+	  && call_stmt == e2->call_stmt
+	  && lto_stmt_uid == e2->lto_stmt_uid)
+	return e2;
+
+    return NULL;
+  }
+
+  /* Return true iff this edge is part of a specialized sequence and is
+     the original edge for which other specialization edges potentially
+     exist.  */
+  bool base_specialization_edge_p () const
+  {
+    return specialized && spec_args == NULL;
+  }
+
+  /* Return true iff this edge is part of a specialized sequence and
+     represents a potential specialization target that can be used
+     instead of the base edge.  */
+  bool guarded_specialization_edge_p () const
+  {
+    return specialized && spec_args != NULL;
+  }
+
   /* Speculative call edge turned out to be direct call to CALLEE_DECL.  Remove
      the speculative call sequence and return edge representing the call, the
      original EDGE can be removed and deallocated.  It is up to caller to
@@ -1820,6 +1899,11 @@  public:
   static cgraph_edge *resolve_speculation (cgraph_edge *edge,
 					   tree callee_decl = NULL);
 
+  /* Given the base edge of a group of specialized edges, remove all its
+     specialized edges.  Essentially this can be used to undo the decision
+     to specialize EDGE.  */
+  static void remove_specializations (cgraph_edge *edge);
+
   /* If necessary, change the function declaration in the call statement
      associated with edge E so that it corresponds to the edge callee.
      Speculations can be resolved in the process and EDGE can be removed and
@@ -1895,6 +1979,9 @@  public:
   /* Additional information about an indirect call.  Not cleared when an edge
      becomes direct.  */
   cgraph_indirect_call_info *indirect_info;
+  /* If this edge has a specialized function as a callee then this vector
+     holds the indices and values of the specialized arguments.  */
+  vec<cgraph_specialization_info> *GTY ((skip (""))) spec_args;
   void *GTY ((skip (""))) aux;
   /* When equal to CIF_OK, inline this call.  Otherwise, points to the
      explanation why function was not inlined.  */
@@ -1933,6 +2020,21 @@  public:
      Optimizers may later redirect direct call to clone, so 1) and 3)
      do not need to necessarily agree with destination.  */
   unsigned int speculative : 1;
+  /* Edges with the SPECIALIZED flag represent calls that have additional
+     specialized functions that can be used instead (as a result of ipa-cp).
+     The final code sequence will have form:
+
+     if (specialized_arg_0 == specialized_const_0
+	 && ...
+	 && specialized_arg_i == specialized_const_i)
+       call_target.constprop.N (non_specialized_arg_0, ...);
+     ...
+     more potential specializations
+     ...
+     else
+       call_target ();
+  */
+  unsigned int specialized : 1;
   /* Set to true when caller is a constructor or destructor of polymorphic
      type.  */
   unsigned in_polymorphic_cdtor : 1;
@@ -1964,6 +2066,9 @@  private:
      callers. */
   void set_callee (cgraph_node *n);
 
+  /* Worker for redirect_call_stmt_to_callee.  */
+  static gcall *redirect_call_stmt_to_callee_1 (cgraph_edge *e);
+
   /* Output flags of edge to a file F.  */
   void dump_edge_flags (FILE *f);
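
For illustration, here is a minimal standalone sketch (not GCC code;
HOST_WIDE_INT approximated by long long) of how a guard over
cgraph_specialization_info records selects between the signed and the
unsigned constant, mirroring the GTY-tagged union above:

  /* Illustrative only; field names mirror cgraph_specialization_info.  */
  struct spec_info
  {
    unsigned arg_idx;
    int is_unsigned;
    union { long long sval; unsigned long long uval; } cst;
  };

  /* Return true iff every specialized slot of ARGS matches its recorded
     constant, i.e. the guard condition holds.  */
  static bool
  guard_matches (const long long *args, const spec_info *infos, unsigned n)
  {
    for (unsigned i = 0; i < n; i++)
      {
        long long arg = args[infos[i].arg_idx];
        bool ok = infos[i].is_unsigned
                  ? (unsigned long long) arg == infos[i].cst.uval
                  : arg == infos[i].cst.sval;
        if (!ok)
          return false;
      }
    return true;
  }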
 
diff --git a/gcc/cgraphclones.cc b/gcc/cgraphclones.cc
index bb4b3c5407d..30d85c6789f 100644
--- a/gcc/cgraphclones.cc
+++ b/gcc/cgraphclones.cc
@@ -141,6 +141,20 @@  cgraph_edge::clone (cgraph_node *n, gcall *call_stmt, unsigned stmt_uid,
   new_edge->can_throw_external = can_throw_external;
   new_edge->call_stmt_cannot_inline_p = call_stmt_cannot_inline_p;
   new_edge->speculative = speculative;
+
+  new_edge->specialized = specialized;
+  new_edge->spec_args = NULL;
+
+  if (spec_args)
+    {
+      unsigned i;
+      cgraph_specialization_info *spec_info;
+      vec_alloc (new_edge->spec_args, spec_args->length ());
+
+      FOR_EACH_VEC_ELT (*spec_args, i, spec_info)
+	new_edge->spec_args->quick_push (*spec_info);
+    }
+
   new_edge->in_polymorphic_cdtor = in_polymorphic_cdtor;
 
   /* Update IPA profile.  Local profiles need no updating in original.  */
@@ -430,11 +444,23 @@  cgraph_node::create_clone (tree new_decl, profile_count prof_count,
     }
   new_node->expand_all_artificial_thunks ();
 
+  /* When an edge is created it is added at the beginning of the callee
+     list.  If we clone the edges in the order they appear in the lists
+     then the new node will have them backwards.  In order to maintain the
+     order, which may be needed for speculative edges, we iterate in
+     reverse.  */
+  cgraph_edge *last_callee = NULL;
   for (e = callees;e; e=e->next_callee)
+    last_callee = e;
+
+  for (e = last_callee; e; e = e->prev_callee)
     e->clone (new_node, e->call_stmt, e->lto_stmt_uid, new_node->count, old_count,
 	      update_original);
 
+  last_callee = NULL;
   for (e = indirect_calls; e; e = e->next_callee)
+    last_callee = e;
+
+  for (e = last_callee; e; e = e->prev_callee)
     e->clone (new_node, e->call_stmt, e->lto_stmt_uid,
 	      new_node->count, old_count, update_original);
   new_node->clone_references (this);
@@ -791,6 +817,22 @@  cgraph_node::set_call_stmt_including_clones (gimple *old_stmt,
 		  }
 		indirect->speculative = false;
 	      }
+
+	    if (edge->specialized && !update_speculative)
+	      {
+		cgraph_edge *base = edge->specialized_call_base_edge ();
+
+		for (cgraph_edge *next, *specialized
+			= edge->first_specialized_call_target ();
+		     specialized;
+		     specialized = next)
+		  {
+		    next = specialized->next_specialized_call_target ();
+		    specialized->specialized = false;
+		  }
+		base->specialized = false;
+	      }
+
 	  }
 	if (node->clones)
 	  node = node->clones;
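
The reverse walk in create_clone above compensates for the fact that newly
created edges are prepended to the callee list; a minimal container sketch
(illustrative, not GCC code) of the effect:

  #include <list>
  #include <vector>

  void
  order_demo ()
  {
    std::vector<int> src = {1, 2, 3};
    std::list<int> dst;

    for (int e : src)
      dst.push_front (e);    /* forward walk + prepend: dst is 3, 2, 1  */

    dst.clear ();
    for (auto it = src.rbegin (); it != src.rend (); ++it)
      dst.push_front (*it);  /* reverse walk + prepend: dst is 1, 2, 3  */
  }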
diff --git a/gcc/common.opt b/gcc/common.opt
index 562d73d7f55..96c90b3cc3a 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1932,6 +1932,10 @@  fipa-bit-cp
 Common Var(flag_ipa_bit_cp) Optimization
 Perform interprocedural bitwise constant propagation.
 
+fipa-guarded-specialization
+Common Var(flag_ipa_guarded_specialization) Optimization
+Add speculative edges for existing specialized functions.
+
 fipa-modref
 Common Var(flag_ipa_modref) Optimization
 Perform interprocedural modref analysis.
diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc
index cc031ebed0f..31d01ada928 100644
--- a/gcc/ipa-cp.cc
+++ b/gcc/ipa-cp.cc
@@ -119,6 +119,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "symbol-summary.h"
 #include "tree-vrp.h"
 #include "ipa-prop.h"
+#include "gimple-pretty-print.h"
 #include "tree-pretty-print.h"
 #include "tree-inline.h"
 #include "ipa-fnsummary.h"
@@ -5239,16 +5240,20 @@  want_remove_some_param_p (cgraph_node *node, vec<tree> known_csts)
   return false;
 }
 
+/* Maps each node to the clones created for it as candidates for guarded
+   specialization.  */
+static hash_map<cgraph_node*, vec<cgraph_node*>> *available_specializations;
+
 /* Create a specialized version of NODE with known constants in KNOWN_CSTS,
    known contexts in KNOWN_CONTEXTS and known aggregate values in AGGVALS and
-   redirect all edges in CALLERS to it.  */
+   redirect all edges in CALLERS to it.  If IS_SPECULATIVE is true then this
+   node is created to be part of a guarded specialization edge.  */
 
 static struct cgraph_node *
 create_specialized_node (struct cgraph_node *node,
 			 vec<tree> known_csts,
 			 vec<ipa_polymorphic_call_context> known_contexts,
 			 vec<ipa_argagg_value, va_gc> *aggvals,
-			 vec<cgraph_edge *> &callers)
+			 vec<cgraph_edge *> &callers,
+			 bool is_speculative)
 {
   ipa_node_params *new_info, *info = ipa_node_params_sum->get (node);
   vec<ipa_replace_map *, va_gc> *replace_trees = NULL;
@@ -5383,7 +5388,7 @@  create_specialized_node (struct cgraph_node *node,
   for (const ipa_argagg_value &av : aggvals)
     new_node->maybe_create_reference (av.value, NULL);
 
-  if (dump_file && (dump_flags & TDF_DETAILS))
+  if (dump_file && (dump_flags & TDF_DETAILS) && !is_speculative)
     {
       fprintf (dump_file, "     the new node is %s.\n", new_node->dump_name ());
       if (known_contexts.exists ())
@@ -5409,6 +5414,13 @@  create_specialized_node (struct cgraph_node *node,
   new_info->known_csts = known_csts;
   new_info->known_contexts = known_contexts;
 
+  if (is_speculative && !info->ipcp_orig_node)
+    {
+      vec<cgraph_node*> &spec_nodes
+	= available_specializations->get_or_insert (node);
+      spec_nodes.safe_push (new_node);
+    }
+
   ipcp_discover_new_direct_edges (new_node, known_csts, known_contexts,
 				  aggvals);
 
@@ -6104,6 +6116,21 @@  decide_about_value (struct cgraph_node *node, int index, HOST_WIDE_INT offset,
       known_csts = avals->m_known_vals.copy ();
       known_contexts = copy_useful_known_contexts (avals->m_known_contexts);
     }
+
+  /* If guarded specialization is enabled then we create an additional
+     clone with KNOWN_CSTS and no known contexts or aggregates.
+     We don't want to call find_more_scalar_values because adding more
+     constants increases the complexity of the guard and reduces the
+     chance that it is used.  */
+  if (flag_ipa_guarded_specialization && !val->self_recursion_generated_p ())
+    {
+      vec<cgraph_edge *> no_callers = vNULL;
+      cgraph_node *guarded_spec_node
+	= create_specialized_node (node, known_csts.copy (), vNULL,
+				   NULL, no_callers, true);
+      update_profiling_info (node, guarded_spec_node);
+    }
+
   find_more_scalar_values_for_callers_subset (node, known_csts, callers);
   find_more_contexts_for_caller_subset (node, &known_contexts, callers);
   vec<ipa_argagg_value, va_gc> *aggvals
@@ -6111,7 +6138,7 @@  decide_about_value (struct cgraph_node *node, int index, HOST_WIDE_INT offset,
   gcc_checking_assert (ipcp_val_agg_replacement_ok_p (aggvals, index,
 						      offset, val->value));
   val->spec_node = create_specialized_node (node, known_csts, known_contexts,
-					    aggvals, callers);
+					    aggvals, callers, false);
 
   if (val->self_recursion_generated_p ())
     self_gen_clones->safe_push (val->spec_node);
@@ -6270,7 +6297,7 @@  decide_whether_version_node (struct cgraph_node *node)
 	  known_contexts = vNULL;
 	}
       clone = create_specialized_node (node, known_csts, known_contexts,
-				       aggvals, callers);
+				       aggvals, callers, false);
       info->do_clone_for_all_contexts = false;
       ipa_node_params_sum->get (clone)->is_all_contexts_clone = true;
       ret = true;
@@ -6546,6 +6573,135 @@  ipcp_store_vr_results (void)
     }
 }
 
+/* Add new edges to the call graph to represent the available specializations
+   of each specialized function.  */
+static void
+add_specialized_edges (void)
+{
+  cgraph_edge *e;
+  cgraph_node *n, *spec_n;
+  tree known_cst;
+  unsigned i, j;
+
+  FOR_EACH_DEFINED_FUNCTION (n)
+    {
+      if (dump_file && n->callees)
+	fprintf (dump_file,
+		 "Processing function %s for specialization of edges.\n",
+		 n->dump_name ());
+
+      if (n->ipcp_clone)
+	continue;
+
+      bool update = false;
+      for (e = n->callees; e; e = e->next_callee)
+	{
+	  if (!e->callee || e->recursive_p ())
+	    continue;
+
+	  vec<cgraph_node*> *specialization_nodes
+	    = available_specializations->get (e->callee);
+
+	  /* Even if the callee is a specialized node it is still valid to
+	     create further guarded specializations based on the original
+	     node.  If the existing specialized node doesn't have any known
+	     constants then it is probably profitable to specialize further.  */
+	  if (e->callee->ipcp_clone && !specialization_nodes)
+	    {
+	      ipa_node_params *info
+		= ipa_node_params_sum->get (e->callee);
+	      gcc_checking_assert (info->ipcp_orig_node);
+
+	      bool has_known_constant = false;
+	      FOR_EACH_VEC_ELT (info->known_csts, i, known_cst)
+		if (known_cst != NULL_TREE)
+		  {
+		    has_known_constant = true;
+		    break;
+		  }
+
+	      if (!has_known_constant)
+		specialization_nodes
+		  = available_specializations->get (info->ipcp_orig_node);
+	    }
+
+	  if (!specialization_nodes)
+	    continue;
+
+	  unsigned num_of_specializations = 0;
+	  unsigned max_num_of_specializations = opt_for_fn (n->decl,
+						  param_ipa_spec_max_per_edge);
+
+	  FOR_EACH_VEC_ELT (*specialization_nodes, i, spec_n)
+	    {
+	      if (dump_file)
+		fprintf (dump_file,
+			 "Edge has available specialization %s.\n",
+			 spec_n->dump_name ());
+
+	      ipa_node_params *spec_params = ipa_node_params_sum->get (spec_n);
+	      vec<cgraph_specialization_info> replaced_args = vNULL;
+	      bool failed = false;
+
+	      FOR_EACH_VEC_ELT (spec_params->known_csts, j, known_cst)
+		{
+		  if (known_cst != NULL_TREE)
+		    {
+		      if (TREE_CODE (known_cst) == INTEGER_CST
+			  && TYPE_UNSIGNED (TREE_TYPE (known_cst))
+			  && tree_fits_uhwi_p (known_cst))
+			{
+			  cgraph_specialization_info spec_info;
+			  spec_info.arg_idx = j;
+			  spec_info.is_unsigned = 1;
+			  spec_info.cst.uval = tree_to_uhwi (known_cst);
+			  replaced_args.safe_push (spec_info);
+			}
+		      else if (TREE_CODE (known_cst) == INTEGER_CST
+			       && !TYPE_UNSIGNED (TREE_TYPE (known_cst))
+			       && tree_fits_shwi_p (known_cst))
+			{
+			  cgraph_specialization_info spec_info;
+			  spec_info.arg_idx = j;
+			  spec_info.is_unsigned = 0;
+			  spec_info.cst.sval = tree_to_shwi (known_cst);
+			  replaced_args.safe_push (spec_info);
+			}
+		      else
+			{
+			  failed = true;
+			  break;
+			}
+		    }
+		}
+
+	      unsigned max_guard_complexity = opt_for_fn (n->decl,
+					   param_ipa_spec_guard_complexity);
+
+	      if (!failed && replaced_args.length () > 0
+		  && (replaced_args.length () <= max_guard_complexity
+		      || max_guard_complexity == 0))
+		{
+		  if (e->make_specialized (spec_n,
+					   &replaced_args,
+					   e->count.apply_scale (1, 10)))
+		    {
+		      num_of_specializations++;
+		      update = true;
+
+		      if (num_of_specializations >= max_num_of_specializations
+			  && max_num_of_specializations != 0)
+			break;
+		    }
+		}
+	    }
+	}
+
+      if (update)
+	ipa_update_overall_fn_summary (n);
+    }
+}
+
 /* The IPCP driver.  */
 
 static unsigned int
@@ -6559,6 +6715,7 @@  ipcp_driver (void)
   ipa_check_create_node_params ();
   ipa_check_create_edge_args ();
   clone_num_suffixes = new hash_map<const char *, unsigned>;
+  available_specializations = new hash_map<cgraph_node*, vec<cgraph_node*>>;
 
   if (dump_file)
     {
@@ -6578,8 +6735,12 @@  ipcp_driver (void)
   ipcp_store_bits_results ();
   /* Store results of value range propagation.  */
   ipcp_store_vr_results ();
+  /* Add new edges for specializations.  */
+  if (flag_ipa_guarded_specialization)
+    add_specialized_edges ();
 
   /* Free all IPCP structures.  */
+  delete available_specializations;
   delete clone_num_suffixes;
   free_toporder_info (&topo);
   delete edge_clone_summaries;
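
The pass is structured as collect-then-attach: decide_about_value records
the speculatively created clones in available_specializations, and
add_specialized_edges later walks all call edges and attaches guarded
alternatives where the callee has recorded clones.  A rough sketch with
standard containers (illustrative names only):

  #include <unordered_map>
  #include <vector>

  struct node;                  /* stands in for cgraph_node  */
  struct edge { node *callee; };

  static std::unordered_map<node *, std::vector<node *>> specializations;

  /* Phase 1: remember each clone created for ORIG.  */
  static void
  record_clone (node *orig, node *clone)
  {
    specializations[orig].push_back (clone);
  }

  /* Phase 2: for every call edge look up clones of the callee and attach
     a guarded alternative for each one found.  */
  static void
  attach_guards (std::vector<edge> &callees)
  {
    for (edge &e : callees)
      {
        auto it = specializations.find (e.callee);
        if (it == specializations.end ())
          continue;
        for (node *clone : it->second)
          (void) clone;         /* here: e.make_specialized (clone, ...)  */
      }
  }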
diff --git a/gcc/ipa-fnsummary.cc b/gcc/ipa-fnsummary.cc
index fd3d7d6c5e8..a1f219a056e 100644
--- a/gcc/ipa-fnsummary.cc
+++ b/gcc/ipa-fnsummary.cc
@@ -257,6 +257,13 @@  redirect_to_unreachable (struct cgraph_edge *e)
     e = cgraph_edge::resolve_speculation (e, target->decl);
   else if (!e->callee)
     e = cgraph_edge::make_direct (e, target);
+  else if (e->base_specialization_edge_p ())
+    {
+      /* If the base edge becomes unreachable there's no reason to
+	 keep the specializations around.  */
+      cgraph_edge::remove_specializations (e);
+      e->redirect_callee (target);
+    }
   else
     e->redirect_callee (target);
   class ipa_call_summary *es = ipa_call_summaries->get (e);
@@ -866,6 +873,7 @@  ipa_fn_summary_t::duplicate (cgraph_node *src,
 	  ipa_predicate new_predicate;
 	  class ipa_call_summary *es = ipa_call_summaries->get (edge);
 	  next = edge->next_callee;
+	  bool update_next = edge->specialized;
 
 	  if (!edge->inline_failed)
 	    inlined_to_p = true;
@@ -876,6 +884,9 @@  ipa_fn_summary_t::duplicate (cgraph_node *src,
 	  if (new_predicate == false && *es->predicate != false)
 	    optimized_out_size += es->call_stmt_size * ipa_fn_summary::size_scale;
 	  edge_set_predicate (edge, &new_predicate);
+	  /* NEXT may be invalidated for specialized calls.  */
+	  if (update_next)
+	    next = edge->next_callee;
 	}
 
       /* Remap indirect edge predicates with the same simplification as above.
@@ -2825,6 +2836,29 @@  analyze_function_body (struct cgraph_node *node, bool early)
 						     es, es3);
 		    }
 		}
+	      if (edge->specialized)
+		{
+		  cgraph_edge *base
+		    = edge->specialized_call_base_edge ();
+		  ipa_call_summary *es2
+		    = ipa_call_summaries->get_create (base);
+		  ipa_call_summaries->duplicate (edge, base,
+						 es, es2);
+
+		  /* EDGE is the first specialized call target.  Create and
+		     duplicate call summaries for the remaining specialized
+		     call targets.  */
+		  for (cgraph_edge *specialization
+			 = edge->next_specialized_call_target ();
+		       specialization; specialization
+			 = specialization->next_specialized_call_target ())
+		    {
+		      ipa_call_summary *es3
+			= ipa_call_summaries->get_create (specialization);
+		      ipa_call_summaries->duplicate (edge, specialization,
+						     es, es3);
+		    }
+		}
 	    }
 
 	  /* TODO: When conditional jump or switch is known to be constant, but
@@ -3275,6 +3309,9 @@  estimate_edge_size_and_time (struct cgraph_edge *e, int *size, int *min_size,
 			     sreal *time, ipa_call_arg_values *avals,
 			     ipa_hints *hints)
 {
+  if (e->guarded_specialization_edge_p ())
+    return;
+
   class ipa_call_summary *es = ipa_call_summaries->get (e);
   int call_size = es->call_stmt_size;
   int call_time = es->call_stmt_time;
@@ -4050,6 +4087,7 @@  remap_edge_summaries (struct cgraph_edge *inlined_edge,
     {
       ipa_predicate p;
       next = e->next_callee;
+      bool update_next = e->specialized;
 
       if (e->inline_failed)
 	{
@@ -4073,6 +4111,10 @@  remap_edge_summaries (struct cgraph_edge *inlined_edge,
 		              params_summary, callee_info,
 			      operand_map, offset_map, possible_truths,
 			      toplev_predicate);
+
+      /* NEXT may be invalidated for specialized calls.  */
+      if (update_next)
+	next = e->next_callee;
     }
   for (e = node->indirect_calls; e; e = next)
     {
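
A note on the update_next pattern above: duplicating a call summary for a
specialized target can splice new edges right next to E, so a next_callee
pointer cached before the duplication may be stale.  Schematically
(illustrative list code, not GCC internals):

  struct elem { elem *next; };

  extern void process (elem *e);  /* may insert elements after E  */

  void
  walk (elem *head)
  {
    elem *next;
    for (elem *e = head; e; e = next)
      {
        next = e->next;   /* guards against removal of E...           */
        process (e);
        next = e->next;   /* ...but must be re-read because PROCESS
                             may have linked new elements after E.    */
      }
  }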
diff --git a/gcc/ipa-inline-transform.cc b/gcc/ipa-inline-transform.cc
index 07288e57c73..888c3a7e718 100644
--- a/gcc/ipa-inline-transform.cc
+++ b/gcc/ipa-inline-transform.cc
@@ -775,11 +775,27 @@  inline_transform (struct cgraph_node *node)
     }
 
   maybe_materialize_called_clones (node);
+
+  /* Verify NODE before doing potential speculative transformations.  */
+  if (flag_checking)
+    node->verify ();
+
+  /* Perform call statement redirection in two steps.  First consider only
+     speculative edges, then process the rest in a separate step.  This is
+     required due to the potential existence of edges that are both
+     speculative and specialized, in which case we need to process them in
+     this order.  */
   for (e = node->callees; e; e = next)
     {
       if (!e->inline_failed)
 	has_inline = true;
       next = e->next_callee;
+      if (e->speculative)
+	cgraph_edge::redirect_call_stmt_to_callee (e);
+    }
+  for (e = node->callees; e; e = next)
+    {
+      next = e->next_callee;
       cgraph_edge::redirect_call_stmt_to_callee (e);
     }
   node->remove_all_references ();
diff --git a/gcc/ipa-inline.cc b/gcc/ipa-inline.cc
index 14969198cde..5a86c25caf2 100644
--- a/gcc/ipa-inline.cc
+++ b/gcc/ipa-inline.cc
@@ -1185,6 +1185,7 @@  edge_badness (struct cgraph_edge *edge, bool dump)
   edge_time = estimate_edge_time (edge, &unspec_edge_time);
   hints = estimate_edge_hints (edge);
   gcc_checking_assert (edge_time >= 0);
+
   /* Check that inlined time is better, but tolerate some roundoff issues.
      FIXME: When callee profile drops to 0 we account calls more.  This
      should be fixed by never doing that.  */
diff --git a/gcc/lto-cgraph.cc b/gcc/lto-cgraph.cc
index 350195d86db..c8250f7b73c 100644
--- a/gcc/lto-cgraph.cc
+++ b/gcc/lto-cgraph.cc
@@ -271,6 +271,8 @@  lto_output_edge (struct lto_simple_output_block *ob, struct cgraph_edge *edge,
   bp_pack_value (&bp, edge->speculative_id, 16);
   bp_pack_value (&bp, edge->indirect_inlining_edge, 1);
   bp_pack_value (&bp, edge->speculative, 1);
+  bp_pack_value (&bp, edge->specialized, 1);
+  bp_pack_value (&bp, edge->spec_args != NULL, 1);
   bp_pack_value (&bp, edge->call_stmt_cannot_inline_p, 1);
   gcc_assert (!edge->call_stmt_cannot_inline_p
 	      || edge->inline_failed != CIF_BODY_NOT_AVAILABLE);
@@ -295,7 +297,27 @@  lto_output_edge (struct lto_simple_output_block *ob, struct cgraph_edge *edge,
       bp_pack_value (&bp, edge->indirect_info->num_speculative_call_targets,
 		     16);
     }
+
   streamer_write_bitpack (&bp);
+
+  if (edge->spec_args != NULL)
+    {
+      cgraph_specialization_info *spec_info;
+      unsigned len = edge->spec_args->length (), i;
+      streamer_write_uhwi_stream (ob->main_stream, len);
+
+      FOR_EACH_VEC_ELT (*edge->spec_args, i, spec_info)
+	{
+	  unsigned idx = spec_info->arg_idx;
+	  streamer_write_uhwi_stream (ob->main_stream, idx);
+	  streamer_write_hwi_stream (ob->main_stream, spec_info->is_unsigned);
+
+	  if (spec_info->is_unsigned)
+	    streamer_write_uhwi_stream (ob->main_stream, spec_info->cst.uval);
+	  else
+	    streamer_write_hwi_stream (ob->main_stream, spec_info->cst.sval);
+	}
+    }
 }
 
 /* Return if NODE contain references from other partitions.  */
@@ -1517,6 +1539,8 @@  input_edge (class lto_input_block *ib, vec<symtab_node *> nodes,
 
   edge->indirect_inlining_edge = bp_unpack_value (&bp, 1);
   edge->speculative = bp_unpack_value (&bp, 1);
+  edge->specialized = bp_unpack_value (&bp, 1);
+  bool has_edge_spec_args = bp_unpack_value (&bp, 1);
   edge->lto_stmt_uid = stmt_id;
   edge->speculative_id = speculative_id;
   edge->inline_failed = inline_failed;
@@ -1542,6 +1566,28 @@  input_edge (class lto_input_block *ib, vec<symtab_node *> nodes,
       edge->indirect_info->num_speculative_call_targets
 	= bp_unpack_value (&bp, 16);
     }
+
+  if (has_edge_spec_args)
+    {
+      unsigned len = streamer_read_uhwi (ib);
+      vec_alloc (edge->spec_args, len);
+
+      for (unsigned i = 0; i < len; i++)
+	{
+	  cgraph_specialization_info spec_info;
+	  spec_info.arg_idx = streamer_read_uhwi (ib);
+	  spec_info.is_unsigned = streamer_read_hwi (ib);
+
+	  if (spec_info.is_unsigned)
+	    spec_info.cst.uval = streamer_read_uhwi (ib);
+	  else
+	    spec_info.cst.sval = streamer_read_hwi (ib);
+
+	  edge->spec_args->quick_push (spec_info);
+	}
+    }
+  else
+    edge->spec_args = NULL;
 }
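
The on-disk format for spec_args is length-prefixed and the reader in
input_edge mirrors the writer field for field.  Roughly (std::iostream
standing in for the LTO streamer, illustrative only):

  #include <sstream>

  struct rec
  {
    unsigned idx;
    int is_unsigned;
    union { long long sval; unsigned long long uval; } cst;
  };

  static void
  write_rec (std::ostream &os, const rec &r)
  {
    os << r.idx << ' ' << r.is_unsigned << ' ';
    if (r.is_unsigned)
      os << r.cst.uval << ' ';
    else
      os << r.cst.sval << ' ';
  }

  static rec
  read_rec (std::istream &is)
  {
    rec r {};
    is >> r.idx >> r.is_unsigned;  /* must match the write order  */
    if (r.is_unsigned)
      is >> r.cst.uval;
    else
      is >> r.cst.sval;
    return r;
  }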
 
 
diff --git a/gcc/params.opt b/gcc/params.opt
index 397ec0bd128..6853cc3ca60 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -218,6 +218,14 @@  The upper bound for sharing integer constants.
 Common Joined UInteger Var(param_ipa_cp_eval_threshold) Init(500) Param Optimization
 Threshold ipa-cp opportunity evaluation that is still considered beneficial to clone.
 
+-param=ipa-guarded-specialization-guard-complexity=
+Common Joined UInteger Var(param_ipa_spec_guard_complexity) Init(2) Param Optimization
+Maximum number of required comparisons for a single specialization guard.
+
+-param=ipa-guarded-specializations-per-edge=
+Common Joined UInteger Var(param_ipa_spec_max_per_edge) Init(3) Param Optimization
+Maximum number of guarded specializations for a single function call.
+
 -param=ipa-cp-loop-hint-bonus=
 Common Joined UInteger Var(param_ipa_cp_loop_hint_bonus) Init(64) Param Optimization
 Compile-time bonus IPA-CP assigns to candidates which make loop bounds or strides known.
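
For reference, an invocation exercising the new flag and both knobs might
look as follows (illustrative; a value of 0 disables the respective limit):

  gcc -O2 -fipa-guarded-specialization \
      --param ipa-guarded-specializations-per-edge=3 \
      --param ipa-guarded-specialization-guard-complexity=2 test.c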
diff --git a/gcc/tree-inline.cc b/gcc/tree-inline.cc
index 8091ba8f13b..b8a822770e3 100644
--- a/gcc/tree-inline.cc
+++ b/gcc/tree-inline.cc
@@ -2247,13 +2247,15 @@  copy_bb (copy_body_data *id, basic_block bb,
 		  edge = id->src_node->get_edge (orig_stmt);
 		  if (edge)
 		    {
+		      if (edge->guarded_specialization_edge_p ())
+			edge = edge->specialized_call_base_edge ();
 		      struct cgraph_edge *old_edge = edge;
-
+		      struct cgraph_edge *speculative_specialized_edge = NULL;
 		      /* A speculative call is consist of multiple
 			 edges - indirect edge and one or more direct edges
 			 Duplicate the whole thing and distribute frequencies
 			 accordingly.  */
-		      if (edge->speculative)
+		      if (old_edge->speculative)
 			{
 			  int n = 0;
 			  profile_count direct_cnt
@@ -2290,6 +2292,10 @@  copy_bb (copy_body_data *id, basic_block bb,
 					 (prob);
 			      n++;
 			    }
+
+			  if (old_edge->specialized)
+			    speculative_specialized_edge = edge;
+
 			  gcc_checking_assert
 				 (indirect->num_speculative_call_targets_p ()
 				  == n);
@@ -2307,7 +2313,67 @@  copy_bb (copy_body_data *id, basic_block bb,
 			  indirect->count
 			     = copy_basic_block->count.apply_probability (prob);
 			}
-		      else
+		      /* A specialized call consists of multiple
+			 edges - a base edge and one or more specialized edges.
+			 Duplicate and distribute frequencies in a way similar
+			 to the speculative edges.  */
+		      if (old_edge->specialized)
+			{
+			  int n = 0;
+			  cgraph_edge *first
+				 = old_edge->first_specialized_call_target ();
+			  profile_count spec_cnt
+				 = profile_count::zero ();
+
+			  /* First figure out the distribution of counts
+			     so we can re-scale BB profile accordingly.  */
+			  for (cgraph_edge *e = first; e;
+			       e = e->next_specialized_call_target ())
+			    spec_cnt = spec_cnt + e->count;
+
+			  cgraph_edge *base
+				 = old_edge->specialized_call_base_edge ();
+			  profile_count base_cnt = base->count;
+
+			  /* Next iterate all specializations, clone them
+			     and update the profile.  */
+			  for (cgraph_edge *e = first; e;
+			       e = e->next_specialized_call_target ())
+			    {
+			      profile_count cnt = e->count;
+
+			      edge = e->clone (id->dst_node, call_stmt,
+					       gimple_uid (stmt), num, den,
+					       true);
+			      profile_probability prob
+				 = cnt.probability_in (spec_cnt
+						       + base_cnt);
+			      edge->count
+				 = copy_basic_block->count.apply_probability
+					 (prob);
+			      n++;
+			    }
+
+			  /* Duplicate the base edge after all specialized
+			     edges have been cloned.  */
+			  if (old_edge->speculative)
+			    base = speculative_specialized_edge;
+			  else
+			    base = base->clone (id->dst_node, call_stmt,
+						gimple_uid (stmt),
+						num, den, true);
+
+			  profile_probability prob
+			     = base_cnt.probability_in (spec_cnt
+							 + base_cnt);
+			  base->count
+			     = copy_basic_block->count.apply_probability (prob);
+			}
+
+		      if (!old_edge->speculative && !old_edge->specialized)
 			{
 			  edge = edge->clone (id->dst_node, call_stmt,
 					      gimple_uid (stmt),
@@ -3003,6 +3069,9 @@  redirect_all_calls (copy_body_data * id, basic_block bb)
 	  struct cgraph_edge *edge = id->dst_node->get_edge (stmt);
 	  if (edge)
 	    {
+	      if (edge->guarded_specialization_edge_p ())
+		edge = edge->specialized_call_base_edge ();
+
 	      gimple *new_stmt
 		= cgraph_edge::redirect_call_stmt_to_callee (edge);
 	      /* If IPA-SRA transformation, run as part of edge redirection,
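
To make the count redistribution above concrete, suppose the original
call site has count 100, the base edge has base_cnt == 60 and there are
two specializations with counts 30 and 10 (so spec_cnt == 40):

  prob (spec1) = 30 / (40 + 60) = 0.3
  prob (spec2) = 10 / (40 + 60) = 0.1
  prob (base)  = 60 / (40 + 60) = 0.6

Each cloned edge then receives copy_basic_block->count scaled by its
probability, so the three counts again sum to the copied block's count.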
diff --git a/gcc/value-prof.cc b/gcc/value-prof.cc
index 9656ce5870d..05fc2138724 100644
--- a/gcc/value-prof.cc
+++ b/gcc/value-prof.cc
@@ -42,6 +42,8 @@  along with GCC; see the file COPYING3.  If not see
 #include "gimple-pretty-print.h"
 #include "dumpfile.h"
 #include "builtins.h"
+#include "tree-cfg.h"
+#include "tree-dfa.h"
 
 /* In this file value profile based optimizations are placed.  Currently the
    following optimizations are implemented (for more detailed descriptions
@@ -1434,6 +1436,227 @@  gimple_ic (gcall *icall_stmt, struct cgraph_node *direct_call,
   return dcall_stmt;
 }
 
+/* Do transformation
+
+  if (arg_i == spec_args[i] && ...)
+    call the specialized callee
+  else
+    old call
+ */
+
+gcall *
+gimple_sc (struct cgraph_edge *edg, profile_probability prob)
+{
+  /* The call statement we're modifying.  */
+  gcall *call_stmt = edg->call_stmt;
+  /* The cgraph_node of the specialized function.  */
+  cgraph_node *callee = edg->callee;
+  vec<cgraph_specialization_info> *spec_args = edg->spec_args;
+
+  /* CALL_STMT should be the call_stmt of the generic function.  */
+  gcc_checking_assert (edg->specialized_call_base_edge ()->call_stmt
+		      == call_stmt);
+
+  gcall *spec_call_stmt = NULL;
+  tree cond_tree = NULL_TREE;
+  gcond *cond_stmt = NULL;
+  basic_block cond_bb, dcall_bb, icall_bb, join_bb = NULL;
+  edge e_cd, e_ci, e_di, e_dj = NULL, e_ij;
+  gimple_stmt_iterator gsi;
+  int lp_nr, dflags;
+  edge e_eh, e;
+  edge_iterator ei;
+
+  cond_bb = gimple_bb (call_stmt);
+  gsi = gsi_for_stmt (call_stmt);
+
+  /* To call the specialized function we need to build a guard conditional
+     with the specialized arguments and constants.  */
+  unsigned nargs = gimple_call_num_args (call_stmt);
+  unsigned cur_spec = 0;
+  bool dump_first = true;
+
+  if (dump_file)
+    {
+      fprintf (dump_file, "Creating specialization guard for edge %s -> %s:\n",
+			 edg->caller->dump_name (), edg->callee->dump_name ());
+      fprintf (dump_file, "if (");
+    }
+
+  for (unsigned arg_idx = 0; arg_idx < nargs; arg_idx++)
+    {
+      tree cur_arg = gimple_call_arg (call_stmt, arg_idx);
+      bool cur_arg_specialized_p = cur_spec < spec_args->length ()
+	&& arg_idx == (*spec_args)[cur_spec].arg_idx;
+
+      if (cur_arg_specialized_p)
+	{
+	  gcc_checking_assert (!cond_stmt);
+
+	  cgraph_specialization_info spec_info = (*spec_args)[cur_spec];
+	  cur_spec++;
+
+	  tree spec_v;
+	  if (spec_info.is_unsigned)
+	    spec_v = build_int_cstu (integer_type_node, spec_info.cst.uval);
+	  else
+	    spec_v = build_int_cst (integer_type_node, spec_info.cst.sval);
+
+	  tree cmp_const = fold_convert (TREE_TYPE (cur_arg), spec_v);
+
+	  tree cur_arg_eq_spec = build2 (EQ_EXPR, boolean_type_node,
+					      cur_arg, cmp_const);
+
+	  if (dump_file)
+	    {
+	      if (!dump_first)
+		fprintf (dump_file, " && ");
+	      print_generic_expr (dump_file, cur_arg_eq_spec);
+	      dump_first = false;
+	    }
+
+	  tree tmp1 = make_temp_ssa_name (boolean_type_node, NULL, "SPEC");
+	  gassign *load_stmt1 = gimple_build_assign (tmp1, cur_arg_eq_spec);
+	  gsi_insert_before (&gsi, load_stmt1, GSI_SAME_STMT);
+
+	  if (!cond_tree)
+	    cond_tree = tmp1;
+	  else
+	    {
+	      tree cur_and_prev_true = fold_build2 (BIT_AND_EXPR,
+					 boolean_type_node,
+					 cond_tree,
+					 tmp1);
+
+	      tree tmp2 = make_temp_ssa_name (boolean_type_node, NULL, "SPEC");
+	      gassign *load_stmt2
+		= gimple_build_assign (tmp2, cur_and_prev_true);
+	      gsi_insert_before (&gsi, load_stmt2, GSI_SAME_STMT);
+	      cond_tree = tmp2;
+	    }
+	}
+    }
+
+  /* If not all specializations were used to construct the guard then
+     don't use this specialization.  This can happen when some other IPA
+     pass changes the signature of the base call.  */
+  if (cur_spec < spec_args->length ())
+    cond_tree = build_int_cst (boolean_type_node, 0);
+
+  cond_stmt = gimple_build_cond (EQ_EXPR, cond_tree, boolean_true_node,
+				 NULL_TREE, NULL_TREE);
+
+  gsi_insert_before (&gsi, cond_stmt, GSI_SAME_STMT);
+
+  if (gimple_vdef (call_stmt)
+      && TREE_CODE (gimple_vdef (call_stmt)) == SSA_NAME)
+    {
+      unlink_stmt_vdef (call_stmt);
+      release_ssa_name (gimple_vdef (call_stmt));
+    }
+  gimple_set_vdef (call_stmt, NULL_TREE);
+  gimple_set_vuse (call_stmt, NULL_TREE);
+  update_stmt (call_stmt);
+  spec_call_stmt = as_a <gcall *> (gimple_copy (call_stmt));
+  gimple_call_set_fndecl (spec_call_stmt, callee->decl);
+  dflags = flags_from_decl_or_type (callee->decl);
+
+  if ((dflags & ECF_NORETURN) != 0
+      && should_remove_lhs_p (gimple_call_lhs (spec_call_stmt)))
+    gimple_call_set_lhs (spec_call_stmt, NULL_TREE);
+  gsi_insert_before (&gsi, spec_call_stmt, GSI_SAME_STMT);
+
+  if (dump_file)
+    {
+      fprintf (dump_file, ")");
+      if (cur_spec < spec_args->length ())
+	fprintf (dump_file, " [guard disabled]");
+      fprintf (dump_file, "\n  ");
+      print_gimple_stmt (dump_file, spec_call_stmt, 0);
+    }
+
+  e_cd = split_block (cond_bb, cond_stmt);
+  dcall_bb = e_cd->dest;
+  dcall_bb->count = cond_bb->count.apply_probability (prob);
+
+  e_di = split_block (dcall_bb, spec_call_stmt);
+  icall_bb = e_di->dest;
+  icall_bb->count = cond_bb->count - dcall_bb->count;
+
+  if (!stmt_ends_bb_p (call_stmt))
+    e_ij = split_block (icall_bb, call_stmt);
+  else
+    {
+      e_ij = find_fallthru_edge (icall_bb->succs);
+      if (e_ij != NULL)
+	{
+	  e_ij->probability = profile_probability::always ();
+	  e_ij = single_pred_edge (split_edge (e_ij));
+	}
+    }
+  if (e_ij != NULL)
+    {
+      join_bb = e_ij->dest;
+      join_bb->count = cond_bb->count;
+    }
+
+  e_cd->flags = (e_cd->flags & ~EDGE_FALLTHRU) | EDGE_TRUE_VALUE;
+  e_cd->probability = prob;
+
+  e_ci = make_edge (cond_bb, icall_bb, EDGE_FALSE_VALUE);
+  e_ci->probability = prob.invert ();
+
+  remove_edge (e_di);
+
+  if (e_ij != NULL)
+    {
+      if ((dflags & ECF_NORETURN) == 0)
+	{
+	  e_dj = make_edge (dcall_bb, join_bb, EDGE_FALLTHRU);
+	  e_dj->probability = profile_probability::always ();
+	}
+      e_ij->probability = profile_probability::always ();
+    }
+
+  if (gimple_call_lhs (call_stmt)
+      && TREE_CODE (gimple_call_lhs (call_stmt)) == SSA_NAME
+      && (dflags & ECF_NORETURN) == 0)
+    {
+      tree result = gimple_call_lhs (call_stmt);
+      gphi *phi = create_phi_node (result, join_bb);
+      gimple_call_set_lhs (call_stmt,
+			   duplicate_ssa_name (result, call_stmt));
+      add_phi_arg (phi, gimple_call_lhs (call_stmt), e_ij, UNKNOWN_LOCATION);
+      gimple_call_set_lhs (spec_call_stmt,
+			   duplicate_ssa_name (result, spec_call_stmt));
+      add_phi_arg (phi, gimple_call_lhs (spec_call_stmt), e_dj,
+		   UNKNOWN_LOCATION);
+    }
+
+  lp_nr = lookup_stmt_eh_lp (call_stmt);
+  if (lp_nr > 0 && stmt_could_throw_p (cfun, spec_call_stmt))
+    add_stmt_to_eh_lp (spec_call_stmt, lp_nr);
+
+  FOR_EACH_EDGE (e_eh, ei, icall_bb->succs)
+    if (e_eh->flags & (EDGE_EH | EDGE_ABNORMAL))
+      {
+	e = make_edge (dcall_bb, e_eh->dest, e_eh->flags);
+	e->probability = e_eh->probability;
+	for (gphi_iterator psi = gsi_start_phis (e_eh->dest);
+	     !gsi_end_p (psi); gsi_next (&psi))
+	  {
+	    gphi *phi = psi.phi ();
+	    SET_USE (PHI_ARG_DEF_PTR_FROM_EDGE (phi, e),
+		     PHI_ARG_DEF_FROM_EDGE (phi, e_eh));
+	  }
+      }
+  if (!stmt_could_throw_p (cfun, spec_call_stmt))
+    gimple_purge_dead_eh_edges (dcall_bb);
+  return spec_call_stmt;
+}
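
At the source level, the control flow that gimple_sc builds for a single
specialization of f with a == 4 and b == 2 corresponds roughly to the
following (illustrative names; the specialized call keeps the original
argument list because the call statement is copied before the callee
declaration is swapped):

  int f (int a, int b);            /* base function                   */
  int f_constprop (int a, int b);  /* clone specialized for 4 and 2   */

  int
  call_site (int a, int b)
  {
    bool t1 = a == 4;              /* the SPEC temporaries...         */
    bool t2 = t1 & (b == 2);       /* ...combined with BIT_AND_EXPR   */
    int r;
    if (t2)
      r = f_constprop (a, b);      /* dcall_bb: guarded clone call    */
    else
      r = f (a, b);                /* icall_bb: original call         */
    return r;                      /* join_bb: PHI merges both LHSs   */
  }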
+
 /* Dump info about indirect call profile.  */
 
 static void
diff --git a/gcc/value-prof.h b/gcc/value-prof.h
index d852c41f33f..7d8be5920b9 100644
--- a/gcc/value-prof.h
+++ b/gcc/value-prof.h
@@ -89,6 +89,7 @@  void verify_histograms (void);
 void free_histograms (function *);
 void stringop_block_profile (gimple *, unsigned int *, HOST_WIDE_INT *);
 gcall *gimple_ic (gcall *, struct cgraph_node *, profile_probability);
+gcall *gimple_sc (struct cgraph_edge *, profile_probability);
 bool get_nth_most_common_value (gimple *stmt, const char *counter_type,
 				histogram_value hist, gcov_type *value,
 				gcov_type *count, gcov_type *all,