c++: optimize specialization of nested class templates

Message ID 20220608182147.4123587-1-ppalka@redhat.com
State New
Headers
Series c++: optimize specialization of nested class templates |

Commit Message

Patrick Palka June 8, 2022, 6:21 p.m. UTC
  When substituting a class template specialization, tsubst_aggr_type
substitutes the TYPE_CONTEXT before passing it to lookup_template_class.
This appears to be unnecessary, however, because the the initial value
of lookup_template_class's context parameter is unused outside of the
IDENTIFIER_NODE case, and l_t_c performs its own substitution of the
context, anyway.  So this patch removes the redundant substitution in
tsubst_aggr_type.  Doing so causes us to ICE on template/nested5.C
because during lookup_template_class for A<T>::C::D<S> with T=E and S=S,
we substitute and complete the context A<T>::C with T=E, which in turn
registers the desired dependent specialization of D for us and we end up
trying to register it again.  This patch fixes this by checking the
specializations table again after completion of the context.

This patch also implements a couple of other optimizations:

  * In lookup_template_class, if the context of the partially
    instantiated template is already non-dependent, then we could
    reuse that instead of substituting the context of the most
    general template.
  * When substituting the TYPE_DECL for an injected-class-name
    in tsubst_decl, we can avoid substituting its TREE_TYPE and
    DECL_TI_ARGS.

Together these optimizations improve memory usage for the range-v3
testcase test/view/split.cc by about 5%.  The improvement is probably
more significant when dealing with deeply nested class templates.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

gcc/cp/ChangeLog:

	* pt.cc (lookup_template_class): Remove dead stores to
	context parameter.  Don't substitute the context of the
	most general template if that of the partially instantiated
	template is non-dependent.  Check the specializations table
	again after completing the context of a nested dependent
	specialization.
	(tsubst_aggr_type) <case RECORD_TYPE>: Don't substitute
	TYPE_CONTEXT or pass it to lookup_template_class.
	(tsubst_decl) <case TYPE_DECL>: Avoid substituting the
	TREE_TYPE and DECL_TI_ARGS when DECL_SELF_REFERENCE_P.
---
 gcc/cp/pt.cc | 69 +++++++++++++++++++++++++++++++---------------------
 1 file changed, 41 insertions(+), 28 deletions(-)
  

Comments

Jason Merrill June 9, 2022, 3:54 p.m. UTC | #1
On 6/8/22 14:21, Patrick Palka wrote:
> When substituting a class template specialization, tsubst_aggr_type
> substitutes the TYPE_CONTEXT before passing it to lookup_template_class.
> This appears to be unnecessary, however, because the the initial value
> of lookup_template_class's context parameter is unused outside of the
> IDENTIFIER_NODE case, and l_t_c performs its own substitution of the
> context, anyway.  So this patch removes the redundant substitution in
> tsubst_aggr_type.  Doing so causes us to ICE on template/nested5.C
> because during lookup_template_class for A<T>::C::D<S> with T=E and S=S,
> we substitute and complete the context A<T>::C with T=E, which in turn
> registers the desired dependent specialization of D for us and we end up
> trying to register it again.  This patch fixes this by checking the
> specializations table again after completion of the context.
> 
> This patch also implements a couple of other optimizations:
> 
>    * In lookup_template_class, if the context of the partially
>      instantiated template is already non-dependent, then we could
>      reuse that instead of substituting the context of the most
>      general template.
>    * When substituting the TYPE_DECL for an injected-class-name
>      in tsubst_decl, we can avoid substituting its TREE_TYPE and
>      DECL_TI_ARGS.
> 
> Together these optimizations improve memory usage for the range-v3
> testcase test/view/split.cc by about 5%.  The improvement is probably
> more significant when dealing with deeply nested class templates.
> 
> Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
> trunk?
> 
> gcc/cp/ChangeLog:
> 
> 	* pt.cc (lookup_template_class): Remove dead stores to
> 	context parameter.  Don't substitute the context of the
> 	most general template if that of the partially instantiated
> 	template is non-dependent.  Check the specializations table
> 	again after completing the context of a nested dependent
> 	specialization.
> 	(tsubst_aggr_type) <case RECORD_TYPE>: Don't substitute
> 	TYPE_CONTEXT or pass it to lookup_template_class.
> 	(tsubst_decl) <case TYPE_DECL>: Avoid substituting the
> 	TREE_TYPE and DECL_TI_ARGS when DECL_SELF_REFERENCE_P.
> ---
>   gcc/cp/pt.cc | 69 +++++++++++++++++++++++++++++++---------------------
>   1 file changed, 41 insertions(+), 28 deletions(-)
> 
> diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
> index 59b94317e88..28023d60684 100644
> --- a/gcc/cp/pt.cc
> +++ b/gcc/cp/pt.cc
> @@ -9840,8 +9840,6 @@ lookup_template_class (tree d1, tree arglist, tree in_decl, tree context,
>   	  if (context)
>   	    pop_decl_namespace ();
>   	}
> -      if (templ)
> -	context = DECL_CONTEXT (templ);
>       }
>     else if (TREE_CODE (d1) == TYPE_DECL && MAYBE_CLASS_TYPE_P (TREE_TYPE (d1)))
>       {
> @@ -9868,7 +9866,6 @@ lookup_template_class (tree d1, tree arglist, tree in_decl, tree context,
>       {
>         templ = d1;
>         d1 = DECL_NAME (templ);
> -      context = DECL_CONTEXT (templ);
>       }
>     else if (DECL_TEMPLATE_TEMPLATE_PARM_P (d1))
>       {
> @@ -10059,8 +10056,25 @@ lookup_template_class (tree d1, tree arglist, tree in_decl, tree context,
>         context = DECL_CONTEXT (gen_tmpl);
>         if (context && TYPE_P (context))
>   	{
> -	  context = tsubst_aggr_type (context, arglist, complain, in_decl, true);
> -	  context = complete_type (context);
> +	  if (!uses_template_parms (DECL_CONTEXT (templ)))
> +	    /* If the context of the partially instantiated template is
> +	       already non-dependent, then we might as well use it.  */
> +	    context = DECL_CONTEXT (templ);
> +	  else
> +	    {
> +	      context = tsubst_aggr_type (context, arglist, complain, in_decl, true);
> +	      context = complete_type (context);
> +	      if (is_dependent_type && arg_depth > 1)
> +		{
> +		  /* If this is a dependent nested specialization such as
> +		     A<int>::B<T>, then completion of A<int> might have
> +		     registered this specialization of B for us, so check
> +		     the table again (33959).  */
> +		  entry = type_specializations->find_with_hash (&elt, hash);
> +		  if (entry)
> +		    return entry->spec;
> +		}
> +	    }
>   	}
>         else
>   	context = tsubst (context, arglist, complain, in_decl);
> @@ -13711,25 +13725,12 @@ tsubst_aggr_type (tree t,
>         if (TYPE_TEMPLATE_INFO (t) && uses_template_parms (t))
>   	{
>   	  tree argvec;
> -	  tree context;
>   	  tree r;
>   
>   	  /* In "sizeof(X<I>)" we need to evaluate "I".  */
>   	  cp_evaluated ev;
>   
> -	  /* First, determine the context for the type we are looking
> -	     up.  */
> -	  context = TYPE_CONTEXT (t);
> -	  if (context && TYPE_P (context))
> -	    {
> -	      context = tsubst_aggr_type (context, args, complain,
> -					  in_decl, /*entering_scope=*/1);
> -	      /* If context is a nested class inside a class template,
> -	         it may still need to be instantiated (c++/33959).  */
> -	      context = complete_type (context);
> -	    }
> -
> -	  /* Then, figure out what arguments are appropriate for the
> +	  /* Figure out what arguments are appropriate for the
>   	     type we are trying to find.  For example, given:
>   
>   	       template <class T> struct S;
> @@ -13744,7 +13745,7 @@ tsubst_aggr_type (tree t,
>   	    r = error_mark_node;
>   	  else
>   	    {
> -	      r = lookup_template_class (t, argvec, in_decl, context,
> +	      r = lookup_template_class (t, argvec, in_decl, NULL_TREE,
>   					 entering_scope, complain);
>   	      r = cp_build_qualified_type (r, cp_type_quals (t), complain);
>   	    }
> @@ -14880,6 +14881,10 @@ tsubst_decl (tree t, tree args, tsubst_flags_t complain)
>   		ctx = tsubst_aggr_type (ctx, args,
>   					complain,
>   					in_decl, /*entering_scope=*/1);
> +		if (DECL_SELF_REFERENCE_P (t))
> +		  /* The context and type of a injected-class-name are
> +		     the same, so we don't need to substitute both.  */
> +		  type = ctx;
>   		/* If CTX is unchanged, then T is in fact the
>   		   specialization we want.  That situation occurs when
>   		   referencing a static data member within in its own
> @@ -14900,14 +14905,22 @@ tsubst_decl (tree t, tree args, tsubst_flags_t complain)
>   	      {
>   		tmpl = DECL_TI_TEMPLATE (t);
>   		gen_tmpl = most_general_template (tmpl);
> -		argvec = tsubst (DECL_TI_ARGS (t), args, complain, in_decl);
> -		if (argvec != error_mark_node)
> -		  argvec = (coerce_innermost_template_parms
> -			    (DECL_TEMPLATE_PARMS (gen_tmpl),
> -			     argvec, t, complain,
> -			     /*all*/true, /*defarg*/true));
> -		if (argvec == error_mark_node)
> -		  RETURN (error_mark_node);
> +		if (DECL_SELF_REFERENCE_P (t))
> +		  /* The DECL_TI_ARGS for the injected-class-name are the
> +		     generic template arguments for the class template, so
> +		     substitution/coercion is just the identity mapping.  */
> +		  argvec = args;

Would it make sense to extend this to any TEMPLATE_DECL for which 
DECL_PRIMARY_TEMPLATE is the class template?  So, anything that gets 
here except an alias template.

> +		else
> +		  {
> +		    argvec = tsubst (DECL_TI_ARGS (t), args, complain, in_decl);
> +		    if (argvec != error_mark_node)
> +		      argvec = (coerce_innermost_template_parms
> +				(DECL_TEMPLATE_PARMS (gen_tmpl),
> +				 argvec, t, complain,
> +				 /*all*/true, /*defarg*/true));
> +		    if (argvec == error_mark_node)
> +		      RETURN (error_mark_node);
> +		  }
>   		hash = hash_tmpl_and_args (gen_tmpl, argvec);
>   		spec = retrieve_specialization (gen_tmpl, argvec, hash);
>   	      }
  
Patrick Palka June 9, 2022, 7:16 p.m. UTC | #2
On Thu, 9 Jun 2022, Jason Merrill wrote:

> On 6/8/22 14:21, Patrick Palka wrote:
> > When substituting a class template specialization, tsubst_aggr_type
> > substitutes the TYPE_CONTEXT before passing it to lookup_template_class.
> > This appears to be unnecessary, however, because the the initial value
> > of lookup_template_class's context parameter is unused outside of the
> > IDENTIFIER_NODE case, and l_t_c performs its own substitution of the
> > context, anyway.  So this patch removes the redundant substitution in
> > tsubst_aggr_type.  Doing so causes us to ICE on template/nested5.C
> > because during lookup_template_class for A<T>::C::D<S> with T=E and S=S,
> > we substitute and complete the context A<T>::C with T=E, which in turn
> > registers the desired dependent specialization of D for us and we end up
> > trying to register it again.  This patch fixes this by checking the
> > specializations table again after completion of the context.
> > 
> > This patch also implements a couple of other optimizations:
> > 
> >    * In lookup_template_class, if the context of the partially
> >      instantiated template is already non-dependent, then we could
> >      reuse that instead of substituting the context of the most
> >      general template.
> >    * When substituting the TYPE_DECL for an injected-class-name
> >      in tsubst_decl, we can avoid substituting its TREE_TYPE and
> >      DECL_TI_ARGS.
> > 
> > Together these optimizations improve memory usage for the range-v3
> > testcase test/view/split.cc by about 5%.  The improvement is probably
> > more significant when dealing with deeply nested class templates.
> > 
> > Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
> > trunk?
> > 
> > gcc/cp/ChangeLog:
> > 
> > 	* pt.cc (lookup_template_class): Remove dead stores to
> > 	context parameter.  Don't substitute the context of the
> > 	most general template if that of the partially instantiated
> > 	template is non-dependent.  Check the specializations table
> > 	again after completing the context of a nested dependent
> > 	specialization.
> > 	(tsubst_aggr_type) <case RECORD_TYPE>: Don't substitute
> > 	TYPE_CONTEXT or pass it to lookup_template_class.
> > 	(tsubst_decl) <case TYPE_DECL>: Avoid substituting the
> > 	TREE_TYPE and DECL_TI_ARGS when DECL_SELF_REFERENCE_P.
> > ---
> >   gcc/cp/pt.cc | 69 +++++++++++++++++++++++++++++++---------------------
> >   1 file changed, 41 insertions(+), 28 deletions(-)
> > 
> > diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
> > index 59b94317e88..28023d60684 100644
> > --- a/gcc/cp/pt.cc
> > +++ b/gcc/cp/pt.cc
> > @@ -9840,8 +9840,6 @@ lookup_template_class (tree d1, tree arglist, tree
> > in_decl, tree context,
> >   	  if (context)
> >   	    pop_decl_namespace ();
> >   	}
> > -      if (templ)
> > -	context = DECL_CONTEXT (templ);
> >       }
> >     else if (TREE_CODE (d1) == TYPE_DECL && MAYBE_CLASS_TYPE_P (TREE_TYPE
> > (d1)))
> >       {
> > @@ -9868,7 +9866,6 @@ lookup_template_class (tree d1, tree arglist, tree
> > in_decl, tree context,
> >       {
> >         templ = d1;
> >         d1 = DECL_NAME (templ);
> > -      context = DECL_CONTEXT (templ);
> >       }
> >     else if (DECL_TEMPLATE_TEMPLATE_PARM_P (d1))
> >       {
> > @@ -10059,8 +10056,25 @@ lookup_template_class (tree d1, tree arglist, tree
> > in_decl, tree context,
> >         context = DECL_CONTEXT (gen_tmpl);
> >         if (context && TYPE_P (context))
> >   	{
> > -	  context = tsubst_aggr_type (context, arglist, complain, in_decl,
> > true);
> > -	  context = complete_type (context);
> > +	  if (!uses_template_parms (DECL_CONTEXT (templ)))
> > +	    /* If the context of the partially instantiated template is
> > +	       already non-dependent, then we might as well use it.  */
> > +	    context = DECL_CONTEXT (templ);
> > +	  else
> > +	    {
> > +	      context = tsubst_aggr_type (context, arglist, complain, in_decl,
> > true);
> > +	      context = complete_type (context);
> > +	      if (is_dependent_type && arg_depth > 1)
> > +		{
> > +		  /* If this is a dependent nested specialization such as
> > +		     A<int>::B<T>, then completion of A<int> might have
> > +		     registered this specialization of B for us, so check
> > +		     the table again (33959).  */
> > +		  entry = type_specializations->find_with_hash (&elt, hash);
> > +		  if (entry)
> > +		    return entry->spec;
> > +		}
> > +	    }
> >   	}
> >         else
> >   	context = tsubst (context, arglist, complain, in_decl);
> > @@ -13711,25 +13725,12 @@ tsubst_aggr_type (tree t,
> >         if (TYPE_TEMPLATE_INFO (t) && uses_template_parms (t))
> >   	{
> >   	  tree argvec;
> > -	  tree context;
> >   	  tree r;
> >     	  /* In "sizeof(X<I>)" we need to evaluate "I".  */
> >   	  cp_evaluated ev;
> >   -	  /* First, determine the context for the type we are looking
> > -	     up.  */
> > -	  context = TYPE_CONTEXT (t);
> > -	  if (context && TYPE_P (context))
> > -	    {
> > -	      context = tsubst_aggr_type (context, args, complain,
> > -					  in_decl, /*entering_scope=*/1);
> > -	      /* If context is a nested class inside a class template,
> > -	         it may still need to be instantiated (c++/33959).  */
> > -	      context = complete_type (context);
> > -	    }
> > -
> > -	  /* Then, figure out what arguments are appropriate for the
> > +	  /* Figure out what arguments are appropriate for the
> >   	     type we are trying to find.  For example, given:
> >     	       template <class T> struct S;
> > @@ -13744,7 +13745,7 @@ tsubst_aggr_type (tree t,
> >   	    r = error_mark_node;
> >   	  else
> >   	    {
> > -	      r = lookup_template_class (t, argvec, in_decl, context,
> > +	      r = lookup_template_class (t, argvec, in_decl, NULL_TREE,
> >   					 entering_scope, complain);
> >   	      r = cp_build_qualified_type (r, cp_type_quals (t), complain);
> >   	    }
> > @@ -14880,6 +14881,10 @@ tsubst_decl (tree t, tree args, tsubst_flags_t
> > complain)
> >   		ctx = tsubst_aggr_type (ctx, args,
> >   					complain,
> >   					in_decl, /*entering_scope=*/1);
> > +		if (DECL_SELF_REFERENCE_P (t))
> > +		  /* The context and type of a injected-class-name are
> > +		     the same, so we don't need to substitute both.  */
> > +		  type = ctx;
> >   		/* If CTX is unchanged, then T is in fact the
> >   		   specialization we want.  That situation occurs when
> >   		   referencing a static data member within in its own
> > @@ -14900,14 +14905,22 @@ tsubst_decl (tree t, tree args, tsubst_flags_t
> > complain)
> >   	      {
> >   		tmpl = DECL_TI_TEMPLATE (t);
> >   		gen_tmpl = most_general_template (tmpl);
> > -		argvec = tsubst (DECL_TI_ARGS (t), args, complain, in_decl);
> > -		if (argvec != error_mark_node)
> > -		  argvec = (coerce_innermost_template_parms
> > -			    (DECL_TEMPLATE_PARMS (gen_tmpl),
> > -			     argvec, t, complain,
> > -			     /*all*/true, /*defarg*/true));
> > -		if (argvec == error_mark_node)
> > -		  RETURN (error_mark_node);
> > +		if (DECL_SELF_REFERENCE_P (t))
> > +		  /* The DECL_TI_ARGS for the injected-class-name are the
> > +		     generic template arguments for the class template, so
> > +		     substitution/coercion is just the identity mapping.  */
> > +		  argvec = args;
> 
> Would it make sense to extend this to any TEMPLATE_DECL for which
> DECL_PRIMARY_TEMPLATE is the class template?  So, anything that gets here
> except an alias template.

Ah nice, it does look like we could extend this to any TEMPLATE_DECL that
satisfies DECL_CLASS_SCOPE_P && !DECL_MEMBER_TEMPLATE_P (so that we also
exclude static data member templates), i.e. any templated non-template
member.  Is that equivalent to what you had in mind?

Based on some light testing, if we do this, it seems we also need to
handle 'args' having greater depth than 'DECL_TI_ARGS' here, something
which could happen during satisfaction.  I can work on extending this
as a followup patch.

> 
> > +		else
> > +		  {
> > +		    argvec = tsubst (DECL_TI_ARGS (t), args, complain,
> > in_decl);
> > +		    if (argvec != error_mark_node)
> > +		      argvec = (coerce_innermost_template_parms
> > +				(DECL_TEMPLATE_PARMS (gen_tmpl),
> > +				 argvec, t, complain,
> > +				 /*all*/true, /*defarg*/true));
> > +		    if (argvec == error_mark_node)
> > +		      RETURN (error_mark_node);
> > +		  }
> >   		hash = hash_tmpl_and_args (gen_tmpl, argvec);
> >   		spec = retrieve_specialization (gen_tmpl, argvec, hash);
> >   	      }
> 
>
  
Patrick Palka June 10, 2022, 1:18 p.m. UTC | #3
On Thu, 9 Jun 2022, Patrick Palka wrote:

> On Thu, 9 Jun 2022, Jason Merrill wrote:
> 
> > On 6/8/22 14:21, Patrick Palka wrote:
> > > When substituting a class template specialization, tsubst_aggr_type
> > > substitutes the TYPE_CONTEXT before passing it to lookup_template_class.
> > > This appears to be unnecessary, however, because the the initial value
> > > of lookup_template_class's context parameter is unused outside of the
> > > IDENTIFIER_NODE case, and l_t_c performs its own substitution of the
> > > context, anyway.  So this patch removes the redundant substitution in
> > > tsubst_aggr_type.  Doing so causes us to ICE on template/nested5.C
> > > because during lookup_template_class for A<T>::C::D<S> with T=E and S=S,
> > > we substitute and complete the context A<T>::C with T=E, which in turn
> > > registers the desired dependent specialization of D for us and we end up
> > > trying to register it again.  This patch fixes this by checking the
> > > specializations table again after completion of the context.
> > > 
> > > This patch also implements a couple of other optimizations:
> > > 
> > >    * In lookup_template_class, if the context of the partially
> > >      instantiated template is already non-dependent, then we could
> > >      reuse that instead of substituting the context of the most
> > >      general template.
> > >    * When substituting the TYPE_DECL for an injected-class-name
> > >      in tsubst_decl, we can avoid substituting its TREE_TYPE and
> > >      DECL_TI_ARGS.
> > > 
> > > Together these optimizations improve memory usage for the range-v3
> > > testcase test/view/split.cc by about 5%.  The improvement is probably
> > > more significant when dealing with deeply nested class templates.
> > > 
> > > Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
> > > trunk?
> > > 
> > > gcc/cp/ChangeLog:
> > > 
> > > 	* pt.cc (lookup_template_class): Remove dead stores to
> > > 	context parameter.  Don't substitute the context of the
> > > 	most general template if that of the partially instantiated
> > > 	template is non-dependent.  Check the specializations table
> > > 	again after completing the context of a nested dependent
> > > 	specialization.
> > > 	(tsubst_aggr_type) <case RECORD_TYPE>: Don't substitute
> > > 	TYPE_CONTEXT or pass it to lookup_template_class.
> > > 	(tsubst_decl) <case TYPE_DECL>: Avoid substituting the
> > > 	TREE_TYPE and DECL_TI_ARGS when DECL_SELF_REFERENCE_P.
> > > ---
> > >   gcc/cp/pt.cc | 69 +++++++++++++++++++++++++++++++---------------------
> > >   1 file changed, 41 insertions(+), 28 deletions(-)
> > > 
> > > diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
> > > index 59b94317e88..28023d60684 100644
> > > --- a/gcc/cp/pt.cc
> > > +++ b/gcc/cp/pt.cc
> > > @@ -9840,8 +9840,6 @@ lookup_template_class (tree d1, tree arglist, tree
> > > in_decl, tree context,
> > >   	  if (context)
> > >   	    pop_decl_namespace ();
> > >   	}
> > > -      if (templ)
> > > -	context = DECL_CONTEXT (templ);
> > >       }
> > >     else if (TREE_CODE (d1) == TYPE_DECL && MAYBE_CLASS_TYPE_P (TREE_TYPE
> > > (d1)))
> > >       {
> > > @@ -9868,7 +9866,6 @@ lookup_template_class (tree d1, tree arglist, tree
> > > in_decl, tree context,
> > >       {
> > >         templ = d1;
> > >         d1 = DECL_NAME (templ);
> > > -      context = DECL_CONTEXT (templ);
> > >       }
> > >     else if (DECL_TEMPLATE_TEMPLATE_PARM_P (d1))
> > >       {
> > > @@ -10059,8 +10056,25 @@ lookup_template_class (tree d1, tree arglist, tree
> > > in_decl, tree context,
> > >         context = DECL_CONTEXT (gen_tmpl);
> > >         if (context && TYPE_P (context))
> > >   	{
> > > -	  context = tsubst_aggr_type (context, arglist, complain, in_decl,
> > > true);
> > > -	  context = complete_type (context);
> > > +	  if (!uses_template_parms (DECL_CONTEXT (templ)))
> > > +	    /* If the context of the partially instantiated template is
> > > +	       already non-dependent, then we might as well use it.  */
> > > +	    context = DECL_CONTEXT (templ);
> > > +	  else
> > > +	    {
> > > +	      context = tsubst_aggr_type (context, arglist, complain, in_decl,
> > > true);
> > > +	      context = complete_type (context);
> > > +	      if (is_dependent_type && arg_depth > 1)
> > > +		{
> > > +		  /* If this is a dependent nested specialization such as
> > > +		     A<int>::B<T>, then completion of A<int> might have
> > > +		     registered this specialization of B for us, so check
> > > +		     the table again (33959).  */
> > > +		  entry = type_specializations->find_with_hash (&elt, hash);
> > > +		  if (entry)
> > > +		    return entry->spec;
> > > +		}
> > > +	    }
> > >   	}
> > >         else
> > >   	context = tsubst (context, arglist, complain, in_decl);
> > > @@ -13711,25 +13725,12 @@ tsubst_aggr_type (tree t,
> > >         if (TYPE_TEMPLATE_INFO (t) && uses_template_parms (t))
> > >   	{
> > >   	  tree argvec;
> > > -	  tree context;
> > >   	  tree r;
> > >     	  /* In "sizeof(X<I>)" we need to evaluate "I".  */
> > >   	  cp_evaluated ev;
> > >   -	  /* First, determine the context for the type we are looking
> > > -	     up.  */
> > > -	  context = TYPE_CONTEXT (t);
> > > -	  if (context && TYPE_P (context))
> > > -	    {
> > > -	      context = tsubst_aggr_type (context, args, complain,
> > > -					  in_decl, /*entering_scope=*/1);
> > > -	      /* If context is a nested class inside a class template,
> > > -	         it may still need to be instantiated (c++/33959).  */
> > > -	      context = complete_type (context);
> > > -	    }
> > > -
> > > -	  /* Then, figure out what arguments are appropriate for the
> > > +	  /* Figure out what arguments are appropriate for the
> > >   	     type we are trying to find.  For example, given:
> > >     	       template <class T> struct S;
> > > @@ -13744,7 +13745,7 @@ tsubst_aggr_type (tree t,
> > >   	    r = error_mark_node;
> > >   	  else
> > >   	    {
> > > -	      r = lookup_template_class (t, argvec, in_decl, context,
> > > +	      r = lookup_template_class (t, argvec, in_decl, NULL_TREE,
> > >   					 entering_scope, complain);
> > >   	      r = cp_build_qualified_type (r, cp_type_quals (t), complain);
> > >   	    }
> > > @@ -14880,6 +14881,10 @@ tsubst_decl (tree t, tree args, tsubst_flags_t
> > > complain)
> > >   		ctx = tsubst_aggr_type (ctx, args,
> > >   					complain,
> > >   					in_decl, /*entering_scope=*/1);
> > > +		if (DECL_SELF_REFERENCE_P (t))
> > > +		  /* The context and type of a injected-class-name are
> > > +		     the same, so we don't need to substitute both.  */
> > > +		  type = ctx;
> > >   		/* If CTX is unchanged, then T is in fact the
> > >   		   specialization we want.  That situation occurs when
> > >   		   referencing a static data member within in its own
> > > @@ -14900,14 +14905,22 @@ tsubst_decl (tree t, tree args, tsubst_flags_t
> > > complain)
> > >   	      {
> > >   		tmpl = DECL_TI_TEMPLATE (t);
> > >   		gen_tmpl = most_general_template (tmpl);
> > > -		argvec = tsubst (DECL_TI_ARGS (t), args, complain, in_decl);
> > > -		if (argvec != error_mark_node)
> > > -		  argvec = (coerce_innermost_template_parms
> > > -			    (DECL_TEMPLATE_PARMS (gen_tmpl),
> > > -			     argvec, t, complain,
> > > -			     /*all*/true, /*defarg*/true));
> > > -		if (argvec == error_mark_node)
> > > -		  RETURN (error_mark_node);
> > > +		if (DECL_SELF_REFERENCE_P (t))
> > > +		  /* The DECL_TI_ARGS for the injected-class-name are the
> > > +		     generic template arguments for the class template, so
> > > +		     substitution/coercion is just the identity mapping.  */
> > > +		  argvec = args;
> > 
> > Would it make sense to extend this to any TEMPLATE_DECL for which
> > DECL_PRIMARY_TEMPLATE is the class template?  So, anything that gets here
> > except an alias template.
> 
> Ah nice, it does look like we could extend this to any TEMPLATE_DECL that
> satisfies DECL_CLASS_SCOPE_P && !DECL_MEMBER_TEMPLATE_P (so that we also
> exclude static data member templates), i.e. any templated non-template
> member.  Is that equivalent to what you had in mind?
> 
> Based on some light testing, if we do this, it seems we also need to
> handle 'args' having greater depth than 'DECL_TI_ARGS' here, something
> which could happen during satisfaction.

... or we could just restrict the optimization to when the argument
depths match, like so:

-- >8 --

Subject: [PATCH] c++: optimize specialization of nested templated classes

gcc/cp/ChangeLog:

	* pt.cc (lookup_template_class): Remove dead stores to
	context parameter.  Don't substitute the context of the
	most general template if that of the partially instantiated
	template is non-dependent.  Check the specializations table
	again after completing the context of a nested dependent
	specialization.
	(tsubst_aggr_type) <case RECORD_TYPE>: Don't substitute
	TYPE_CONTEXT or pass it to lookup_template_class.
	(tsubst_decl) <case TYPE_DECL>: Avoid substituting the
	TREE_TYPE and DECL_TI_ARGS when DECL_SELF_REFERENCE_P.
---
 gcc/cp/pt.cc | 73 ++++++++++++++++++++++++++++++++--------------------
 1 file changed, 45 insertions(+), 28 deletions(-)

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 3154186ac20..ebd822373db 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -9840,8 +9840,6 @@ lookup_template_class (tree d1, tree arglist, tree in_decl, tree context,
 	  if (context)
 	    pop_decl_namespace ();
 	}
-      if (templ)
-	context = DECL_CONTEXT (templ);
     }
   else if (TREE_CODE (d1) == TYPE_DECL && MAYBE_CLASS_TYPE_P (TREE_TYPE (d1)))
     {
@@ -9868,7 +9866,6 @@ lookup_template_class (tree d1, tree arglist, tree in_decl, tree context,
     {
       templ = d1;
       d1 = DECL_NAME (templ);
-      context = DECL_CONTEXT (templ);
     }
   else if (DECL_TEMPLATE_TEMPLATE_PARM_P (d1))
     {
@@ -10059,8 +10056,26 @@ lookup_template_class (tree d1, tree arglist, tree in_decl, tree context,
       context = DECL_CONTEXT (gen_tmpl);
       if (context && TYPE_P (context))
 	{
-	  context = tsubst_aggr_type (context, arglist, complain, in_decl, true);
-	  context = complete_type (context);
+	  if (!uses_template_parms (DECL_CONTEXT (templ)))
+	    /* If the context of the partially instantiated template is
+	       already non-dependent, then we might as well use it.  */
+	    context = DECL_CONTEXT (templ);
+	  else
+	    {
+	      context = tsubst_aggr_type (context, arglist,
+					  complain, in_decl, true);
+	      context = complete_type (context);
+	      if (is_dependent_type && arg_depth > 1)
+		{
+		  /* If this is a dependent nested specialization such as
+		     A<int>::B<T>, then completion of A<int> might have
+		     registered this specialization of B for us, so check
+		     the table again (33959).  */
+		  entry = type_specializations->find_with_hash (&elt, hash);
+		  if (entry)
+		    return entry->spec;
+		}
+	    }
 	}
       else
 	context = tsubst (context, arglist, complain, in_decl);
@@ -13739,25 +13754,12 @@ tsubst_aggr_type (tree t,
       if (TYPE_TEMPLATE_INFO (t) && uses_template_parms (t))
 	{
 	  tree argvec;
-	  tree context;
 	  tree r;
 
 	  /* In "sizeof(X<I>)" we need to evaluate "I".  */
 	  cp_evaluated ev;
 
-	  /* First, determine the context for the type we are looking
-	     up.  */
-	  context = TYPE_CONTEXT (t);
-	  if (context && TYPE_P (context))
-	    {
-	      context = tsubst_aggr_type (context, args, complain,
-					  in_decl, /*entering_scope=*/1);
-	      /* If context is a nested class inside a class template,
-	         it may still need to be instantiated (c++/33959).  */
-	      context = complete_type (context);
-	    }
-
-	  /* Then, figure out what arguments are appropriate for the
+	  /* Figure out what arguments are appropriate for the
 	     type we are trying to find.  For example, given:
 
 	       template <class T> struct S;
@@ -13772,7 +13774,7 @@ tsubst_aggr_type (tree t,
 	    r = error_mark_node;
 	  else
 	    {
-	      r = lookup_template_class (t, argvec, in_decl, context,
+	      r = lookup_template_class (t, argvec, in_decl, NULL_TREE,
 					 entering_scope, complain);
 	      r = cp_build_qualified_type (r, cp_type_quals (t), complain);
 	    }
@@ -14913,6 +14915,10 @@ tsubst_decl (tree t, tree args, tsubst_flags_t complain)
 		ctx = tsubst_aggr_type (ctx, args,
 					complain,
 					in_decl, /*entering_scope=*/1);
+		if (DECL_SELF_REFERENCE_P (t))
+		  /* The context and type of an injected-class-name are
+		     the same, so we don't need to substitute both.  */
+		  type = ctx;
 		/* If CTX is unchanged, then T is in fact the
 		   specialization we want.  That situation occurs when
 		   referencing a static data member within in its own
@@ -14933,14 +14939,25 @@ tsubst_decl (tree t, tree args, tsubst_flags_t complain)
 	      {
 		tmpl = DECL_TI_TEMPLATE (t);
 		gen_tmpl = most_general_template (tmpl);
-		argvec = tsubst (DECL_TI_ARGS (t), args, complain, in_decl);
-		if (argvec != error_mark_node)
-		  argvec = (coerce_innermost_template_parms
-			    (DECL_TEMPLATE_PARMS (gen_tmpl),
-			     argvec, t, complain,
-			     /*all*/true, /*defarg*/true));
-		if (argvec == error_mark_node)
-		  RETURN (error_mark_node);
+		if (DECL_CLASS_SCOPE_P (tmpl)
+		    && !DECL_MEMBER_TEMPLATE_P (tmpl)
+		    && (TMPL_ARGS_DEPTH (args)
+			== TMPL_ARGS_DEPTH (DECL_TI_ARGS (t))))
+		  /* The DECL_TI_ARGS in this case are the generic template
+		     arguments for the class template, so substitution/coercion
+		     is just the identity mapping.  */
+		  argvec = args;
+		else
+		  {
+		    argvec = tsubst (DECL_TI_ARGS (t), args, complain, in_decl);
+		    if (argvec != error_mark_node)
+		      argvec = (coerce_innermost_template_parms
+				(DECL_TEMPLATE_PARMS (gen_tmpl),
+				 argvec, t, complain,
+				 /*all*/true, /*defarg*/true));
+		    if (argvec == error_mark_node)
+		      RETURN (error_mark_node);
+		  }
 		hash = hash_tmpl_and_args (gen_tmpl, argvec);
 		spec = retrieve_specialization (gen_tmpl, argvec, hash);
 	      }
  
Patrick Palka June 10, 2022, 4 p.m. UTC | #4
On Fri, 10 Jun 2022, Patrick Palka wrote:

> On Thu, 9 Jun 2022, Patrick Palka wrote:
> 
> > On Thu, 9 Jun 2022, Jason Merrill wrote:
> > 
> > > On 6/8/22 14:21, Patrick Palka wrote:
> > > > When substituting a class template specialization, tsubst_aggr_type
> > > > substitutes the TYPE_CONTEXT before passing it to lookup_template_class.
> > > > This appears to be unnecessary, however, because the the initial value
> > > > of lookup_template_class's context parameter is unused outside of the
> > > > IDENTIFIER_NODE case, and l_t_c performs its own substitution of the
> > > > context, anyway.  So this patch removes the redundant substitution in
> > > > tsubst_aggr_type.  Doing so causes us to ICE on template/nested5.C
> > > > because during lookup_template_class for A<T>::C::D<S> with T=E and S=S,
> > > > we substitute and complete the context A<T>::C with T=E, which in turn
> > > > registers the desired dependent specialization of D for us and we end up
> > > > trying to register it again.  This patch fixes this by checking the
> > > > specializations table again after completion of the context.
> > > > 
> > > > This patch also implements a couple of other optimizations:
> > > > 
> > > >    * In lookup_template_class, if the context of the partially
> > > >      instantiated template is already non-dependent, then we could
> > > >      reuse that instead of substituting the context of the most
> > > >      general template.
> > > >    * When substituting the TYPE_DECL for an injected-class-name
> > > >      in tsubst_decl, we can avoid substituting its TREE_TYPE and
> > > >      DECL_TI_ARGS.
> > > > 
> > > > Together these optimizations improve memory usage for the range-v3
> > > > testcase test/view/split.cc by about 5%.  The improvement is probably
> > > > more significant when dealing with deeply nested class templates.
> > > > 
> > > > Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
> > > > trunk?
> > > > 
> > > > gcc/cp/ChangeLog:
> > > > 
> > > > 	* pt.cc (lookup_template_class): Remove dead stores to
> > > > 	context parameter.  Don't substitute the context of the
> > > > 	most general template if that of the partially instantiated
> > > > 	template is non-dependent.  Check the specializations table
> > > > 	again after completing the context of a nested dependent
> > > > 	specialization.
> > > > 	(tsubst_aggr_type) <case RECORD_TYPE>: Don't substitute
> > > > 	TYPE_CONTEXT or pass it to lookup_template_class.
> > > > 	(tsubst_decl) <case TYPE_DECL>: Avoid substituting the
> > > > 	TREE_TYPE and DECL_TI_ARGS when DECL_SELF_REFERENCE_P.
> > > > ---
> > > >   gcc/cp/pt.cc | 69 +++++++++++++++++++++++++++++++---------------------
> > > >   1 file changed, 41 insertions(+), 28 deletions(-)
> > > > 
> > > > diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
> > > > index 59b94317e88..28023d60684 100644
> > > > --- a/gcc/cp/pt.cc
> > > > +++ b/gcc/cp/pt.cc
> > > > @@ -9840,8 +9840,6 @@ lookup_template_class (tree d1, tree arglist, tree
> > > > in_decl, tree context,
> > > >   	  if (context)
> > > >   	    pop_decl_namespace ();
> > > >   	}
> > > > -      if (templ)
> > > > -	context = DECL_CONTEXT (templ);
> > > >       }
> > > >     else if (TREE_CODE (d1) == TYPE_DECL && MAYBE_CLASS_TYPE_P (TREE_TYPE
> > > > (d1)))
> > > >       {
> > > > @@ -9868,7 +9866,6 @@ lookup_template_class (tree d1, tree arglist, tree
> > > > in_decl, tree context,
> > > >       {
> > > >         templ = d1;
> > > >         d1 = DECL_NAME (templ);
> > > > -      context = DECL_CONTEXT (templ);
> > > >       }
> > > >     else if (DECL_TEMPLATE_TEMPLATE_PARM_P (d1))
> > > >       {
> > > > @@ -10059,8 +10056,25 @@ lookup_template_class (tree d1, tree arglist, tree
> > > > in_decl, tree context,
> > > >         context = DECL_CONTEXT (gen_tmpl);
> > > >         if (context && TYPE_P (context))
> > > >   	{
> > > > -	  context = tsubst_aggr_type (context, arglist, complain, in_decl,
> > > > true);
> > > > -	  context = complete_type (context);
> > > > +	  if (!uses_template_parms (DECL_CONTEXT (templ)))
> > > > +	    /* If the context of the partially instantiated template is
> > > > +	       already non-dependent, then we might as well use it.  */
> > > > +	    context = DECL_CONTEXT (templ);
> > > > +	  else
> > > > +	    {
> > > > +	      context = tsubst_aggr_type (context, arglist, complain, in_decl,
> > > > true);
> > > > +	      context = complete_type (context);
> > > > +	      if (is_dependent_type && arg_depth > 1)
> > > > +		{
> > > > +		  /* If this is a dependent nested specialization such as
> > > > +		     A<int>::B<T>, then completion of A<int> might have
> > > > +		     registered this specialization of B for us, so check
> > > > +		     the table again (33959).  */
> > > > +		  entry = type_specializations->find_with_hash (&elt, hash);
> > > > +		  if (entry)
> > > > +		    return entry->spec;
> > > > +		}
> > > > +	    }
> > > >   	}
> > > >         else
> > > >   	context = tsubst (context, arglist, complain, in_decl);
> > > > @@ -13711,25 +13725,12 @@ tsubst_aggr_type (tree t,
> > > >         if (TYPE_TEMPLATE_INFO (t) && uses_template_parms (t))
> > > >   	{
> > > >   	  tree argvec;
> > > > -	  tree context;
> > > >   	  tree r;
> > > >     	  /* In "sizeof(X<I>)" we need to evaluate "I".  */
> > > >   	  cp_evaluated ev;
> > > >   -	  /* First, determine the context for the type we are looking
> > > > -	     up.  */
> > > > -	  context = TYPE_CONTEXT (t);
> > > > -	  if (context && TYPE_P (context))
> > > > -	    {
> > > > -	      context = tsubst_aggr_type (context, args, complain,
> > > > -					  in_decl, /*entering_scope=*/1);
> > > > -	      /* If context is a nested class inside a class template,
> > > > -	         it may still need to be instantiated (c++/33959).  */
> > > > -	      context = complete_type (context);
> > > > -	    }
> > > > -
> > > > -	  /* Then, figure out what arguments are appropriate for the
> > > > +	  /* Figure out what arguments are appropriate for the
> > > >   	     type we are trying to find.  For example, given:
> > > >     	       template <class T> struct S;
> > > > @@ -13744,7 +13745,7 @@ tsubst_aggr_type (tree t,
> > > >   	    r = error_mark_node;
> > > >   	  else
> > > >   	    {
> > > > -	      r = lookup_template_class (t, argvec, in_decl, context,
> > > > +	      r = lookup_template_class (t, argvec, in_decl, NULL_TREE,
> > > >   					 entering_scope, complain);
> > > >   	      r = cp_build_qualified_type (r, cp_type_quals (t), complain);
> > > >   	    }
> > > > @@ -14880,6 +14881,10 @@ tsubst_decl (tree t, tree args, tsubst_flags_t
> > > > complain)
> > > >   		ctx = tsubst_aggr_type (ctx, args,
> > > >   					complain,
> > > >   					in_decl, /*entering_scope=*/1);
> > > > +		if (DECL_SELF_REFERENCE_P (t))
> > > > +		  /* The context and type of a injected-class-name are
> > > > +		     the same, so we don't need to substitute both.  */
> > > > +		  type = ctx;
> > > >   		/* If CTX is unchanged, then T is in fact the
> > > >   		   specialization we want.  That situation occurs when
> > > >   		   referencing a static data member within in its own
> > > > @@ -14900,14 +14905,22 @@ tsubst_decl (tree t, tree args, tsubst_flags_t
> > > > complain)
> > > >   	      {
> > > >   		tmpl = DECL_TI_TEMPLATE (t);
> > > >   		gen_tmpl = most_general_template (tmpl);
> > > > -		argvec = tsubst (DECL_TI_ARGS (t), args, complain, in_decl);
> > > > -		if (argvec != error_mark_node)
> > > > -		  argvec = (coerce_innermost_template_parms
> > > > -			    (DECL_TEMPLATE_PARMS (gen_tmpl),
> > > > -			     argvec, t, complain,
> > > > -			     /*all*/true, /*defarg*/true));
> > > > -		if (argvec == error_mark_node)
> > > > -		  RETURN (error_mark_node);
> > > > +		if (DECL_SELF_REFERENCE_P (t))
> > > > +		  /* The DECL_TI_ARGS for the injected-class-name are the
> > > > +		     generic template arguments for the class template, so
> > > > +		     substitution/coercion is just the identity mapping.  */
> > > > +		  argvec = args;
> > > 
> > > Would it make sense to extend this to any TEMPLATE_DECL for which
> > > DECL_PRIMARY_TEMPLATE is the class template?  So, anything that gets here
> > > except an alias template.
> > 
> > Ah nice, it does look like we could extend this to any TEMPLATE_DECL that
> > satisfies DECL_CLASS_SCOPE_P && !DECL_MEMBER_TEMPLATE_P (so that we also
> > exclude static data member templates), i.e. any templated non-template
> > member.  Is that equivalent to what you had in mind?
> > 
> > Based on some light testing, if we do this, it seems we also need to
> > handle 'args' having greater depth than 'DECL_TI_ARGS' here, something
> > which could happen during satisfaction.
> 
> ... or we could just restrict the optimization to when the argument
> depths match, like so:
> 
> -- >8 --
> 
> Subject: [PATCH] c++: optimize specialization of nested templated classes
> 
> gcc/cp/ChangeLog:
> 
> 	* pt.cc (lookup_template_class): Remove dead stores to
> 	context parameter.  Don't substitute the context of the
> 	most general template if that of the partially instantiated
> 	template is non-dependent.  Check the specializations table
> 	again after completing the context of a nested dependent
> 	specialization.
> 	(tsubst_aggr_type) <case RECORD_TYPE>: Don't substitute
> 	TYPE_CONTEXT or pass it to lookup_template_class.
> 	(tsubst_decl) <case TYPE_DECL>: Avoid substituting the
> 	TREE_TYPE and DECL_TI_ARGS when DECL_SELF_REFERENCE_P.
> ---
>  gcc/cp/pt.cc | 73 ++++++++++++++++++++++++++++++++--------------------
>  1 file changed, 45 insertions(+), 28 deletions(-)
> 
> diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
> index 3154186ac20..ebd822373db 100644
> --- a/gcc/cp/pt.cc
> +++ b/gcc/cp/pt.cc
> @@ -9840,8 +9840,6 @@ lookup_template_class (tree d1, tree arglist, tree in_decl, tree context,
>  	  if (context)
>  	    pop_decl_namespace ();
>  	}
> -      if (templ)
> -	context = DECL_CONTEXT (templ);
>      }
>    else if (TREE_CODE (d1) == TYPE_DECL && MAYBE_CLASS_TYPE_P (TREE_TYPE (d1)))
>      {
> @@ -9868,7 +9866,6 @@ lookup_template_class (tree d1, tree arglist, tree in_decl, tree context,
>      {
>        templ = d1;
>        d1 = DECL_NAME (templ);
> -      context = DECL_CONTEXT (templ);
>      }
>    else if (DECL_TEMPLATE_TEMPLATE_PARM_P (d1))
>      {
> @@ -10059,8 +10056,26 @@ lookup_template_class (tree d1, tree arglist, tree in_decl, tree context,
>        context = DECL_CONTEXT (gen_tmpl);
>        if (context && TYPE_P (context))
>  	{
> -	  context = tsubst_aggr_type (context, arglist, complain, in_decl, true);
> -	  context = complete_type (context);
> +	  if (!uses_template_parms (DECL_CONTEXT (templ)))
> +	    /* If the context of the partially instantiated template is
> +	       already non-dependent, then we might as well use it.  */
> +	    context = DECL_CONTEXT (templ);
> +	  else
> +	    {
> +	      context = tsubst_aggr_type (context, arglist,
> +					  complain, in_decl, true);
> +	      context = complete_type (context);
> +	      if (is_dependent_type && arg_depth > 1)
> +		{
> +		  /* If this is a dependent nested specialization such as
> +		     A<int>::B<T>, then completion of A<int> might have
> +		     registered this specialization of B for us, so check
> +		     the table again (33959).  */
> +		  entry = type_specializations->find_with_hash (&elt, hash);
> +		  if (entry)
> +		    return entry->spec;
> +		}
> +	    }
>  	}
>        else
>  	context = tsubst (context, arglist, complain, in_decl);
> @@ -13739,25 +13754,12 @@ tsubst_aggr_type (tree t,
>        if (TYPE_TEMPLATE_INFO (t) && uses_template_parms (t))
>  	{
>  	  tree argvec;
> -	  tree context;
>  	  tree r;
>  
>  	  /* In "sizeof(X<I>)" we need to evaluate "I".  */
>  	  cp_evaluated ev;
>  
> -	  /* First, determine the context for the type we are looking
> -	     up.  */
> -	  context = TYPE_CONTEXT (t);
> -	  if (context && TYPE_P (context))
> -	    {
> -	      context = tsubst_aggr_type (context, args, complain,
> -					  in_decl, /*entering_scope=*/1);
> -	      /* If context is a nested class inside a class template,
> -	         it may still need to be instantiated (c++/33959).  */
> -	      context = complete_type (context);
> -	    }
> -
> -	  /* Then, figure out what arguments are appropriate for the
> +	  /* Figure out what arguments are appropriate for the
>  	     type we are trying to find.  For example, given:
>  
>  	       template <class T> struct S;
> @@ -13772,7 +13774,7 @@ tsubst_aggr_type (tree t,
>  	    r = error_mark_node;
>  	  else
>  	    {
> -	      r = lookup_template_class (t, argvec, in_decl, context,
> +	      r = lookup_template_class (t, argvec, in_decl, NULL_TREE,
>  					 entering_scope, complain);
>  	      r = cp_build_qualified_type (r, cp_type_quals (t), complain);
>  	    }
> @@ -14913,6 +14915,10 @@ tsubst_decl (tree t, tree args, tsubst_flags_t complain)
>  		ctx = tsubst_aggr_type (ctx, args,
>  					complain,
>  					in_decl, /*entering_scope=*/1);
> +		if (DECL_SELF_REFERENCE_P (t))
> +		  /* The context and type of an injected-class-name are
> +		     the same, so we don't need to substitute both.  */
> +		  type = ctx;
>  		/* If CTX is unchanged, then T is in fact the
>  		   specialization we want.  That situation occurs when
>  		   referencing a static data member within in its own
> @@ -14933,14 +14939,25 @@ tsubst_decl (tree t, tree args, tsubst_flags_t complain)
>  	      {
>  		tmpl = DECL_TI_TEMPLATE (t);
>  		gen_tmpl = most_general_template (tmpl);
> -		argvec = tsubst (DECL_TI_ARGS (t), args, complain, in_decl);
> -		if (argvec != error_mark_node)
> -		  argvec = (coerce_innermost_template_parms
> -			    (DECL_TEMPLATE_PARMS (gen_tmpl),
> -			     argvec, t, complain,
> -			     /*all*/true, /*defarg*/true));
> -		if (argvec == error_mark_node)
> -		  RETURN (error_mark_node);
> +		if (DECL_CLASS_SCOPE_P (tmpl)
> +		    && !DECL_MEMBER_TEMPLATE_P (tmpl)
> +		    && (TMPL_ARGS_DEPTH (args)
> +			== TMPL_ARGS_DEPTH (DECL_TI_ARGS (t))))
> +		  /* The DECL_TI_ARGS in this case are the generic template
> +		     arguments for the class template, so substitution/coercion
> +		     is just the identity mapping.  */
> +		  argvec = args;
> +		else
> +		  {
> +		    argvec = tsubst (DECL_TI_ARGS (t), args, complain, in_decl);
> +		    if (argvec != error_mark_node)
> +		      argvec = (coerce_innermost_template_parms
> +				(DECL_TEMPLATE_PARMS (gen_tmpl),
> +				 argvec, t, complain,
> +				 /*all*/true, /*defarg*/true));
> +		    if (argvec == error_mark_node)
> +		      RETURN (error_mark_node);
> +		  }

Seems we could be a bit cleverer here by trying to avoid the
coercion even if we have to do the substitution.

When the argument depth is less than the parameter depth, then the
innermost substituted arguments are just lowered arguments from
DECL_TI_ARGS, for which coercion would be a no-op, so we can just
skip it.  This optimization seems to trigger 90+% of the time and
compile time/memory by about 1%/0.5% in my experiments.  (The call to
coerce_innermost_template_parms was added by r6-829-gb237c4cbd3da7a
FWIW.)

-- >8 --


Subject: [PATCH] c++: optimize specialization of nested templated classes

gcc/cp/ChangeLog:

	* pt.cc (lookup_template_class): Remove dead stores to
	context parameter.  Don't substitute the context of the
	most general template if that of the partially instantiated
	template is non-dependent.  Check the specializations table
	again after completing the context of a nested dependent
	specialization.
	(tsubst_aggr_type) <case RECORD_TYPE>: Don't substitute
	TYPE_CONTEXT or pass it to lookup_template_class.
	(tsubst_decl) <case TYPE_DECL>: Avoid substituting the
	TREE_TYPE and DECL_TI_ARGS when DECL_SELF_REFERENCE_P.
---
 gcc/cp/pt.cc | 74 ++++++++++++++++++++++++++++++++--------------------
 1 file changed, 46 insertions(+), 28 deletions(-)

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 3154186ac20..faa5bc377cb 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -9840,8 +9840,6 @@ lookup_template_class (tree d1, tree arglist, tree in_decl, tree context,
 	  if (context)
 	    pop_decl_namespace ();
 	}
-      if (templ)
-	context = DECL_CONTEXT (templ);
     }
   else if (TREE_CODE (d1) == TYPE_DECL && MAYBE_CLASS_TYPE_P (TREE_TYPE (d1)))
     {
@@ -9868,7 +9866,6 @@ lookup_template_class (tree d1, tree arglist, tree in_decl, tree context,
     {
       templ = d1;
       d1 = DECL_NAME (templ);
-      context = DECL_CONTEXT (templ);
     }
   else if (DECL_TEMPLATE_TEMPLATE_PARM_P (d1))
     {
@@ -10059,8 +10056,26 @@ lookup_template_class (tree d1, tree arglist, tree in_decl, tree context,
       context = DECL_CONTEXT (gen_tmpl);
       if (context && TYPE_P (context))
 	{
-	  context = tsubst_aggr_type (context, arglist, complain, in_decl, true);
-	  context = complete_type (context);
+	  if (!uses_template_parms (DECL_CONTEXT (templ)))
+	    /* If the context of the partially instantiated template is
+	       already non-dependent, then we might as well use it.  */
+	    context = DECL_CONTEXT (templ);
+	  else
+	    {
+	      context = tsubst_aggr_type (context, arglist,
+					  complain, in_decl, true);
+	      context = complete_type (context);
+	      if (is_dependent_type && arg_depth > 1)
+		{
+		  /* If this is a dependent nested specialization such as
+		     A<int>::B<T>, then completion of A<int> might have
+		     registered this specialization of B for us, so check
+		     the table again (33959).  */
+		  entry = type_specializations->find_with_hash (&elt, hash);
+		  if (entry)
+		    return entry->spec;
+		}
+	    }
 	}
       else
 	context = tsubst (context, arglist, complain, in_decl);
@@ -13739,25 +13754,12 @@ tsubst_aggr_type (tree t,
       if (TYPE_TEMPLATE_INFO (t) && uses_template_parms (t))
 	{
 	  tree argvec;
-	  tree context;
 	  tree r;
 
 	  /* In "sizeof(X<I>)" we need to evaluate "I".  */
 	  cp_evaluated ev;
 
-	  /* First, determine the context for the type we are looking
-	     up.  */
-	  context = TYPE_CONTEXT (t);
-	  if (context && TYPE_P (context))
-	    {
-	      context = tsubst_aggr_type (context, args, complain,
-					  in_decl, /*entering_scope=*/1);
-	      /* If context is a nested class inside a class template,
-	         it may still need to be instantiated (c++/33959).  */
-	      context = complete_type (context);
-	    }
-
-	  /* Then, figure out what arguments are appropriate for the
+	  /* Figure out what arguments are appropriate for the
 	     type we are trying to find.  For example, given:
 
 	       template <class T> struct S;
@@ -13772,7 +13774,7 @@ tsubst_aggr_type (tree t,
 	    r = error_mark_node;
 	  else
 	    {
-	      r = lookup_template_class (t, argvec, in_decl, context,
+	      r = lookup_template_class (t, argvec, in_decl, NULL_TREE,
 					 entering_scope, complain);
 	      r = cp_build_qualified_type (r, cp_type_quals (t), complain);
 	    }
@@ -14913,6 +14915,10 @@ tsubst_decl (tree t, tree args, tsubst_flags_t complain)
 		ctx = tsubst_aggr_type (ctx, args,
 					complain,
 					in_decl, /*entering_scope=*/1);
+		if (DECL_SELF_REFERENCE_P (t))
+		  /* The context and type of an injected-class-name are
+		     the same, so we don't need to substitute both.  */
+		  type = ctx;
 		/* If CTX is unchanged, then T is in fact the
 		   specialization we want.  That situation occurs when
 		   referencing a static data member within in its own
@@ -14931,16 +14937,28 @@ tsubst_decl (tree t, tree args, tsubst_flags_t complain)
 
 	    if (!spec)
 	      {
+		int args_depth = TMPL_ARGS_DEPTH (args);
+		int parms_depth = TMPL_ARGS_DEPTH (DECL_TI_ARGS (t));
 		tmpl = DECL_TI_TEMPLATE (t);
 		gen_tmpl = most_general_template (tmpl);
-		argvec = tsubst (DECL_TI_ARGS (t), args, complain, in_decl);
-		if (argvec != error_mark_node)
-		  argvec = (coerce_innermost_template_parms
-			    (DECL_TEMPLATE_PARMS (gen_tmpl),
-			     argvec, t, complain,
-			     /*all*/true, /*defarg*/true));
-		if (argvec == error_mark_node)
-		  RETURN (error_mark_node);
+		if (args_depth == parms_depth
+		    && DECL_CLASS_SCOPE_P (tmpl)
+		    && !DECL_MEMBER_TEMPLATE_P (tmpl))
+		  /* The DECL_TI_ARGS in this case are the generic template
+		     arguments for the class template, so substitution is
+		     just the identity mapping.  */
+		  argvec = args;
+		else
+		  {
+		    argvec = tsubst (DECL_TI_ARGS (t), args, complain, in_decl);
+		    if (args_depth >= parms_depth && argvec != error_mark_node)
+		      argvec = (coerce_innermost_template_parms
+				(DECL_TEMPLATE_PARMS (gen_tmpl),
+				 argvec, t, complain,
+				 /*all*/true, /*defarg*/true));
+		    if (argvec == error_mark_node)
+		      RETURN (error_mark_node);
+		  }
 		hash = hash_tmpl_and_args (gen_tmpl, argvec);
 		spec = retrieve_specialization (gen_tmpl, argvec, hash);
 	      }
  
Jason Merrill June 10, 2022, 4:38 p.m. UTC | #5
On 6/10/22 12:00, Patrick Palka wrote:
> On Fri, 10 Jun 2022, Patrick Palka wrote:
> 
>> On Thu, 9 Jun 2022, Patrick Palka wrote:
>>
>>> On Thu, 9 Jun 2022, Jason Merrill wrote:
>>>
>>>> On 6/8/22 14:21, Patrick Palka wrote:
>>>>> When substituting a class template specialization, tsubst_aggr_type
>>>>> substitutes the TYPE_CONTEXT before passing it to lookup_template_class.
>>>>> This appears to be unnecessary, however, because the the initial value
>>>>> of lookup_template_class's context parameter is unused outside of the
>>>>> IDENTIFIER_NODE case, and l_t_c performs its own substitution of the
>>>>> context, anyway.  So this patch removes the redundant substitution in
>>>>> tsubst_aggr_type.  Doing so causes us to ICE on template/nested5.C
>>>>> because during lookup_template_class for A<T>::C::D<S> with T=E and S=S,
>>>>> we substitute and complete the context A<T>::C with T=E, which in turn
>>>>> registers the desired dependent specialization of D for us and we end up
>>>>> trying to register it again.  This patch fixes this by checking the
>>>>> specializations table again after completion of the context.
>>>>>
>>>>> This patch also implements a couple of other optimizations:
>>>>>
>>>>>     * In lookup_template_class, if the context of the partially
>>>>>       instantiated template is already non-dependent, then we could
>>>>>       reuse that instead of substituting the context of the most
>>>>>       general template.
>>>>>     * When substituting the TYPE_DECL for an injected-class-name
>>>>>       in tsubst_decl, we can avoid substituting its TREE_TYPE and
>>>>>       DECL_TI_ARGS.
>>>>>
>>>>> Together these optimizations improve memory usage for the range-v3
>>>>> testcase test/view/split.cc by about 5%.  The improvement is probably
>>>>> more significant when dealing with deeply nested class templates.
>>>>>
>>>>> Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
>>>>> trunk?
>>>>>
>>>>> gcc/cp/ChangeLog:
>>>>>
>>>>> 	* pt.cc (lookup_template_class): Remove dead stores to
>>>>> 	context parameter.  Don't substitute the context of the
>>>>> 	most general template if that of the partially instantiated
>>>>> 	template is non-dependent.  Check the specializations table
>>>>> 	again after completing the context of a nested dependent
>>>>> 	specialization.
>>>>> 	(tsubst_aggr_type) <case RECORD_TYPE>: Don't substitute
>>>>> 	TYPE_CONTEXT or pass it to lookup_template_class.
>>>>> 	(tsubst_decl) <case TYPE_DECL>: Avoid substituting the
>>>>> 	TREE_TYPE and DECL_TI_ARGS when DECL_SELF_REFERENCE_P.
>>>>> ---
>>>>>    gcc/cp/pt.cc | 69 +++++++++++++++++++++++++++++++---------------------
>>>>>    1 file changed, 41 insertions(+), 28 deletions(-)
>>>>>
>>>>> diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
>>>>> index 59b94317e88..28023d60684 100644
>>>>> --- a/gcc/cp/pt.cc
>>>>> +++ b/gcc/cp/pt.cc
>>>>> @@ -9840,8 +9840,6 @@ lookup_template_class (tree d1, tree arglist, tree
>>>>> in_decl, tree context,
>>>>>    	  if (context)
>>>>>    	    pop_decl_namespace ();
>>>>>    	}
>>>>> -      if (templ)
>>>>> -	context = DECL_CONTEXT (templ);
>>>>>        }
>>>>>      else if (TREE_CODE (d1) == TYPE_DECL && MAYBE_CLASS_TYPE_P (TREE_TYPE
>>>>> (d1)))
>>>>>        {
>>>>> @@ -9868,7 +9866,6 @@ lookup_template_class (tree d1, tree arglist, tree
>>>>> in_decl, tree context,
>>>>>        {
>>>>>          templ = d1;
>>>>>          d1 = DECL_NAME (templ);
>>>>> -      context = DECL_CONTEXT (templ);
>>>>>        }
>>>>>      else if (DECL_TEMPLATE_TEMPLATE_PARM_P (d1))
>>>>>        {
>>>>> @@ -10059,8 +10056,25 @@ lookup_template_class (tree d1, tree arglist, tree
>>>>> in_decl, tree context,
>>>>>          context = DECL_CONTEXT (gen_tmpl);
>>>>>          if (context && TYPE_P (context))
>>>>>    	{
>>>>> -	  context = tsubst_aggr_type (context, arglist, complain, in_decl,
>>>>> true);
>>>>> -	  context = complete_type (context);
>>>>> +	  if (!uses_template_parms (DECL_CONTEXT (templ)))
>>>>> +	    /* If the context of the partially instantiated template is
>>>>> +	       already non-dependent, then we might as well use it.  */
>>>>> +	    context = DECL_CONTEXT (templ);
>>>>> +	  else
>>>>> +	    {
>>>>> +	      context = tsubst_aggr_type (context, arglist, complain, in_decl,
>>>>> true);
>>>>> +	      context = complete_type (context);
>>>>> +	      if (is_dependent_type && arg_depth > 1)
>>>>> +		{
>>>>> +		  /* If this is a dependent nested specialization such as
>>>>> +		     A<int>::B<T>, then completion of A<int> might have
>>>>> +		     registered this specialization of B for us, so check
>>>>> +		     the table again (33959).  */
>>>>> +		  entry = type_specializations->find_with_hash (&elt, hash);
>>>>> +		  if (entry)
>>>>> +		    return entry->spec;
>>>>> +		}
>>>>> +	    }
>>>>>    	}
>>>>>          else
>>>>>    	context = tsubst (context, arglist, complain, in_decl);
>>>>> @@ -13711,25 +13725,12 @@ tsubst_aggr_type (tree t,
>>>>>          if (TYPE_TEMPLATE_INFO (t) && uses_template_parms (t))
>>>>>    	{
>>>>>    	  tree argvec;
>>>>> -	  tree context;
>>>>>    	  tree r;
>>>>>      	  /* In "sizeof(X<I>)" we need to evaluate "I".  */
>>>>>    	  cp_evaluated ev;
>>>>>    -	  /* First, determine the context for the type we are looking
>>>>> -	     up.  */
>>>>> -	  context = TYPE_CONTEXT (t);
>>>>> -	  if (context && TYPE_P (context))
>>>>> -	    {
>>>>> -	      context = tsubst_aggr_type (context, args, complain,
>>>>> -					  in_decl, /*entering_scope=*/1);
>>>>> -	      /* If context is a nested class inside a class template,
>>>>> -	         it may still need to be instantiated (c++/33959).  */
>>>>> -	      context = complete_type (context);
>>>>> -	    }
>>>>> -
>>>>> -	  /* Then, figure out what arguments are appropriate for the
>>>>> +	  /* Figure out what arguments are appropriate for the
>>>>>    	     type we are trying to find.  For example, given:
>>>>>      	       template <class T> struct S;
>>>>> @@ -13744,7 +13745,7 @@ tsubst_aggr_type (tree t,
>>>>>    	    r = error_mark_node;
>>>>>    	  else
>>>>>    	    {
>>>>> -	      r = lookup_template_class (t, argvec, in_decl, context,
>>>>> +	      r = lookup_template_class (t, argvec, in_decl, NULL_TREE,
>>>>>    					 entering_scope, complain);
>>>>>    	      r = cp_build_qualified_type (r, cp_type_quals (t), complain);
>>>>>    	    }
>>>>> @@ -14880,6 +14881,10 @@ tsubst_decl (tree t, tree args, tsubst_flags_t
>>>>> complain)
>>>>>    		ctx = tsubst_aggr_type (ctx, args,
>>>>>    					complain,
>>>>>    					in_decl, /*entering_scope=*/1);
>>>>> +		if (DECL_SELF_REFERENCE_P (t))
>>>>> +		  /* The context and type of a injected-class-name are
>>>>> +		     the same, so we don't need to substitute both.  */
>>>>> +		  type = ctx;
>>>>>    		/* If CTX is unchanged, then T is in fact the
>>>>>    		   specialization we want.  That situation occurs when
>>>>>    		   referencing a static data member within in its own
>>>>> @@ -14900,14 +14905,22 @@ tsubst_decl (tree t, tree args, tsubst_flags_t
>>>>> complain)
>>>>>    	      {
>>>>>    		tmpl = DECL_TI_TEMPLATE (t);
>>>>>    		gen_tmpl = most_general_template (tmpl);
>>>>> -		argvec = tsubst (DECL_TI_ARGS (t), args, complain, in_decl);
>>>>> -		if (argvec != error_mark_node)
>>>>> -		  argvec = (coerce_innermost_template_parms
>>>>> -			    (DECL_TEMPLATE_PARMS (gen_tmpl),
>>>>> -			     argvec, t, complain,
>>>>> -			     /*all*/true, /*defarg*/true));
>>>>> -		if (argvec == error_mark_node)
>>>>> -		  RETURN (error_mark_node);
>>>>> +		if (DECL_SELF_REFERENCE_P (t))
>>>>> +		  /* The DECL_TI_ARGS for the injected-class-name are the
>>>>> +		     generic template arguments for the class template, so
>>>>> +		     substitution/coercion is just the identity mapping.  */
>>>>> +		  argvec = args;
>>>>
>>>> Would it make sense to extend this to any TEMPLATE_DECL for which
>>>> DECL_PRIMARY_TEMPLATE is the class template?  So, anything that gets here
>>>> except an alias template.
>>>
>>> Ah nice, it does look like we could extend this to any TEMPLATE_DECL that
>>> satisfies DECL_CLASS_SCOPE_P && !DECL_MEMBER_TEMPLATE_P (so that we also
>>> exclude static data member templates), i.e. any templated non-template
>>> member.  Is that equivalent to what you had in mind?
>>>
>>> Based on some light testing, if we do this, it seems we also need to
>>> handle 'args' having greater depth than 'DECL_TI_ARGS' here, something
>>> which could happen during satisfaction.
>>
>> ... or we could just restrict the optimization to when the argument
>> depths match, like so:
>>
>> -- >8 --
>>
>> Subject: [PATCH] c++: optimize specialization of nested templated classes
>>
>> gcc/cp/ChangeLog:
>>
>> 	* pt.cc (lookup_template_class): Remove dead stores to
>> 	context parameter.  Don't substitute the context of the
>> 	most general template if that of the partially instantiated
>> 	template is non-dependent.  Check the specializations table
>> 	again after completing the context of a nested dependent
>> 	specialization.
>> 	(tsubst_aggr_type) <case RECORD_TYPE>: Don't substitute
>> 	TYPE_CONTEXT or pass it to lookup_template_class.
>> 	(tsubst_decl) <case TYPE_DECL>: Avoid substituting the
>> 	TREE_TYPE and DECL_TI_ARGS when DECL_SELF_REFERENCE_P.
>> ---
>>   gcc/cp/pt.cc | 73 ++++++++++++++++++++++++++++++++--------------------
>>   1 file changed, 45 insertions(+), 28 deletions(-)
>>
>> diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
>> index 3154186ac20..ebd822373db 100644
>> --- a/gcc/cp/pt.cc
>> +++ b/gcc/cp/pt.cc
>> @@ -9840,8 +9840,6 @@ lookup_template_class (tree d1, tree arglist, tree in_decl, tree context,
>>   	  if (context)
>>   	    pop_decl_namespace ();
>>   	}
>> -      if (templ)
>> -	context = DECL_CONTEXT (templ);
>>       }
>>     else if (TREE_CODE (d1) == TYPE_DECL && MAYBE_CLASS_TYPE_P (TREE_TYPE (d1)))
>>       {
>> @@ -9868,7 +9866,6 @@ lookup_template_class (tree d1, tree arglist, tree in_decl, tree context,
>>       {
>>         templ = d1;
>>         d1 = DECL_NAME (templ);
>> -      context = DECL_CONTEXT (templ);
>>       }
>>     else if (DECL_TEMPLATE_TEMPLATE_PARM_P (d1))
>>       {
>> @@ -10059,8 +10056,26 @@ lookup_template_class (tree d1, tree arglist, tree in_decl, tree context,
>>         context = DECL_CONTEXT (gen_tmpl);
>>         if (context && TYPE_P (context))
>>   	{
>> -	  context = tsubst_aggr_type (context, arglist, complain, in_decl, true);
>> -	  context = complete_type (context);
>> +	  if (!uses_template_parms (DECL_CONTEXT (templ)))
>> +	    /* If the context of the partially instantiated template is
>> +	       already non-dependent, then we might as well use it.  */
>> +	    context = DECL_CONTEXT (templ);
>> +	  else
>> +	    {
>> +	      context = tsubst_aggr_type (context, arglist,
>> +					  complain, in_decl, true);
>> +	      context = complete_type (context);
>> +	      if (is_dependent_type && arg_depth > 1)
>> +		{
>> +		  /* If this is a dependent nested specialization such as
>> +		     A<int>::B<T>, then completion of A<int> might have
>> +		     registered this specialization of B for us, so check
>> +		     the table again (33959).  */
>> +		  entry = type_specializations->find_with_hash (&elt, hash);
>> +		  if (entry)
>> +		    return entry->spec;
>> +		}
>> +	    }
>>   	}
>>         else
>>   	context = tsubst (context, arglist, complain, in_decl);
>> @@ -13739,25 +13754,12 @@ tsubst_aggr_type (tree t,
>>         if (TYPE_TEMPLATE_INFO (t) && uses_template_parms (t))
>>   	{
>>   	  tree argvec;
>> -	  tree context;
>>   	  tree r;
>>   
>>   	  /* In "sizeof(X<I>)" we need to evaluate "I".  */
>>   	  cp_evaluated ev;
>>   
>> -	  /* First, determine the context for the type we are looking
>> -	     up.  */
>> -	  context = TYPE_CONTEXT (t);
>> -	  if (context && TYPE_P (context))
>> -	    {
>> -	      context = tsubst_aggr_type (context, args, complain,
>> -					  in_decl, /*entering_scope=*/1);
>> -	      /* If context is a nested class inside a class template,
>> -	         it may still need to be instantiated (c++/33959).  */
>> -	      context = complete_type (context);
>> -	    }
>> -
>> -	  /* Then, figure out what arguments are appropriate for the
>> +	  /* Figure out what arguments are appropriate for the
>>   	     type we are trying to find.  For example, given:
>>   
>>   	       template <class T> struct S;
>> @@ -13772,7 +13774,7 @@ tsubst_aggr_type (tree t,
>>   	    r = error_mark_node;
>>   	  else
>>   	    {
>> -	      r = lookup_template_class (t, argvec, in_decl, context,
>> +	      r = lookup_template_class (t, argvec, in_decl, NULL_TREE,
>>   					 entering_scope, complain);
>>   	      r = cp_build_qualified_type (r, cp_type_quals (t), complain);
>>   	    }
>> @@ -14913,6 +14915,10 @@ tsubst_decl (tree t, tree args, tsubst_flags_t complain)
>>   		ctx = tsubst_aggr_type (ctx, args,
>>   					complain,
>>   					in_decl, /*entering_scope=*/1);
>> +		if (DECL_SELF_REFERENCE_P (t))
>> +		  /* The context and type of an injected-class-name are
>> +		     the same, so we don't need to substitute both.  */
>> +		  type = ctx;
>>   		/* If CTX is unchanged, then T is in fact the
>>   		   specialization we want.  That situation occurs when
>>   		   referencing a static data member within in its own
>> @@ -14933,14 +14939,25 @@ tsubst_decl (tree t, tree args, tsubst_flags_t complain)
>>   	      {
>>   		tmpl = DECL_TI_TEMPLATE (t);
>>   		gen_tmpl = most_general_template (tmpl);
>> -		argvec = tsubst (DECL_TI_ARGS (t), args, complain, in_decl);
>> -		if (argvec != error_mark_node)
>> -		  argvec = (coerce_innermost_template_parms
>> -			    (DECL_TEMPLATE_PARMS (gen_tmpl),
>> -			     argvec, t, complain,
>> -			     /*all*/true, /*defarg*/true));
>> -		if (argvec == error_mark_node)
>> -		  RETURN (error_mark_node);
>> +		if (DECL_CLASS_SCOPE_P (tmpl)
>> +		    && !DECL_MEMBER_TEMPLATE_P (tmpl)
>> +		    && (TMPL_ARGS_DEPTH (args)
>> +			== TMPL_ARGS_DEPTH (DECL_TI_ARGS (t))))
>> +		  /* The DECL_TI_ARGS in this case are the generic template
>> +		     arguments for the class template, so substitution/coercion
>> +		     is just the identity mapping.  */
>> +		  argvec = args;
>> +		else
>> +		  {
>> +		    argvec = tsubst (DECL_TI_ARGS (t), args, complain, in_decl);
>> +		    if (argvec != error_mark_node)
>> +		      argvec = (coerce_innermost_template_parms
>> +				(DECL_TEMPLATE_PARMS (gen_tmpl),
>> +				 argvec, t, complain,
>> +				 /*all*/true, /*defarg*/true));
>> +		    if (argvec == error_mark_node)
>> +		      RETURN (error_mark_node);
>> +		  }
> 
> Seems we could be a bit cleverer here by trying to avoid the
> coercion even if we have to do the substitution.
> 
> When the argument depth is less than the parameter depth, then the
> innermost substituted arguments are just lowered arguments from
> DECL_TI_ARGS, for which coercion would be a no-op, so we can just
> skip it.  This optimization seems to trigger 90+% of the time and
> compile time/memory by about 1%/0.5% in my experiments.  (The call to
> coerce_innermost_template_parms was added by r6-829-gb237c4cbd3da7a
> FWIW.)
> 
> -- >8 --
> 
> 
> Subject: [PATCH] c++: optimize specialization of nested templated classes
> 
> gcc/cp/ChangeLog:
> 
> 	* pt.cc (lookup_template_class): Remove dead stores to
> 	context parameter.  Don't substitute the context of the
> 	most general template if that of the partially instantiated
> 	template is non-dependent.  Check the specializations table
> 	again after completing the context of a nested dependent
> 	specialization.
> 	(tsubst_aggr_type) <case RECORD_TYPE>: Don't substitute
> 	TYPE_CONTEXT or pass it to lookup_template_class.
> 	(tsubst_decl) <case TYPE_DECL>: Avoid substituting the
> 	TREE_TYPE and DECL_TI_ARGS when DECL_SELF_REFERENCE_P.
> ---
>   gcc/cp/pt.cc | 74 ++++++++++++++++++++++++++++++++--------------------
>   1 file changed, 46 insertions(+), 28 deletions(-)
> 
> diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
> index 3154186ac20..faa5bc377cb 100644
> --- a/gcc/cp/pt.cc
> +++ b/gcc/cp/pt.cc
> @@ -9840,8 +9840,6 @@ lookup_template_class (tree d1, tree arglist, tree in_decl, tree context,
>   	  if (context)
>   	    pop_decl_namespace ();
>   	}
> -      if (templ)
> -	context = DECL_CONTEXT (templ);
>       }
>     else if (TREE_CODE (d1) == TYPE_DECL && MAYBE_CLASS_TYPE_P (TREE_TYPE (d1)))
>       {
> @@ -9868,7 +9866,6 @@ lookup_template_class (tree d1, tree arglist, tree in_decl, tree context,
>       {
>         templ = d1;
>         d1 = DECL_NAME (templ);
> -      context = DECL_CONTEXT (templ);
>       }
>     else if (DECL_TEMPLATE_TEMPLATE_PARM_P (d1))
>       {
> @@ -10059,8 +10056,26 @@ lookup_template_class (tree d1, tree arglist, tree in_decl, tree context,
>         context = DECL_CONTEXT (gen_tmpl);
>         if (context && TYPE_P (context))
>   	{
> -	  context = tsubst_aggr_type (context, arglist, complain, in_decl, true);
> -	  context = complete_type (context);
> +	  if (!uses_template_parms (DECL_CONTEXT (templ)))
> +	    /* If the context of the partially instantiated template is
> +	       already non-dependent, then we might as well use it.  */
> +	    context = DECL_CONTEXT (templ);
> +	  else
> +	    {
> +	      context = tsubst_aggr_type (context, arglist,
> +					  complain, in_decl, true);
> +	      context = complete_type (context);
> +	      if (is_dependent_type && arg_depth > 1)
> +		{
> +		  /* If this is a dependent nested specialization such as
> +		     A<int>::B<T>, then completion of A<int> might have
> +		     registered this specialization of B for us, so check
> +		     the table again (33959).  */
> +		  entry = type_specializations->find_with_hash (&elt, hash);
> +		  if (entry)
> +		    return entry->spec;
> +		}
> +	    }
>   	}
>         else
>   	context = tsubst (context, arglist, complain, in_decl);
> @@ -13739,25 +13754,12 @@ tsubst_aggr_type (tree t,
>         if (TYPE_TEMPLATE_INFO (t) && uses_template_parms (t))
>   	{
>   	  tree argvec;
> -	  tree context;
>   	  tree r;
>   
>   	  /* In "sizeof(X<I>)" we need to evaluate "I".  */
>   	  cp_evaluated ev;
>   
> -	  /* First, determine the context for the type we are looking
> -	     up.  */
> -	  context = TYPE_CONTEXT (t);
> -	  if (context && TYPE_P (context))
> -	    {
> -	      context = tsubst_aggr_type (context, args, complain,
> -					  in_decl, /*entering_scope=*/1);
> -	      /* If context is a nested class inside a class template,
> -	         it may still need to be instantiated (c++/33959).  */
> -	      context = complete_type (context);
> -	    }
> -
> -	  /* Then, figure out what arguments are appropriate for the
> +	  /* Figure out what arguments are appropriate for the
>   	     type we are trying to find.  For example, given:
>   
>   	       template <class T> struct S;
> @@ -13772,7 +13774,7 @@ tsubst_aggr_type (tree t,
>   	    r = error_mark_node;
>   	  else
>   	    {
> -	      r = lookup_template_class (t, argvec, in_decl, context,
> +	      r = lookup_template_class (t, argvec, in_decl, NULL_TREE,
>   					 entering_scope, complain);
>   	      r = cp_build_qualified_type (r, cp_type_quals (t), complain);
>   	    }
> @@ -14913,6 +14915,10 @@ tsubst_decl (tree t, tree args, tsubst_flags_t complain)
>   		ctx = tsubst_aggr_type (ctx, args,
>   					complain,
>   					in_decl, /*entering_scope=*/1);
> +		if (DECL_SELF_REFERENCE_P (t))
> +		  /* The context and type of an injected-class-name are
> +		     the same, so we don't need to substitute both.  */
> +		  type = ctx;
>   		/* If CTX is unchanged, then T is in fact the
>   		   specialization we want.  That situation occurs when
>   		   referencing a static data member within in its own
> @@ -14931,16 +14937,28 @@ tsubst_decl (tree t, tree args, tsubst_flags_t complain)
>   
>   	    if (!spec)
>   	      {
> +		int args_depth = TMPL_ARGS_DEPTH (args);
> +		int parms_depth = TMPL_ARGS_DEPTH (DECL_TI_ARGS (t));
>   		tmpl = DECL_TI_TEMPLATE (t);
>   		gen_tmpl = most_general_template (tmpl);
> -		argvec = tsubst (DECL_TI_ARGS (t), args, complain, in_decl);
> -		if (argvec != error_mark_node)
> -		  argvec = (coerce_innermost_template_parms
> -			    (DECL_TEMPLATE_PARMS (gen_tmpl),
> -			     argvec, t, complain,
> -			     /*all*/true, /*defarg*/true));
> -		if (argvec == error_mark_node)
> -		  RETURN (error_mark_node);
> +		if (args_depth == parms_depth
> +		    && DECL_CLASS_SCOPE_P (tmpl)
> +		    && !DECL_MEMBER_TEMPLATE_P (tmpl))

I think these two lines could also be !PRIMARY_TEMPLATE_P (gen_tmpl). 
OK either way.

> +		  /* The DECL_TI_ARGS in this case are the generic template
> +		     arguments for the class template, so substitution is
> +		     just the identity mapping.  */
> +		  argvec = args;
> +		else
> +		  {
> +		    argvec = tsubst (DECL_TI_ARGS (t), args, complain, in_decl);
> +		    if (args_depth >= parms_depth && argvec != error_mark_node)

Please add your note about the args < parms case as a comment.

> +		      argvec = (coerce_innermost_template_parms
> +				(DECL_TEMPLATE_PARMS (gen_tmpl),
> +				 argvec, t, complain,
> +				 /*all*/true, /*defarg*/true));
> +		    if (argvec == error_mark_node)
> +		      RETURN (error_mark_node);
> +		  }
>   		hash = hash_tmpl_and_args (gen_tmpl, argvec);
>   		spec = retrieve_specialization (gen_tmpl, argvec, hash);
>   	      }
  

Patch

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 59b94317e88..28023d60684 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -9840,8 +9840,6 @@  lookup_template_class (tree d1, tree arglist, tree in_decl, tree context,
 	  if (context)
 	    pop_decl_namespace ();
 	}
-      if (templ)
-	context = DECL_CONTEXT (templ);
     }
   else if (TREE_CODE (d1) == TYPE_DECL && MAYBE_CLASS_TYPE_P (TREE_TYPE (d1)))
     {
@@ -9868,7 +9866,6 @@  lookup_template_class (tree d1, tree arglist, tree in_decl, tree context,
     {
       templ = d1;
       d1 = DECL_NAME (templ);
-      context = DECL_CONTEXT (templ);
     }
   else if (DECL_TEMPLATE_TEMPLATE_PARM_P (d1))
     {
@@ -10059,8 +10056,25 @@  lookup_template_class (tree d1, tree arglist, tree in_decl, tree context,
       context = DECL_CONTEXT (gen_tmpl);
       if (context && TYPE_P (context))
 	{
-	  context = tsubst_aggr_type (context, arglist, complain, in_decl, true);
-	  context = complete_type (context);
+	  if (!uses_template_parms (DECL_CONTEXT (templ)))
+	    /* If the context of the partially instantiated template is
+	       already non-dependent, then we might as well use it.  */
+	    context = DECL_CONTEXT (templ);
+	  else
+	    {
+	      context = tsubst_aggr_type (context, arglist, complain, in_decl, true);
+	      context = complete_type (context);
+	      if (is_dependent_type && arg_depth > 1)
+		{
+		  /* If this is a dependent nested specialization such as
+		     A<int>::B<T>, then completion of A<int> might have
+		     registered this specialization of B for us, so check
+		     the table again (33959).  */
+		  entry = type_specializations->find_with_hash (&elt, hash);
+		  if (entry)
+		    return entry->spec;
+		}
+	    }
 	}
       else
 	context = tsubst (context, arglist, complain, in_decl);
@@ -13711,25 +13725,12 @@  tsubst_aggr_type (tree t,
       if (TYPE_TEMPLATE_INFO (t) && uses_template_parms (t))
 	{
 	  tree argvec;
-	  tree context;
 	  tree r;
 
 	  /* In "sizeof(X<I>)" we need to evaluate "I".  */
 	  cp_evaluated ev;
 
-	  /* First, determine the context for the type we are looking
-	     up.  */
-	  context = TYPE_CONTEXT (t);
-	  if (context && TYPE_P (context))
-	    {
-	      context = tsubst_aggr_type (context, args, complain,
-					  in_decl, /*entering_scope=*/1);
-	      /* If context is a nested class inside a class template,
-	         it may still need to be instantiated (c++/33959).  */
-	      context = complete_type (context);
-	    }
-
-	  /* Then, figure out what arguments are appropriate for the
+	  /* Figure out what arguments are appropriate for the
 	     type we are trying to find.  For example, given:
 
 	       template <class T> struct S;
@@ -13744,7 +13745,7 @@  tsubst_aggr_type (tree t,
 	    r = error_mark_node;
 	  else
 	    {
-	      r = lookup_template_class (t, argvec, in_decl, context,
+	      r = lookup_template_class (t, argvec, in_decl, NULL_TREE,
 					 entering_scope, complain);
 	      r = cp_build_qualified_type (r, cp_type_quals (t), complain);
 	    }
@@ -14880,6 +14881,10 @@  tsubst_decl (tree t, tree args, tsubst_flags_t complain)
 		ctx = tsubst_aggr_type (ctx, args,
 					complain,
 					in_decl, /*entering_scope=*/1);
+		if (DECL_SELF_REFERENCE_P (t))
+		  /* The context and type of a injected-class-name are
+		     the same, so we don't need to substitute both.  */
+		  type = ctx;
 		/* If CTX is unchanged, then T is in fact the
 		   specialization we want.  That situation occurs when
 		   referencing a static data member within in its own
@@ -14900,14 +14905,22 @@  tsubst_decl (tree t, tree args, tsubst_flags_t complain)
 	      {
 		tmpl = DECL_TI_TEMPLATE (t);
 		gen_tmpl = most_general_template (tmpl);
-		argvec = tsubst (DECL_TI_ARGS (t), args, complain, in_decl);
-		if (argvec != error_mark_node)
-		  argvec = (coerce_innermost_template_parms
-			    (DECL_TEMPLATE_PARMS (gen_tmpl),
-			     argvec, t, complain,
-			     /*all*/true, /*defarg*/true));
-		if (argvec == error_mark_node)
-		  RETURN (error_mark_node);
+		if (DECL_SELF_REFERENCE_P (t))
+		  /* The DECL_TI_ARGS for the injected-class-name are the
+		     generic template arguments for the class template, so
+		     substitution/coercion is just the identity mapping.  */
+		  argvec = args;
+		else
+		  {
+		    argvec = tsubst (DECL_TI_ARGS (t), args, complain, in_decl);
+		    if (argvec != error_mark_node)
+		      argvec = (coerce_innermost_template_parms
+				(DECL_TEMPLATE_PARMS (gen_tmpl),
+				 argvec, t, complain,
+				 /*all*/true, /*defarg*/true));
+		    if (argvec == error_mark_node)
+		      RETURN (error_mark_node);
+		  }
 		hash = hash_tmpl_and_args (gen_tmpl, argvec);
 		spec = retrieve_specialization (gen_tmpl, argvec, hash);
 	      }