[Don't,call,clean_symbol_name,in,create_tmp_var_name,[PR116219]

Message ID ZrMsotHsL//YJsD+@tucnak
State New
Headers
Series [Don't,call,clean_symbol_name,in,create_tmp_var_name,[PR116219] |

Checks

Context Check Description
linaro-tcwg-bot/tcwg_gcc_build--master-arm success Build passed
linaro-tcwg-bot/tcwg_gcc_check--master-arm success Test passed
linaro-tcwg-bot/tcwg_gcc_build--master-aarch64 success Build passed
linaro-tcwg-bot/tcwg_gcc_check--master-aarch64 success Test passed

Commit Message

Jakub Jelinek Aug. 7, 2024, 8:13 a.m. UTC
  Hi!

SRA adds fancy names like offset$D94316$_M_impl$D93629$_M_start
where the numbers in there are DECL_UIDs if there are unnamed
FIELD_DECLs etc.
Because -g0 vs. -g can cause differences between the exact DECL_UID
values (add bigger gaps in between them, corresponding decls should
still be ordered the same based on DECL_UID) we make sure such
decls have DECL_NAMELESS set and depending on exact options either don't
dump such names at all or dump_fancy_name sanitizes the D123456$ parts in
there to Dxxxx$.
Unfortunately in tons of places we then use get_name to grab either user
names or these SRA created names and use that as argument to
create_tmp_var{,_name,_raw} to base other artificial temporary names based
on that.  Those are DECL_NAMELESS too, but unfortunately create_tmp_var_name
starting with
https://gcc.gnu.org/git/?p=gcc.git&a=commit;h=725494f6e4121eace43b7db1202f8ecbf52a8276
calls clean_symbol_name which replaces the $s in there with _s and thus
dump_fancy_name doesn't sanitize it anymore.

I don't see any discussion of that commit (originally to TM branch, later
merged) on the mailing list, but from
   DECL_NAME (new_decl)
     = create_tmp_var_name (IDENTIFIER_POINTER (DECL_NAME (old_decl)));
-  SET_DECL_ASSEMBLER_NAME (new_decl, NULL_TREE);
+  SET_DECL_ASSEMBLER_NAME (new_decl, DECL_NAME (new_decl));
snippet elsewhere in that commit it seems create_tmp_var_name was used at
that point also to determine function names of clones, so presumably the
clean_symbol_name at that point was to ensure the symbol could be emitted
into assembly, maybe in case DECL_NAME is something like C++ operators or
whatever could have there undesirable characters.

Anyway, we don't do that for years anymore, already GCC 4.5 uses for such
purposes clone_function_name which starts of DECL_ASSEMBLER_NAME of the old
function and appends based on supportable symbol suffix separators the
separator and some suffix and/or number, so that part doesn't go through
create_tmp_var_name.

I don't see problems with having the $ and . etc. characters in the names
intended just to make dumps more readable, after all, we already are using
those in the SRA created names.  Those names shouldn't make it into the
assembly in any way, neither debug info nor assembly labels.

There is one theoretical case, where the gimplifier promotes automatic
vars into TREE_STATIC ones and therefore those can then appear in assembly,
just in case it would be on e.g. SRA created names and regimplified later
I've added code to ignore the names and force C.NNN if it is a DECL_NAMELESS
with problematic characters in the name.

Richi mentioned on IRC that the non-cleaned up names might make things
harder to feed stuff back to the GIMPLE FE, but if so, I think it should be
the dumping for GIMPLE FE purposes that cleans those up (but at that point
it should also verify if some such cleaned up names don't collide with
others and somehow deal with those).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

The -fcompare-debug failure on the testcase is gone, but the testcase
was huge and hard to reduce.

2024-08-06  Jakub Jelinek  <jakub@redhat.com>

	PR c++/116219
	* gimple-expr.cc (remove_suffix): Formatting fixes.
	(create_tmp_var_name): Don't call clean_symbol_name.
	* gimplify.cc (gimplify_init_constructor): When promoting automatic
	DECL_NAMELESS vars to static, only preserve their DECL_NAME if
	it doesn't contain any characters clean_symbol_name replaces.


	Jakub
  

Comments

Richard Biener Aug. 7, 2024, 9:03 a.m. UTC | #1
On Wed, 7 Aug 2024, Jakub Jelinek wrote:

> Hi!
> 
> SRA adds fancy names like offset$D94316$_M_impl$D93629$_M_start
> where the numbers in there are DECL_UIDs if there are unnamed
> FIELD_DECLs etc.
> Because -g0 vs. -g can cause differences between the exact DECL_UID
> values (add bigger gaps in between them, corresponding decls should
> still be ordered the same based on DECL_UID) we make sure such
> decls have DECL_NAMELESS set and depending on exact options either don't
> dump such names at all or dump_fancy_name sanitizes the D123456$ parts in
> there to Dxxxx$.
> Unfortunately in tons of places we then use get_name to grab either user
> names or these SRA created names and use that as argument to
> create_tmp_var{,_name,_raw} to base other artificial temporary names based
> on that.  Those are DECL_NAMELESS too, but unfortunately create_tmp_var_name
> starting with
> https://gcc.gnu.org/git/?p=gcc.git&a=commit;h=725494f6e4121eace43b7db1202f8ecbf52a8276
> calls clean_symbol_name which replaces the $s in there with _s and thus
> dump_fancy_name doesn't sanitize it anymore.
> 
> I don't see any discussion of that commit (originally to TM branch, later
> merged) on the mailing list, but from
>    DECL_NAME (new_decl)
>      = create_tmp_var_name (IDENTIFIER_POINTER (DECL_NAME (old_decl)));
> -  SET_DECL_ASSEMBLER_NAME (new_decl, NULL_TREE);
> +  SET_DECL_ASSEMBLER_NAME (new_decl, DECL_NAME (new_decl));
> snippet elsewhere in that commit it seems create_tmp_var_name was used at
> that point also to determine function names of clones, so presumably the
> clean_symbol_name at that point was to ensure the symbol could be emitted
> into assembly, maybe in case DECL_NAME is something like C++ operators or
> whatever could have there undesirable characters.
> 
> Anyway, we don't do that for years anymore, already GCC 4.5 uses for such
> purposes clone_function_name which starts of DECL_ASSEMBLER_NAME of the old
> function and appends based on supportable symbol suffix separators the
> separator and some suffix and/or number, so that part doesn't go through
> create_tmp_var_name.
> 
> I don't see problems with having the $ and . etc. characters in the names
> intended just to make dumps more readable, after all, we already are using
> those in the SRA created names.  Those names shouldn't make it into the
> assembly in any way, neither debug info nor assembly labels.
> 
> There is one theoretical case, where the gimplifier promotes automatic
> vars into TREE_STATIC ones and therefore those can then appear in assembly,
> just in case it would be on e.g. SRA created names and regimplified later
> I've added code to ignore the names and force C.NNN if it is a DECL_NAMELESS
> with problematic characters in the name.
> 
> Richi mentioned on IRC that the non-cleaned up names might make things
> harder to feed stuff back to the GIMPLE FE, but if so, I think it should be
> the dumping for GIMPLE FE purposes that cleans those up (but at that point
> it should also verify if some such cleaned up names don't collide with
> others and somehow deal with those).

My plan was to accept an additional character in identifiers as
extension with -fgimple and use that.  I think we already accept dollars
but dots are of course problematic and cannot be used.  Replacing
unwanted chars with $ and pre-existing $ with $$ might work.  There's
not many (ASCII) characters available, @ might be.

> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> The -fcompare-debug failure on the testcase is gone, but the testcase
> was huge and hard to reduce.
> 
> 2024-08-06  Jakub Jelinek  <jakub@redhat.com>
> 
> 	PR c++/116219
> 	* gimple-expr.cc (remove_suffix): Formatting fixes.
> 	(create_tmp_var_name): Don't call clean_symbol_name.
> 	* gimplify.cc (gimplify_init_constructor): When promoting automatic
> 	DECL_NAMELESS vars to static, only preserve their DECL_NAME if
> 	it doesn't contain any characters clean_symbol_name replaces.
> 
> --- gcc/gimple-expr.cc.jj	2024-01-03 11:51:28.280776310 +0100
> +++ gcc/gimple-expr.cc	2024-08-06 14:43:42.328673383 +0200
> @@ -406,14 +406,12 @@ remove_suffix (char *name, int len)
>  {
>    int i;
>  
> -  for (i = 2;  i < 7 && len > i;  i++)
> -    {
> -      if (name[len - i] == '.')
> -	{
> -	  name[len - i] = '\0';
> -	  break;
> -	}
> -    }
> +  for (i = 2; i < 7 && len > i; i++)
> +    if (name[len - i] == '.')
> +      {
> +	name[len - i] = '\0';
> +	break;
> +      }
>  }
>  
>  /* Create a new temporary name with PREFIX.  Return an identifier.  */
> @@ -430,8 +428,6 @@ create_tmp_var_name (const char *prefix)
>        char *preftmp = ASTRDUP (prefix);
>  
>        remove_suffix (preftmp, strlen (preftmp));
> -      clean_symbol_name (preftmp);
> -
>        prefix = preftmp;
>      }
>  
> --- gcc/gimplify.cc.jj	2024-08-05 13:04:53.903116091 +0200
> +++ gcc/gimplify.cc	2024-08-06 15:27:40.404865291 +0200
> @@ -5599,6 +5599,18 @@ gimplify_init_constructor (tree *expr_p,
>  
>  	    DECL_INITIAL (object) = ctor;
>  	    TREE_STATIC (object) = 1;
> +	    if (DECL_NAME (object) && DECL_NAMELESS (object))
> +	      {
> +		const char *name = get_name (object);
> +		char *sname = ASTRDUP (name);
> +		clean_symbol_name (sname);
> +		/* If there are any undesirable characters in DECL_NAMELESS
> +		   name, just fall back to C.nnn name, we must ensure e.g.
> +		   SRA created names with DECL_UIDs don't make it into
> +		   assembly.  */
> +		if (strcmp (name, sname))
> +		  DECL_NAME (object) = NULL_TREE;
> +	      }

Did you actually run into such a case?  I'd expect
gimplify_init_constructor only happening on the original GENERIC IL.

In any case how about the simpler

        if (!DECL_NAME (object) || DECL_NAMELESS (object))
          DECL_NAME (object) = create_tmp_var_name ("C");

?

Otherwise looks OK to me.  I guess this should get quite some
soaking time before backporting (if that was intended).

Thanks,
Richard.

>  	    if (!DECL_NAME (object))
>  	      DECL_NAME (object) = create_tmp_var_name ("C");
>  	    walk_tree (&DECL_INITIAL (object), force_labels_r, NULL, NULL);
> 
> 	Jakub
>
  
Jakub Jelinek Aug. 7, 2024, 9:10 a.m. UTC | #2
On Wed, Aug 07, 2024 at 11:03:18AM +0200, Richard Biener wrote:
> > Richi mentioned on IRC that the non-cleaned up names might make things
> > harder to feed stuff back to the GIMPLE FE, but if so, I think it should be
> > the dumping for GIMPLE FE purposes that cleans those up (but at that point
> > it should also verify if some such cleaned up names don't collide with
> > others and somehow deal with those).
> 
> My plan was to accept an additional character in identifiers as
> extension with -fgimple and use that.  I think we already accept dollars
> but dots are of course problematic and cannot be used.  Replacing
> unwanted chars with $ and pre-existing $ with $$ might work.  There's
> not many (ASCII) characters available, @ might be.

Some kind of mangling... ;)

> > --- gcc/gimplify.cc.jj	2024-08-05 13:04:53.903116091 +0200
> > +++ gcc/gimplify.cc	2024-08-06 15:27:40.404865291 +0200
> > @@ -5599,6 +5599,18 @@ gimplify_init_constructor (tree *expr_p,
> >  
> >  	    DECL_INITIAL (object) = ctor;
> >  	    TREE_STATIC (object) = 1;
> > +	    if (DECL_NAME (object) && DECL_NAMELESS (object))
> > +	      {
> > +		const char *name = get_name (object);
> > +		char *sname = ASTRDUP (name);
> > +		clean_symbol_name (sname);
> > +		/* If there are any undesirable characters in DECL_NAMELESS
> > +		   name, just fall back to C.nnn name, we must ensure e.g.
> > +		   SRA created names with DECL_UIDs don't make it into
> > +		   assembly.  */
> > +		if (strcmp (name, sname))
> > +		  DECL_NAME (object) = NULL_TREE;
> > +	      }
> 
> Did you actually run into such a case?

No, but I haven't collected statistics about that (so didn't add
some fprintf there to log if it happened).

> I'd expect
> gimplify_init_constructor only happening on the original GENERIC IL.

In theory one can construct INIT_EXPR and gimplify it in later passes as
well.  But it is unlikely, sure.

> In any case how about the simpler
> 
>         if (!DECL_NAME (object) || DECL_NAMELESS (object))
>           DECL_NAME (object) = create_tmp_var_name ("C");
> 
> ?

This is what I had in my first version, but I was worried too many things
would be DECL_NAMELESS during gimplification.  I think I'll do another
bootstrap/regtest with statistics gathering to see what is worth.

> Otherwise looks OK to me.  I guess this should get quite some
> soaking time before backporting (if that was intended).

Definitely.  At least two months I'd say.

	Jakub
  
Jakub Jelinek Aug. 7, 2024, 6:19 p.m. UTC | #3
On Wed, Aug 07, 2024 at 11:10:37AM +0200, Jakub Jelinek wrote:
> > In any case how about the simpler
> > 
> >         if (!DECL_NAME (object) || DECL_NAMELESS (object))
> >           DECL_NAME (object) = create_tmp_var_name ("C");
> > 
> > ?
> 
> This is what I had in my first version, but I was worried too many things
> would be DECL_NAMELESS during gimplification.  I think I'll do another
> bootstrap/regtest with statistics gathering to see what is worth.

As {x86_64,i686,powerpc64le}-linux bootstraps/regtests never triggered
that, I've used the above in what I've committed:

2024-08-07  Jakub Jelinek  <jakub@redhat.com>

	PR c++/116219
	* gimple-expr.cc (remove_suffix): Formatting fixes.
	(create_tmp_var_name): Don't call clean_symbol_name.
	* gimplify.cc (gimplify_init_constructor): When promoting automatic
	DECL_NAMELESS vars to static, don't preserve their DECL_NAME.

--- gcc/gimple-expr.cc.jj	2024-01-03 11:51:28.280776310 +0100
+++ gcc/gimple-expr.cc	2024-08-06 14:43:42.328673383 +0200
@@ -406,14 +406,12 @@ remove_suffix (char *name, int len)
 {
   int i;
 
-  for (i = 2;  i < 7 && len > i;  i++)
-    {
-      if (name[len - i] == '.')
-	{
-	  name[len - i] = '\0';
-	  break;
-	}
-    }
+  for (i = 2; i < 7 && len > i; i++)
+    if (name[len - i] == '.')
+      {
+	name[len - i] = '\0';
+	break;
+      }
 }
 
 /* Create a new temporary name with PREFIX.  Return an identifier.  */
@@ -430,8 +428,6 @@ create_tmp_var_name (const char *prefix)
       char *preftmp = ASTRDUP (prefix);
 
       remove_suffix (preftmp, strlen (preftmp));
-      clean_symbol_name (preftmp);
-
       prefix = preftmp;
     }
 
--- gcc/gimplify.cc.jj	2024-08-05 13:04:53.903116091 +0200
+++ gcc/gimplify.cc	2024-08-07 20:13:05.443907561 +0200
@@ -5599,7 +5599,7 @@ gimplify_init_constructor (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p,
 
 	    DECL_INITIAL (object) = ctor;
 	    TREE_STATIC (object) = 1;
-	    if (!DECL_NAME (object))
+	    if (!DECL_NAME (object) || DECL_NAMELESS (object))
 	      DECL_NAME (object) = create_tmp_var_name ("C");
 	    walk_tree (&DECL_INITIAL (object), force_labels_r, NULL, NULL);
 


	Jakub
  

Patch

--- gcc/gimple-expr.cc.jj	2024-01-03 11:51:28.280776310 +0100
+++ gcc/gimple-expr.cc	2024-08-06 14:43:42.328673383 +0200
@@ -406,14 +406,12 @@  remove_suffix (char *name, int len)
 {
   int i;
 
-  for (i = 2;  i < 7 && len > i;  i++)
-    {
-      if (name[len - i] == '.')
-	{
-	  name[len - i] = '\0';
-	  break;
-	}
-    }
+  for (i = 2; i < 7 && len > i; i++)
+    if (name[len - i] == '.')
+      {
+	name[len - i] = '\0';
+	break;
+      }
 }
 
 /* Create a new temporary name with PREFIX.  Return an identifier.  */
@@ -430,8 +428,6 @@  create_tmp_var_name (const char *prefix)
       char *preftmp = ASTRDUP (prefix);
 
       remove_suffix (preftmp, strlen (preftmp));
-      clean_symbol_name (preftmp);
-
       prefix = preftmp;
     }
 
--- gcc/gimplify.cc.jj	2024-08-05 13:04:53.903116091 +0200
+++ gcc/gimplify.cc	2024-08-06 15:27:40.404865291 +0200
@@ -5599,6 +5599,18 @@  gimplify_init_constructor (tree *expr_p,
 
 	    DECL_INITIAL (object) = ctor;
 	    TREE_STATIC (object) = 1;
+	    if (DECL_NAME (object) && DECL_NAMELESS (object))
+	      {
+		const char *name = get_name (object);
+		char *sname = ASTRDUP (name);
+		clean_symbol_name (sname);
+		/* If there are any undesirable characters in DECL_NAMELESS
+		   name, just fall back to C.nnn name, we must ensure e.g.
+		   SRA created names with DECL_UIDs don't make it into
+		   assembly.  */
+		if (strcmp (name, sname))
+		  DECL_NAME (object) = NULL_TREE;
+	      }
 	    if (!DECL_NAME (object))
 	      DECL_NAME (object) = create_tmp_var_name ("C");
 	    walk_tree (&DECL_INITIAL (object), force_labels_r, NULL, NULL);