libcpp: Fix _Pragma in #__VA_ARGS__ [PR103165]

Message ID 5e77a918-1943-03dc-5bb7-20eebfc61226@codesourcery.com
State New

Commit Message

Tobias Burnus Nov. 10, 2021, 8:12 p.m. UTC
  This patch fixes a pseudo regression of my previous _Pragma patch.

The issue was that a tokenized '_Pragma' (-> CPP_PRAGMA) could end
up as a to-be-quoted argument ('#__VA_ARGS__'), which was never
handled and gave an ICE in GCC after my previous patch and before
this patch.

The expected 'gcc -E' output is in the '#if 0' block in the testcase.

Before my previous patch, "gcc -E" yielded the following, which is
obviously wrong:
   const char *str =
#pragma omp error severity(warning) message ("Test") at(compilation)
   "\"1,2\" ;" ;
#pragma omp error severity(warning) message ("Test") at(compilation)
   ;

Built on x86-64-gnu-linux; "make -k check" is running.
OK when it passes?

  * * *

Disclaimer: While this patch takes a step in the right direction,
it probably does not help with any of the other _Pragma issues. Neither
with 'gcc -E' when the pragma wasn't registered (still expanded too
early) nor with the 'GCC diagnostic' issues in general as there the
input_location is used to decide when to pop - and depending on the
column numbers, this may or may not work.

See https://gcc.gnu.org/pipermail/gcc-patches/2021-October/582927.html
for some more details and links to PRs.

Tobias
  

Comments

Joseph Myers Nov. 10, 2021, 9:30 p.m. UTC | #1
On Wed, 10 Nov 2021, Tobias Burnus wrote:

> Disclaimer: While this patch does a step into the right direction,
> it probably does help with any of the other _Pragma issues. Neither
> with 'gcc -E' when the pragma wasn't registered (still expanded too
> early) nor with the 'GCC diagnostic' issues in general as there the
> input_location is used to decide when to pop - and depending on the
> column numbers, this may or may not work.

And fully correct stringization of _Pragma should respect the spelling of 
the preprocessing tokens (of the string-literal preprocessing token, that 
is; spelling variations for the other preprocessing tokens aren't possible 
here) and the presence or absence of whitespace between them.

_Pragma("foo")
_Pragma ("foo")
_Pragma("foo" )
_Pragma(L"foo")
_Pragma ( "foo" )

(for example) should all have their spelling preserved by stringization 
(but any nonempty white space sequence becomes a single space).
  
Jakub Jelinek Nov. 18, 2021, 11:24 a.m. UTC | #2
On Wed, Nov 10, 2021 at 09:30:29PM +0000, Joseph Myers wrote:
> On Wed, 10 Nov 2021, Tobias Burnus wrote:
> 
> > Disclaimer: While this patch does a step into the right direction,
> > it probably does help with any of the other _Pragma issues. Neither
> > with 'gcc -E' when the pragma wasn't registered (still expanded too
> > early) nor with the 'GCC diagnostic' issues in general as there the
> > input_location is used to decide when to pop - and depending on the
> > column numbers, this may or may not work.
> 
> And fully correct stringization of _Pragma should respect the spelling of 
> the preprocessing tokens (of the string-literal preprocessing token, that 
> is; spelling variations for the other preprocessing tokens aren't possible 
> here) and the presence or absence of whitespace between them.
> 
> _Pragma("foo")
> _Pragma ("foo")
> _Pragma("foo" )
> _Pragma(L"foo")
> _Pragma ( "foo" )
> 
> (for example) should all have their spelling preserved by stringization 
> (but any nonempty white space sequence becomes a single space).

Yeah.  And not just that, I think all the exact whitespace inside the
string literal should be preserved too (this time with no replacement of
nonempty white space with a single space).

Consider in pragma-3.c e.g.
#define inner(...) #__VA_ARGS__ ; _Pragma   (	"   omp		error severity   (warning)	message (\"Test\") at(compilation)" )
should yield:
  const char *str = "\"1,2\" ; _Pragma ( \"   omp		error severity   (warning)	message (\\\"Test\\\") at(compilation)\" )";

I guess we could encode the PREV_WHITE flags from the ( and ) tokens as 2 separate
bits somewhere (e.g. in some bits of the pragma id), but we need to encode the whole
string literal somewhere too.
Now, in cpp_token we have:
  union cpp_token_u {

    /* Caller-supplied identifier for a CPP_PRAGMA.  */
    unsigned int GTY ((tag ("CPP_TOKEN_FLD_PRAGMA"))) pragma;
  }
where several other members of the union are structs, either with a pair
of unsigned and pointer or two pointers.  So, could we make
the pragma union member also a struct with the pragma id and
pointer to the _Pragma string literal cpp_token?

Though, that doesn't solve the case where in destringize_and_run
pfile->directive_result.type != CPP_PRAGMA.

Are we handling the pragma at a wrong phase of preprocessing?

	Jakub
  
Joseph Myers Nov. 18, 2021, 9:55 p.m. UTC | #3
On Thu, 18 Nov 2021, Jakub Jelinek via Gcc-patches wrote:

> Are we handling the pragma at a wrong phase of preprocessing?

I think that converting it to a single preprocessing token (rather than 
four separate preprocessing tokens), at a stage when stringizing might 
still occur, does indicate it's being processed too soon, and it would be 
better to do that only when it's known that the _Pragma preprocessing 
token will actually occur in the results of preprocessing the source file.
  

Patch

libcpp: Fix _Pragma in #__VA_ARGS__ [PR103165]

Using '#define inner(...) #__VA_ARGS__ _Pragma("...")' yields a string plus
the _Pragma; passing this to another '#__VA_ARGS__' led to a CPP_PRAGMA
inside stringify_arg, which wasn't handled before this commit and gave an
ICE.

In GCC versions before r12-4797-g0078a058 (cf. PR102409), instead of giving
an ICE, the _Pragma wasn't stringified but output as a #pragma before the
actual macro expansion.

	PR preprocessor/103165
libcpp/ChangeLog:

	* directives.c (_cpp_get_pragma_by_id): New.
	* internal.h (_cpp_get_pragma_by_id): New prototype.
	* macro.c (stringify_arg): Use it; handle stringification of _Pragma.

gcc/testsuite/ChangeLog:

	* c-c++-common/gomp/pragma-3.c: New test.
	* c-c++-common/gomp/pragma-4.c: New test.

 gcc/testsuite/c-c++-common/gomp/pragma-3.c | 20 ++++++++++++++++++++
 gcc/testsuite/c-c++-common/gomp/pragma-4.c | 20 ++++++++++++++++++++
 libcpp/directives.c                        | 23 +++++++++++++++++++++++
 libcpp/internal.h                          |  3 +++
 libcpp/macro.c                             | 29 +++++++++++++++++++++++++++--
 5 files changed, 93 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/c-c++-common/gomp/pragma-3.c b/gcc/testsuite/c-c++-common/gomp/pragma-3.c
new file mode 100644
index 00000000000..f8b741138e7
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/gomp/pragma-3.c
@@ -0,0 +1,20 @@ 
+/* { dg-additional-options "-fdump-tree-original" }  */
+/* PR preprocessor/103165  */
+
+#define inner(...) #__VA_ARGS__ ; _Pragma("omp error severity(warning) message (\"Test\") at(compilation)")
+#define outer(...) inner(__VA_ARGS__)
+
+void
+f (void)
+{
+  const char *str = outer(inner(1,2));  /* { dg-warning "'pragma omp error' encountered: Test" } */
+}
+
+#if 0
+After preprocessing, the expected result are the following three lines:
+     const char *str = "\"1,2\" ; _Pragma(\"omp error severity(warning) message (\"Test\") at(compilation)\")" ;
+#pragma omp error severity(warning) message ("Test") at(compilation)
+                                     ;
+#endif
+
+/* { dg-final { scan-tree-dump "const char \\* str = \\(const char \\*\\) \"\\\\\"1,2\\\\\" ; _Pragma\\(\\\\\"omp error severity\\(warning\\) message \\(\\\\\"Test\\\\\"\\) at\\(compilation\\)\\\\\"\\)\";" "original" } }  */
diff --git a/gcc/testsuite/c-c++-common/gomp/pragma-4.c b/gcc/testsuite/c-c++-common/gomp/pragma-4.c
new file mode 100644
index 00000000000..608dddc7e6a
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/gomp/pragma-4.c
@@ -0,0 +1,20 @@ 
+/* { dg-additional-options "-fdump-tree-original -save-temps" }  */
+/* PR preprocessor/103165  */
+
+#define inner(...) #__VA_ARGS__ ; _Pragma("omp error severity(warning) message (\"Test\") at(compilation)")
+#define outer(...) inner(__VA_ARGS__)
+
+void
+f (void)
+{
+  const char *str = outer(inner(1,2));  /* { dg-warning "'pragma omp error' encountered: Test" } */
+}
+
+#if 0
+After preprocessing, the expected result are the following three lines:
+     const char *str = "\"1,2\" ; _Pragma(\"omp error severity(warning) message (\"Test\") at(compilation)\")" ;
+#pragma omp error severity(warning) message ("Test") at(compilation)
+                                     ;
+#endif
+
+/* { dg-final { scan-tree-dump "const char \\* str = \\(const char \\*\\) \"\\\\\"1,2\\\\\" ; _Pragma\\(\\\\\"omp error severity\\(warning\\) message \\(\\\\\"Test\\\\\"\\) at\\(compilation\\)\\\\\"\\)\";" "original" } }  */
diff --git a/libcpp/directives.c b/libcpp/directives.c
index 34f7677f718..9eefb20d4d4 100644
--- a/libcpp/directives.c
+++ b/libcpp/directives.c
@@ -1236,6 +1236,29 @@  do_ident (cpp_reader *pfile)
   check_eol (pfile, false);
 }
 
+/* Convert a pragma id back to the space + name.  Currently used by
+   stringify_arg for _Pragma.  The assumption is that it is only very rarely
+   called such that O(num-pragmas + num-pragma-spaces) checks is acceptable.  */
+void
+_cpp_get_pragma_by_id (cpp_reader *pfile, const cpp_token *token,
+		       cpp_hashnode const **space, cpp_hashnode const **name)
+{
+  struct pragma_entry *s, *n = NULL;
+  for (s = pfile->pragmas; s; s = s->next)
+    {
+      if (!s->is_nspace)
+	continue;
+      for (n = s->u.space; n; n = n->next)
+	if (token->val.pragma == n->u.ident)
+	  break;
+      if (n)
+	break;
+    }
+  gcc_assert (s && n);
+  *space = s->pragma;
+  *name = n->pragma;
+}
+
 /* Lookup a PRAGMA name in a singly-linked CHAIN.  Returns the
    matching entry, or NULL if none is found.  The returned entry could
    be the start of a namespace chain, or a pragma.  */
diff --git a/libcpp/internal.h b/libcpp/internal.h
index 8577cab6c83..9e31564cf23 100644
--- a/libcpp/internal.h
+++ b/libcpp/internal.h
@@ -762,6 +762,9 @@  extern void _cpp_define_builtin (cpp_reader *, const char *);
 extern char ** _cpp_save_pragma_names (cpp_reader *);
 extern void _cpp_restore_pragma_names (cpp_reader *, char **);
 extern int _cpp_do__Pragma (cpp_reader *, location_t);
+extern void _cpp_get_pragma_by_id (cpp_reader *, const cpp_token*,
+				   cpp_hashnode const **,
+				   cpp_hashnode const **);
 extern void _cpp_init_directives (cpp_reader *);
 extern void _cpp_init_internal_pragmas (cpp_reader *);
 extern void _cpp_do_file_change (cpp_reader *, enum lc_reason, const char *,
diff --git a/libcpp/macro.c b/libcpp/macro.c
index b2f797cae35..1642b96e835 100644
--- a/libcpp/macro.c
+++ b/libcpp/macro.c
@@ -839,6 +839,7 @@  stringify_arg (cpp_reader *pfile, const cpp_token **first, unsigned int count,
   unsigned int i, escape_it, backslash_count = 0;
   const cpp_token *source = NULL;
   size_t len;
+  cpp_hashnode const *pragma_space = NULL, *pragma_name = NULL;
 
   if (BUFF_ROOM (pfile->u_buff) < 3)
     _cpp_extend_buff (pfile, &pfile->u_buff, 3);
@@ -887,7 +888,15 @@  stringify_arg (cpp_reader *pfile, const cpp_token **first, unsigned int count,
 
       /* Room for each char being written in octal, initial space and
 	 final quote and NUL.  */
-      len = cpp_token_len (token);
+      if (token->type == CPP_PRAGMA)
+	{
+	  gcc_assert (token->flags & PRAGMA_OP);
+	  _cpp_get_pragma_by_id (pfile, token, &pragma_space, &pragma_name);
+	  len = (strlen ("_Pragma(\"") + pragma_space->ident.len
+		 + 1 + pragma_name->ident.len);
+	}
+      else
+	len = cpp_token_len (token);
       if (escape_it)
 	len *= 4;
       len += 3;
@@ -909,7 +918,23 @@  stringify_arg (cpp_reader *pfile, const cpp_token **first, unsigned int count,
 	}
       source = NULL;
 
-      if (escape_it)
+      if (token->type == CPP_PRAGMA)
+	{
+	  memcpy (dest, "_Pragma(\\\"", strlen("_Pragma(\\\""));
+	  dest += strlen("_Pragma(\\\"");
+	  memcpy (dest, pragma_space->ident.str, pragma_space->ident.len);
+	  dest += pragma_space->ident.len;
+	  *dest = ' ';
+	  dest++;
+	  memcpy (dest, pragma_name->ident.str, pragma_name->ident.len);
+	  dest += pragma_name->ident.len;
+	}
+      else if (token->type == CPP_PRAGMA_EOL)
+	{
+	  memcpy (dest, "\\\")", 3);
+	  dest += 3;
+	}
+      else if (escape_it)
 	{
 	  _cpp_buff *buff = _cpp_get_buff (pfile, len);
 	  unsigned char *buf = BUFF_FRONT (buff);