From patchwork Fri Jun 17 22:21:06 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lewis Hyatt X-Patchwork-Id: 55180 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 237AD3857404 for ; Fri, 17 Jun 2022 22:21:42 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 237AD3857404 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1655504502; bh=cJfWtV+4bQyvO/YWbk/KsTvf8TMfFJLYcZCxrwr/d8Y=; h=Date:To:Subject:List-Id:List-Unsubscribe:List-Archive:List-Post: List-Help:List-Subscribe:From:Reply-To:From; b=DqO15uIJ8OltzPdmCzjnV0+Ugs851CJ6BGY4PZiWRQrKsGmd/9deVWiwSHlePbcEz KDD0Wn7RJeEacHjmQJ1nJE/WDWNb/dYjaHSGzL6PbWQRBg0V4eSLt1phNMB3hxKHZU HZvG0nf6Li9DOVrSP89uKe4vXPpi3xc8bws1XylE= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mail-qk1-x733.google.com (mail-qk1-x733.google.com [IPv6:2607:f8b0:4864:20::733]) by sourceware.org (Postfix) with ESMTPS id 0F40D3857C71 for ; Fri, 17 Jun 2022 22:21:10 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 0F40D3857C71 Received: by mail-qk1-x733.google.com with SMTP id a184so4121984qkg.5 for ; Fri, 17 Jun 2022 15:21:10 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:from:to:subject:message-id:mime-version :content-disposition; bh=cJfWtV+4bQyvO/YWbk/KsTvf8TMfFJLYcZCxrwr/d8Y=; b=l3xFiGGjZZkcvJQ/l3mCTJFIhvnQDbV38u5yC7Uby3szQ0pHDxn0q6PufTRhP+mgzi An6rgA1Bifx+UQjz40QWX/bjfZdDBbUZtHZTMjIPHW03g6r6hhWUq6kTu3ufPBkcS53z 05Cf/57QyYp+LG780enUEBnmET67h+LzgYdQvf0yu/xEYWXXgJq9O2WjFXCWMfeI0bYI vgBVnK1RMAFdLjcBv5dDq9APeZC/+BqL2db4mATt4InNM/AP0P9eWpU9veBHNL7gNRHD 1QLzc+Uz5byw3Jt35Fei7vtAPTU6uvx1Y2WWNtmOzYi1h4FoK377JmR2RPATivMQ9/eO A6GA== X-Gm-Message-State: AJIora8zIhlyCvVGpYHrs2KtPelJhYH3zGHmQPnbnwB04nm8qxlsV/u9 FCOMdc0I6jk64JEusvM/zc7uNmmzvE8= X-Google-Smtp-Source: AGRyM1sbhZCdmNoLxAiU02IH76tyfculqB8/sLbGuWSXoiSPzJ/+8CCtbzBBPyaL/hzqIZkEjisuJg== X-Received: by 2002:a05:620a:1792:b0:6a6:c734:d983 with SMTP id ay18-20020a05620a179200b006a6c734d983mr8941589qkb.682.1655504469062; Fri, 17 Jun 2022 15:21:09 -0700 (PDT) Received: from ldh-imac.local (96-67-140-173-static.hfc.comcastbusiness.net. [96.67.140.173]) by smtp.gmail.com with ESMTPSA id d26-20020ac8535a000000b00304ef46c06dsm5052407qto.57.2022.06.17.15.21.07 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 17 Jun 2022 15:21:08 -0700 (PDT) Date: Fri, 17 Jun 2022 18:21:06 -0400 To: gcc-patches@gcc.gnu.org Subject: [PATCH] libcpp: Support raw strings with newlines while processing directives [PR55971] Message-ID: <20220617222106.GA79014@ldh-imac.local> MIME-Version: 1.0 Content-Disposition: inline X-Spam-Status: No, score=-3039.2 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, GIT_PATCH_0, KAM_SHORT, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Lewis Hyatt via Gcc-patches From: Lewis Hyatt Reply-To: Lewis Hyatt Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" Hello- The attached fixes PR preprocessor/55971: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55971 , which is the issue that we don't currently allow raw string literals containing embedded newlines to appear in #define. With the patch, they can be used in any preprocessing directive. While I was in there, I also added support for the argument to _Pragma() to be a raw string, which was not supported (with or without newlines) previously, but I think is nice to have. Bootstrap + regtest all languages on x86-64 Linux looks good, with no new failures and the new testcases passing: FAIL 103 103 PASS 542454 542574 UNSUPPORTED 15248 15248 UNTESTED 136 136 XFAIL 4166 4166 XPASS 17 17 Please let me know if it looks OK? Thanks! -Lewis [PATCH] libcpp: Support raw strings with newlines while processing directives [PR55971] It's not currently possible to use a C++11 raw string containing a newline as part of the definition of a macro, or in any other preprocessing directive, such as: #define X R"(two lines)" #error R"(this error has two lines)" This patch adds support for that by relaxing the conditions under which _cpp_get_fresh_line() refuses to get a new line. For the case of lexing a raw string, it's OK to do so as long as there is another line within the current buffer. The code in cpp_get_fresh_line() was refactored into a new function get_fresh_line_impl(), so that the new logic is applied only when processing a raw string and not any other times. gcc -E needed a small tweak now that it's possible to get a token from macro expansion which contains a newline; in that case, c-ppoutput.c needs to check and avoid incrementing its internal line counter in that case, otherwise it erroneously prints a line change marker after printing the expansion of a macro with an embedded newline. I have added testcases for all preprocessing directives to make sure they are OK with these kinds of raw strings. While doing that it became apparent that we do not currently accept a raw string (with or without embedded newlines) as the argument to _Pragma(). That was pretty straightforward to add, so I have done that as well, since it seems potentially handy to avoid needing to escape all the quotes inside the pragma, plus clang accepts this as well. PR preprocessor/55971 libcpp/ChangeLog: * directives.cc (destringize_and_run): Support C++11 raw strings as the argument to _Pragma(). * lex.cc (get_fresh_line_impl): New function refactoring the code from... (_cpp_get_fresh_line): ...here. (lex_raw_string): Use the new version of get_fresh_line_impl() to support raw strings containing new lines when processing a directive. gcc/testsuite/ChangeLog: * c-c++-common/raw-string-directive-1.c: New test. * c-c++-common/raw-string-directive-2.c: New test. gcc/c-family/ChangeLog: * c-ppoutput.cc (token_streamer::stream): Don't call account_for_newlines() if the tokens came from macro expansion. diff --git a/gcc/c-family/c-ppoutput.cc b/gcc/c-family/c-ppoutput.cc index 9de46a9655f..4f3576fa273 100644 --- a/gcc/c-family/c-ppoutput.cc +++ b/gcc/c-family/c-ppoutput.cc @@ -292,11 +292,13 @@ token_streamer::stream (cpp_reader *pfile, const cpp_token *token, print.printed = true; } - /* CPP_COMMENT tokens and raw-string literal tokens can have - embedded new-line characters. Rather than enumerating all the - possible token types just check if token uses val.str union - member. */ - if (cpp_token_val_index (token) == CPP_TOKEN_FLD_STR) + /* CPP_COMMENT tokens and raw-string literal tokens can have embedded + new-line characters. Rather than enumerating all the possible token + types, just check if token uses val.str union member. If the token came + from a macro expansion, then no adjustment should be made since the + new-line characters did not appear in the source. */ + if (cpp_token_val_index (token) == CPP_TOKEN_FLD_STR + && !from_macro_expansion_at (loc)) account_for_newlines (token->val.str.text, token->val.str.len); } diff --git a/gcc/testsuite/c-c++-common/raw-string-directive-1.c b/gcc/testsuite/c-c++-common/raw-string-directive-1.c new file mode 100644 index 00000000000..810f11256fa --- /dev/null +++ b/gcc/testsuite/c-c++-common/raw-string-directive-1.c @@ -0,0 +1,77 @@ +/* { dg-do compile } */ +/* { dg-options "-std=gnu99" { target c } } */ +/* { dg-options "-std=c++11" { target c++ } } */ + +/* Test that multi-line raw strings are lexed OK for all preprocessing + directives where one could appear. Test raw-string-directive-2.c + checks that #define is also processed properly. */ + +/* Note that in cases where we cause GCC to produce a multi-line error + message, we construct the string so that the second line looks enough + like an error message for DejaGNU to process it as such, so that we + can use dg-warning or dg-error directives to check for it. */ + +#warning R"delim(line1 /* { dg-warning "line1" } */ +file:15:1: warning: line2)delim" /* { dg-warning "line2" } */ + +#error R"delim(line3 /* { dg-error "line3" } */ +file:18:1: error: line4)delim" /* { dg-error "line4" } */ + +#define X1 R"(line 5 +line 6 +line 7 +line 8 +/* +// +line 9)" R"delim( +line10)delim" + +#define X2(a) X1 #a R"(line 11 +/* +line12 +)" + +#if R"(line 13 /* { dg-error "line13" } */ +file:35:1: error: line14)" /* { dg-error "line14\\)\"\" is not valid" } */ +#endif R"(line 15 /* { dg-warning "extra tokens at end of #endif" } */ +\ +line16)" "" + +#ifdef XYZ R"(line17 /* { dg-warning "extra tokens at end of #ifdef" } */ +\ +\ +line18)" +#endif + +#if 1 +#else R"(line23 /* { dg-warning "extra tokens at end of #else" } */ +\ + +line24)" +#endif + +#if 0 +#elif R"(line 25 /* { dg-error "line25" } */ +file:55:1: error: line26)" /* { dg-error "line26\\)\"\" is not valid" } */ +#endif + +#line 60 R"(file:60:1: warning: this file has a space +in it!)" +#warning "line27" /* { dg-warning "line27" } */ +/* { dg-warning "this file has a space" "#line check" { target *-*-* } 60 } */ +#line 63 "file" + +#undef X1 R"(line28 /* { dg-warning "extra tokens at end of #undef" } */ +line29 +\ +)" + +#ident R"(line30 +line31)" R"(line 32 /* { dg-warning "extra tokens at end of #ident" } */ +line 33)" + +#pragma GCC diagnostic ignored R"(-Woption /* { dg-warning "-Wpragmas" } */ +-with-a-newline)" + +_Pragma(R"delim1(GCC diagnostic ignored R"delim2(-Woption /* { dg-warning "-Wpragmas" } */ +-with-a-newline)delim2")delim1") diff --git a/gcc/testsuite/c-c++-common/raw-string-directive-2.c b/gcc/testsuite/c-c++-common/raw-string-directive-2.c new file mode 100644 index 00000000000..6fc673ccd82 --- /dev/null +++ b/gcc/testsuite/c-c++-common/raw-string-directive-2.c @@ -0,0 +1,33 @@ +/* { dg-do run } */ +/* { dg-options "-std=gnu99" { target c } } */ +/* { dg-options "-std=c++11" { target c++ } } */ + +#define S1 R"(three +line +string)" + +#define S2 R"(pasted +two line)" " string" + +#define X(a, b) a b R"( +one more)" + +const char *s1 = S1; +const char *s2 = S2; +const char *s3 = X(S1, R"( +with this line plus)"); + +int main () +{ + const char s1_correct[] = "three\nline\nstring"; + if (__builtin_memcmp (s1, s1_correct, sizeof s1_correct)) + __builtin_abort (); + + const char s2_correct[] = "pasted\ntwo line string"; + if (__builtin_memcmp (s2, s2_correct, sizeof s2_correct)) + __builtin_abort (); + + const char s3_correct[] = "three\nline\nstring\nwith this line plus\none more"; + if (__builtin_memcmp (s3, s3_correct, sizeof s3_correct)) + __builtin_abort (); +} diff --git a/libcpp/directives.cc b/libcpp/directives.cc index f804a441f39..72c7f6fd0ab 100644 --- a/libcpp/directives.cc +++ b/libcpp/directives.cc @@ -1859,8 +1859,7 @@ static void destringize_and_run (cpp_reader *pfile, const cpp_string *in, location_t expansion_loc) { - const unsigned char *src, *limit; - char *dest, *result; + uchar *dest, *result; cpp_context *saved_context; cpp_token *saved_cur_token; tokenrun *saved_cur_run; @@ -1868,15 +1867,34 @@ destringize_and_run (cpp_reader *pfile, const cpp_string *in, int count; const struct directive *save_directive; - dest = result = (char *) alloca (in->len - 1); - src = in->text + 1 + (in->text[0] == 'L'); - limit = in->text + in->len - 1; - while (src < limit) + /* If we were given a raw string literal, we don't need to destringize it, + but we do need to strip off the prefix and the suffix. */ + if (in->text[0] == 'R') { - /* We know there is a character following the backslash. */ - if (*src == '\\' && (src[1] == '\\' || src[1] == '"')) - src++; - *dest++ = *src++; + cpp_string buf; + const bool ok + = cpp_interpret_string_notranslate (pfile, in, 1, &buf, CPP_STRING); + gcc_assert (ok); + result = (uchar *) alloca (buf.len); + + /* (Terminating null will be replaced by a newline shortly.) */ + memcpy (result, buf.text, buf.len - 1); + dest = result + (buf.len - 1); + XDELETEVEC (buf.text); + } + else + { + const uchar *src, *limit; + dest = result = (uchar *) alloca (in->len - 1); + src = in->text + 1 + (in->text[0] == 'L'); + limit = in->text + in->len - 1; + while (src < limit) + { + /* We know there is a character following the backslash. */ + if (*src == '\\' && (src[1] == '\\' || src[1] == '"')) + src++; + *dest++ = *src++; + } } *dest = '\n'; @@ -1896,9 +1914,10 @@ destringize_and_run (cpp_reader *pfile, const cpp_string *in, /* Inline run_directive, since we need to delay the _cpp_pop_buffer until we've read all of the tokens that we want. */ - cpp_push_buffer (pfile, (const uchar *) result, dest - result, - /* from_stage3 */ true); - /* ??? Antique Disgusting Hack. What does this do? */ + cpp_push_buffer (pfile, result, dest - result, /* from_stage3 */ true); + + /* This is needed to make _Pragma("once") work correctly, as it needs + pfile->buffer->file to be set to the current source file. */ if (pfile->buffer->prev) pfile->buffer->file = pfile->buffer->prev->file; diff --git a/libcpp/lex.cc b/libcpp/lex.cc index f891d3e17df..73e239b2a01 100644 --- a/libcpp/lex.cc +++ b/libcpp/lex.cc @@ -1073,6 +1073,9 @@ _cpp_clean_line (cpp_reader *pfile) buffer->next_line = s + 1; } +template +static bool get_fresh_line_impl (cpp_reader *pfile); + /* Return true if the trigraph indicated by NOTE should be warned about in a comment. */ static bool @@ -2489,9 +2492,8 @@ lex_raw_string (cpp_reader *pfile, cpp_token *token, const uchar *base) { pos--; pfile->buffer->cur = pos; - if (pfile->state.in_directive - || (pfile->state.parsing_args - && pfile->buffer->next_line >= pfile->buffer->rlimit)) + if ((pfile->state.in_directive || pfile->state.parsing_args) + && pfile->buffer->next_line >= pfile->buffer->rlimit) { cpp_error_with_line (pfile, CPP_DL_ERROR, token->src_loc, 0, "unterminated raw string"); @@ -2506,7 +2508,7 @@ lex_raw_string (cpp_reader *pfile, cpp_token *token, const uchar *base) CPP_INCREMENT_LINE (pfile, 0); pfile->buffer->need_line = true; - if (!_cpp_get_fresh_line (pfile)) + if (!get_fresh_line_impl (pfile)) { /* We ran out of file and failed to get a line. */ location_t src_loc = token->src_loc; @@ -2518,8 +2520,15 @@ lex_raw_string (cpp_reader *pfile, cpp_token *token, const uchar *base) _cpp_release_buff (pfile, accum.first); cpp_error_with_line (pfile, CPP_DL_ERROR, src_loc, 0, "unterminated raw string"); - /* Now pop the buffer that _cpp_get_fresh_line did not. */ + + /* Now pop the buffer that get_fresh_line_impl() did not. Popping + is not safe if processing a directive, however this cannot + happen as we already checked above that a line would be + available, and get_fresh_line_impl() can't fail in this + case. */ + gcc_assert (!pfile->state.in_directive); _cpp_pop_buffer (pfile); + return; } @@ -3453,11 +3462,14 @@ _cpp_lex_token (cpp_reader *pfile) } /* Returns true if a fresh line has been loaded. */ -bool -_cpp_get_fresh_line (cpp_reader *pfile) +template +static bool +get_fresh_line_impl (cpp_reader *pfile) { - /* We can't get a new line until we leave the current directive. */ - if (pfile->state.in_directive) + /* We can't get a new line until we leave the current directive, unless we + are lexing a raw string, in which case it will be OK as long as we don't + pop the current buffer. */ + if (!lexing_raw_string && pfile->state.in_directive) return false; for (;;) @@ -3473,6 +3485,10 @@ _cpp_get_fresh_line (cpp_reader *pfile) return true; } + /* We can't change buffers until we leave the current directive. */ + if (lexing_raw_string && pfile->state.in_directive) + return false; + /* First, get out of parsing arguments state. */ if (pfile->state.parsing_args) return false; @@ -3500,6 +3516,13 @@ _cpp_get_fresh_line (cpp_reader *pfile) } } +bool +_cpp_get_fresh_line (cpp_reader *pfile) +{ + return get_fresh_line_impl (pfile); +} + + #define IF_NEXT_IS(CHAR, THEN_TYPE, ELSE_TYPE) \ do \ { \