From patchwork Fri Jul  8 01:56:14 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Lewis Hyatt <lhyatt@gmail.com>
X-Patchwork-Id: 55867
Return-Path: <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>
X-Original-To: patchwork@sourceware.org
Delivered-To: patchwork@sourceware.org
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id B89863858293
	for <patchwork@sourceware.org>; Fri,  8 Jul 2022 01:56:49 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org B89863858293
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1657245409;
	bh=G6M3dzQqs4s8bJAUftfQ5CBrcztb78YOHZ4Ys+9Nrxw=;
	h=Date:To:Subject:List-Id:List-Unsubscribe:List-Archive:List-Post:
	 List-Help:List-Subscribe:From:Reply-To:From;
	b=nPl1PpjX1Y1nMK+TE3ILNEMdMx87XTS4iNk2Jba7SOfB4MiuGBvl+3VJ9pYQ+p4dR
	 KejdyNQHLmmAfH2/sIvDHePUOuOgK0YsQNDC+MT0VO6kzf4cejhOJW0B/pSSgoilOL
	 Y/w+vLVv5dQclqKIntuKv4+11I5lp+uRtma381WM=
X-Original-To: gcc-patches@gcc.gnu.org
Delivered-To: gcc-patches@gcc.gnu.org
Received: from mail-qk1-x72e.google.com (mail-qk1-x72e.google.com
 [IPv6:2607:f8b0:4864:20::72e])
 by sourceware.org (Postfix) with ESMTPS id C2332385843E
 for <gcc-patches@gcc.gnu.org>; Fri,  8 Jul 2022 01:56:18 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org C2332385843E
Received: by mail-qk1-x72e.google.com with SMTP id g1so14775786qkl.9
 for <gcc-patches@gcc.gnu.org>; Thu, 07 Jul 2022 18:56:18 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20210112;
 h=x-gm-message-state:date:from:to:cc:subject:message-id:mime-version
 :content-disposition:user-agent;
 bh=G6M3dzQqs4s8bJAUftfQ5CBrcztb78YOHZ4Ys+9Nrxw=;
 b=soIEhNqpB5/tcwm/Ei+zzViP7rvj4nhQEJ+n+dpa186d4169eHXoG1S91z2jC31Feg
 WYREpiRZSH9u9WbUeYMrtF4+vsUyXQ7EprClIIGPyD+lEWR0U/3jCC+Yd7LdiJiB+Htq
 rHwt1BSW3TAlo0y/wlV9yB3FeXnXG+O8uKfXYX3jp60iODDRazFYshj2NRKUWhB6Rznx
 Vk3CRJXdD8the0oRFPpJY8QMskSS9O9YbIQy6fag2hDyoe9SPhZH0Q0Gz7k4gcaAoBv2
 AbcpYPAwmS53qcl0CFMz+eFLewmDLBGv1IGlLhQ5fOLi2nq4qTuLz2wI2qzHRweLw/yI
 mwNw==
X-Gm-Message-State: AJIora86TJ3i+9Jk8boCWUtjmq1U586W/rvE40yWPLNKIJ0iwn1YgrTe
 1GHtkXFQybIIH2Z8HfRGr981+lWXUFY=
X-Google-Smtp-Source: 
 AGRyM1uNN9OBMYqeRPyUtVagm7qthaE0vJBrFhbrjffs6obNuVmovSn6VEG5n0ozXJjzG3KszERrzw==
X-Received: by 2002:a05:620a:2a15:b0:6af:4404:99ce with SMTP id
 o21-20020a05620a2a1500b006af440499cemr725547qkp.226.1657245377760;
 Thu, 07 Jul 2022 18:56:17 -0700 (PDT)
Received: from ldh-imac.local (pool-173-70-35-242.nwrknj.fios.verizon.net.
 [173.70.35.242]) by smtp.gmail.com with ESMTPSA id
 bt5-20020ac86905000000b0031bf68f502esm18958044qtb.59.2022.07.07.18.56.16
 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128);
 Thu, 07 Jul 2022 18:56:16 -0700 (PDT)
Date: Thu, 7 Jul 2022 21:56:14 -0400
To: gcc-patches@gcc.gnu.org
Subject: [PATCH] diagnostics: Make line-ending logic consistent with libcpp
 [PR91733]
Message-ID: <20220708015614.GA90703@ldh-imac.local>
MIME-Version: 1.0
Content-Disposition: inline
User-Agent: Mutt/1.12.1 (2019-06-15)
X-Spam-Status: No, score=-3039.2 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, GIT_PATCH_0,
 KAM_SHORT, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP,
 T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-Patchwork-Original-From: Lewis Hyatt via Gcc-patches
 <gcc-patches@gcc.gnu.org>
From: Lewis Hyatt <lhyatt@gmail.com>
Reply-To: Lewis Hyatt <lhyatt@gmail.com>
Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org
Sender: "Gcc-patches"
 <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>

Hello-

The PR (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91733) points out that,
while libcpp recognizes a lone '\r' as a valid line-ending character, the
infrastructure that obtains source lines to be printed in diagnostics does
not, and hence diagnostics do not output the intended portion of a source
file that uses such line endings. The PR's author suggests that libcpp
should stop accepting '\r' line endings, but that seems rather controversial
and not likely to change. Fixing the diagnostics is easy enough though, and
that's done by the attached patch. Please let me know if it looks OK,
thanks! Bootstrap + regtest on all languages looks good, with only new PASSes
from the new testcase.

             before    after
FAIL            103      103
PASS         543592   543627
UNSUPPORTED   15298    15298
UNTESTED        136      136
XFAIL          4130     4130
XPASS            20       20
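
To make the line-ending convention concrete, here is a minimal standalone
sketch of splitting text the way libcpp counts lines, where any of \n, \r\n,
or a lone \r ends a line. The function name split_lines is hypothetical and
this is not code from the patch or from GCC; it only illustrates the rule.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not part of GCC: split TEXT into lines, treating
// "\n", "\r\n", and a lone "\r" each as a single line terminator.
std::vector<std::string>
split_lines (const std::string &text)
{
  std::vector<std::string> lines;
  std::size_t start = 0;
  for (std::size_t i = 0; i < text.size (); ++i)
    {
      const char c = text[i];
      if (c != '\n' && c != '\r')
        continue;
      lines.push_back (text.substr (start, i - start));
      // Treat "\r\n" as one terminator rather than two.
      if (c == '\r' && i + 1 < text.size () && text[i + 1] == '\n')
        ++i;
      start = i + 1;
    }
  // A final line with no terminator still counts as a line.
  if (start < text.size ())
    lines.push_back (text.substr (start));
  return lines;
}
```

With this rule, a string literal split by a lone \r, as in the new testcase,
yields two lines, whereas a search for '\n' alone would treat it as one.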


-Lewis
[PATCH] diagnostics: Make line-ending logic consistent with libcpp [PR91733]

libcpp recognizes a lone \r as a valid line ending, so the infrastructure
for retrieving source lines to be output in diagnostics needs to do the
same. This patch fixes file_cache_slot::get_next_line() accordingly so that
diagnostics display the correct part of the source when \r line endings are in
use.
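
The one subtle case is a \r that is the last byte read so far, since the next
read may turn it into \r\n. The sketch below is a const-correct standalone
adaptation of the patch's helper for experimenting outside GCC; the name
find_eol is hypothetical, and the handling of a file that really ends in \r
is left to the caller, as in the patch.

```cpp
#include <cstddef>

// Standalone adaptation for illustration (find_eol is a hypothetical name).
// Returns a pointer to the last byte of the first line terminator in
// [s, s + len), or nullptr if no terminator can be identified yet.
const char *
find_eol (const char *s, std::size_t len)
{
  const char *const end = s + len;
  for (; s != end; ++s)
    {
      if (*s == '\n')
        return s;
      if (*s == '\r')
        {
          // Undecided: more data may turn this into "\r\n".
          if (s + 1 == end)
            return nullptr;
          return s[1] == '\n' ? s + 1 : s;
        }
    }
  return nullptr;
}
```

Calling it on a buffer that ends in \r returns nullptr, so the caller reads
more data and searches again; once the following byte is available, the
result points at the \n of a \r\n pair or at the lone \r.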

gcc/ChangeLog:

	PR preprocessor/91733
	* input.cc (find_end_of_line): New helper function.
	(file_cache_slot::get_next_line): Recognize \r as a line ending.
	* diagnostic-show-locus.cc (test_escaping_bytes_1): Adapt selftest
	since \r will now be interpreted as a line-ending.

gcc/testsuite/ChangeLog:

	PR preprocessor/91733
	* c-c++-common/pr91733.c: New test.

diff --git a/gcc/diagnostic-show-locus.cc b/gcc/diagnostic-show-locus.cc
index 6eafe19785f..d267d2c258d 100644
--- a/gcc/diagnostic-show-locus.cc
+++ b/gcc/diagnostic-show-locus.cc
@@ -5508,7 +5508,7 @@ test_tab_expansion (const line_table_case &case_)
 static void
 test_escaping_bytes_1 (const line_table_case &case_)
 {
-  const char content[] = "before\0\1\2\3\r\x80\xff""after\n";
+  const char content[] = "before\0\1\2\3\v\x80\xff""after\n";
   const size_t sz = sizeof (content);
   temp_source_file tmp (SELFTEST_LOCATION, ".c", content, sz);
   line_table_test ltt (case_);
@@ -5523,18 +5523,18 @@ test_escaping_bytes_1 (const line_table_case &case_)
   if (finish > LINE_MAP_MAX_LOCATION_WITH_COLS)
     return;
 
-  /* Locations of the NUL and \r bytes.  */
+  /* Locations of the NUL and \v bytes.  */
   location_t nul_loc
     = linemap_position_for_line_and_column (line_table, ord_map, 1, 7);
-  location_t r_loc
+  location_t v_loc
     = linemap_position_for_line_and_column (line_table, ord_map, 1, 11);
   gcc_rich_location richloc (nul_loc);
-  richloc.add_range (r_loc);
+  richloc.add_range (v_loc);
 
   {
     test_diagnostic_context dc;
     diagnostic_show_locus (&dc, &richloc, DK_ERROR);
-    ASSERT_STREQ (" before \1\2\3 \x80\xff""after\n"
+    ASSERT_STREQ (" before \1\2\3\v\x80\xff""after\n"
 		  "       ^   ~\n",
 		  pp_formatted_text (dc.printer));
   }
@@ -5544,7 +5544,7 @@ test_escaping_bytes_1 (const line_table_case &case_)
     dc.escape_format = DIAGNOSTICS_ESCAPE_FORMAT_UNICODE;
     diagnostic_show_locus (&dc, &richloc, DK_ERROR);
     ASSERT_STREQ
-      (" before<U+0000><U+0001><U+0002><U+0003><U+000D><80><ff>after\n"
+      (" before<U+0000><U+0001><U+0002><U+0003><U+000B><80><ff>after\n"
        "       ^~~~~~~~                        ~~~~~~~~\n",
        pp_formatted_text (dc.printer));
   }
@@ -5552,7 +5552,7 @@ test_escaping_bytes_1 (const line_table_case &case_)
     test_diagnostic_context dc;
     dc.escape_format = DIAGNOSTICS_ESCAPE_FORMAT_BYTES;
     diagnostic_show_locus (&dc, &richloc, DK_ERROR);
-    ASSERT_STREQ (" before<00><01><02><03><0d><80><ff>after\n"
+    ASSERT_STREQ (" before<00><01><02><03><0b><80><ff>after\n"
 		  "       ^~~~            ~~~~\n",
 		  pp_formatted_text (dc.printer));
   }
diff --git a/gcc/input.cc b/gcc/input.cc
index 2acbfdea4f8..060ca160126 100644
--- a/gcc/input.cc
+++ b/gcc/input.cc
@@ -646,6 +646,37 @@ file_cache_slot::maybe_read_data ()
   return read_data ();
 }
 
+/* Helper function for file_cache_slot::get_next_line (), to find the end of
+   the next line.  Returns with the memchr convention, i.e. nullptr if a line
+   terminator was not found.  We need to determine line endings in the same
+   manner that libcpp does: any of \n, \r\n, or \r is a line ending.  */
+
+static char *
+find_end_of_line (char *s, size_t len)
+{
+  for (const auto end = s + len; s != end; ++s)
+    {
+      if (*s == '\n')
+	return s;
+      if (*s == '\r')
+	{
+	  const auto next = s + 1;
+	  if (next == end)
+	    {
+	      /* Don't find the line ending if \r is the very last character
+		 in the buffer; we do not know if it's the end of the file or
+		 just the end of what has been read so far, and we wouldn't
+		 want to break in the middle of what's actually a \r\n
+		 sequence.  Instead, we will handle the case of a file ending
+		 in a \r later.  */
+	      break;
+	    }
+	  return (*next == '\n' ? next : s);
+	}
+    }
+  return nullptr;
+}
+
 /* Read a new line from file FP, using C as a cache for the data
    coming from the file.  Upon successful completion, *LINE is set to
    the beginning of the line found.  *LINE points directly in the
@@ -671,17 +702,16 @@ file_cache_slot::get_next_line (char **line, ssize_t *line_len)
 
   char *next_line_start = NULL;
   size_t len = 0;
-  char *line_end = (char *) memchr (line_start, '\n', remaining_size);
+  char *line_end = find_end_of_line (line_start, remaining_size);
   if (line_end == NULL)
     {
-      /* We haven't found the end-of-line delimiter in the cache.
-	 Fill the cache with more data from the file and look for the
-	 '\n'.  */
+      /* We haven't found an end-of-line delimiter in the cache.
+	 Fill the cache with more data from the file and look again.  */
       while (maybe_read_data ())
 	{
 	  line_start = m_data + m_line_start_idx;
 	  remaining_size = m_nb_read - m_line_start_idx;
-	  line_end = (char *) memchr (line_start, '\n', remaining_size);
+	  line_end = find_end_of_line (line_start, remaining_size);
 	  if (line_end != NULL)
 	    {
 	      next_line_start = line_end + 1;
@@ -690,14 +720,22 @@ file_cache_slot::get_next_line (char **line, ssize_t *line_len)
 	}
       if (line_end == NULL)
 	{
-	  /* We've loadded all the file into the cache and still no
-	     '\n'.  Let's say the line ends up at one byte passed the
+	  /* We've loaded all the file into the cache and still no
+	     terminator.  Let's say the line ends up at one byte past the
 	     end of the file.  This is to stay consistent with the case
-	     of when the line ends up with a '\n' and line_end points to
-	     that terminal '\n'.  That consistency is useful below in
-	     the len calculation.  */
-	  line_end = m_data + m_nb_read ;
-	  m_missing_trailing_newline = true;
+	     of when the line ends up with a terminator and line_end points to
+	     that.  That consistency is useful below in the len calculation.
+
+	     If the file ends in a \r, we didn't identify it as a line
+	     terminator above, so do that now instead.  */
+	  line_end = m_data + m_nb_read;
+	  if (m_nb_read && line_end[-1] == '\r')
+	    {
+	      --line_end;
+	      m_missing_trailing_newline = false;
+	    }
+	  else
+	    m_missing_trailing_newline = true;
 	}
       else
 	m_missing_trailing_newline = false;
@@ -711,9 +749,8 @@ file_cache_slot::get_next_line (char **line, ssize_t *line_len)
   if (m_fp && ferror (m_fp))
     return false;
 
-  /* At this point, we've found the end of the of line.  It either
-     points to the '\n' or to one byte after the last byte of the
-     file.  */
+  /* At this point, we've found the end of the line.  It either points to
+     the line terminator or to one byte after the last byte of the file.  */
   gcc_assert (line_end != NULL);
 
   len = line_end - line_start;
diff --git a/gcc/testsuite/c-c++-common/pr91733.c b/gcc/testsuite/c-c++-common/pr91733.c
new file mode 100644
index 00000000000..1539bb4f386
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/pr91733.c
@@ -0,0 +1,17 @@
+/* { dg-do preprocess } */
+/* { dg-additional-options "-fdiagnostics-show-caret" } */
+
+const char *s = "
";
+
+/* { dg-warning "missing terminating \"" "test1" { target *-*-* } 4 } */
+/* { dg-warning "missing terminating \"" "test2" { target *-*-* } 5 } */
+
+/* { dg-begin-multiline-output "test3" }
+ const char *s = "
+                 ^
+{ dg-end-multiline-output "test3" } */
+
+/* { dg-begin-multiline-output "test4" }
+ ";
+ ^
+{ dg-end-multiline-output "test4" } */