From patchwork Fri Jul 8 01:56:14 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lewis Hyatt X-Patchwork-Id: 55867 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id B89863858293 for ; Fri, 8 Jul 2022 01:56:49 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org B89863858293 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1657245409; bh=G6M3dzQqs4s8bJAUftfQ5CBrcztb78YOHZ4Ys+9Nrxw=; h=Date:To:Subject:List-Id:List-Unsubscribe:List-Archive:List-Post: List-Help:List-Subscribe:From:Reply-To:From; b=nPl1PpjX1Y1nMK+TE3ILNEMdMx87XTS4iNk2Jba7SOfB4MiuGBvl+3VJ9pYQ+p4dR KejdyNQHLmmAfH2/sIvDHePUOuOgK0YsQNDC+MT0VO6kzf4cejhOJW0B/pSSgoilOL Y/w+vLVv5dQclqKIntuKv4+11I5lp+uRtma381WM= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mail-qk1-x72e.google.com (mail-qk1-x72e.google.com [IPv6:2607:f8b0:4864:20::72e]) by sourceware.org (Postfix) with ESMTPS id C2332385843E for ; Fri, 8 Jul 2022 01:56:18 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org C2332385843E Received: by mail-qk1-x72e.google.com with SMTP id g1so14775786qkl.9 for ; Thu, 07 Jul 2022 18:56:18 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:from:to:cc:subject:message-id:mime-version :content-disposition:user-agent; bh=G6M3dzQqs4s8bJAUftfQ5CBrcztb78YOHZ4Ys+9Nrxw=; b=soIEhNqpB5/tcwm/Ei+zzViP7rvj4nhQEJ+n+dpa186d4169eHXoG1S91z2jC31Feg WYREpiRZSH9u9WbUeYMrtF4+vsUyXQ7EprClIIGPyD+lEWR0U/3jCC+Yd7LdiJiB+Htq rHwt1BSW3TAlo0y/wlV9yB3FeXnXG+O8uKfXYX3jp60iODDRazFYshj2NRKUWhB6Rznx Vk3CRJXdD8the0oRFPpJY8QMskSS9O9YbIQy6fag2hDyoe9SPhZH0Q0Gz7k4gcaAoBv2 AbcpYPAwmS53qcl0CFMz+eFLewmDLBGv1IGlLhQ5fOLi2nq4qTuLz2wI2qzHRweLw/yI mwNw== X-Gm-Message-State: AJIora86TJ3i+9Jk8boCWUtjmq1U586W/rvE40yWPLNKIJ0iwn1YgrTe 1GHtkXFQybIIH2Z8HfRGr981+lWXUFY= X-Google-Smtp-Source: AGRyM1uNN9OBMYqeRPyUtVagm7qthaE0vJBrFhbrjffs6obNuVmovSn6VEG5n0ozXJjzG3KszERrzw== X-Received: by 2002:a05:620a:2a15:b0:6af:4404:99ce with SMTP id o21-20020a05620a2a1500b006af440499cemr725547qkp.226.1657245377760; Thu, 07 Jul 2022 18:56:17 -0700 (PDT) Received: from ldh-imac.local (pool-173-70-35-242.nwrknj.fios.verizon.net. [173.70.35.242]) by smtp.gmail.com with ESMTPSA id bt5-20020ac86905000000b0031bf68f502esm18958044qtb.59.2022.07.07.18.56.16 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Thu, 07 Jul 2022 18:56:16 -0700 (PDT) Date: Thu, 7 Jul 2022 21:56:14 -0400 To: gcc-patches@gcc.gnu.org Subject: [PATCH] diagnostics: Make line-ending logic consistent with libcpp [PR91733] Message-ID: <20220708015614.GA90703@ldh-imac.local> MIME-Version: 1.0 Content-Disposition: inline User-Agent: Mutt/1.12.1 (2019-06-15) X-Spam-Status: No, score=-3039.2 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, GIT_PATCH_0, KAM_SHORT, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Lewis Hyatt via Gcc-patches From: Lewis Hyatt Reply-To: Lewis Hyatt Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" Hello- The PR (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91733) points out that, while libcpp recognizes a lone '\r' as a valid line-ending character, the infrastructure that obtains source lines to be printed in diagnostics does not, and hence diagnostics do not output the intended portion of a source file that uses such line endings. The PR's author suggests that libcpp should stop accepting '\r' line endings, but that seems rather controversial and not likely to change. Fixing the diagnostics is easy enough though, and that's done by the attached patch. Please let me know if it looks OK, thanks! bootstrap + regtest all languages looks good, with just new PASSes for the new testcase. FAIL 103 103 PASS 543592 543627 UNSUPPORTED 15298 15298 UNTESTED 136 136 XFAIL 4130 4130 XPASS 20 20 -Lewis [PATCH] diagnostics: Make line-ending logic consistent with libcpp [PR91733] libcpp recognizes a lone \r as a valid line ending, so the infrastructure for retrieving source lines to be output in diagnostics needs to do the same. This patch fixes file_cache_slot::get_next_line() accordingly so that diagnostics display the correct part of the source when \r line endings are in use. gcc/ChangeLog: PR preprocessor/91733 * input.cc (find_end_of_line): New helper function. (file_cache_slot::get_next_line): Recognize \r as a line ending. * diagnostic-show-locus.cc (test_escaping_bytes_1): Adapt selftest since \r will now be interpreted as a line-ending. gcc/testsuite/ChangeLog: PR preprocessor/91733 * c-c++-common/pr91733.c: New test. +{ dg-end-multiline-output "test4" } */ diff --git a/gcc/diagnostic-show-locus.cc b/gcc/diagnostic-show-locus.cc index 6eafe19785f..d267d2c258d 100644 --- a/gcc/diagnostic-show-locus.cc +++ b/gcc/diagnostic-show-locus.cc @@ -5508,7 +5508,7 @@ test_tab_expansion (const line_table_case &case_) static void test_escaping_bytes_1 (const line_table_case &case_) { - const char content[] = "before\0\1\2\3\r\x80\xff""after\n"; + const char content[] = "before\0\1\2\3\v\x80\xff""after\n"; const size_t sz = sizeof (content); temp_source_file tmp (SELFTEST_LOCATION, ".c", content, sz); line_table_test ltt (case_); @@ -5523,18 +5523,18 @@ test_escaping_bytes_1 (const line_table_case &case_) if (finish > LINE_MAP_MAX_LOCATION_WITH_COLS) return; - /* Locations of the NUL and \r bytes. */ + /* Locations of the NUL and \v bytes. */ location_t nul_loc = linemap_position_for_line_and_column (line_table, ord_map, 1, 7); - location_t r_loc + location_t v_loc = linemap_position_for_line_and_column (line_table, ord_map, 1, 11); gcc_rich_location richloc (nul_loc); - richloc.add_range (r_loc); + richloc.add_range (v_loc); { test_diagnostic_context dc; diagnostic_show_locus (&dc, &richloc, DK_ERROR); - ASSERT_STREQ (" before \1\2\3 \x80\xff""after\n" + ASSERT_STREQ (" before \1\2\3\v\x80\xff""after\n" " ^ ~\n", pp_formatted_text (dc.printer)); } @@ -5544,7 +5544,7 @@ test_escaping_bytes_1 (const line_table_case &case_) dc.escape_format = DIAGNOSTICS_ESCAPE_FORMAT_UNICODE; diagnostic_show_locus (&dc, &richloc, DK_ERROR); ASSERT_STREQ - (" before<80>after\n" + (" before<80>after\n" " ^~~~~~~~ ~~~~~~~~\n", pp_formatted_text (dc.printer)); } @@ -5552,7 +5552,7 @@ test_escaping_bytes_1 (const line_table_case &case_) test_diagnostic_context dc; dc.escape_format = DIAGNOSTICS_ESCAPE_FORMAT_BYTES; diagnostic_show_locus (&dc, &richloc, DK_ERROR); - ASSERT_STREQ (" before<00><01><02><03><0d><80>after\n" + ASSERT_STREQ (" before<00><01><02><03><0b><80>after\n" " ^~~~ ~~~~\n", pp_formatted_text (dc.printer)); } diff --git a/gcc/input.cc b/gcc/input.cc index 2acbfdea4f8..060ca160126 100644 --- a/gcc/input.cc +++ b/gcc/input.cc @@ -646,6 +646,37 @@ file_cache_slot::maybe_read_data () return read_data (); } +/* Helper function for file_cache_slot::get_next_line (), to find the end of + the next line. Returns with the memchr convention, i.e. nullptr if a line + terminator was not found. We need to determine line endings in the same + manner that libcpp does: any of \n, \r\n, or \r is a line ending. */ + +static char * +find_end_of_line (char *s, size_t len) +{ + for (const auto end = s + len; s != end; ++s) + { + if (*s == '\n') + return s; + if (*s == '\r') + { + const auto next = s + 1; + if (next == end) + { + /* Don't find the line ending if \r is the very last character + in the buffer; we do not know if it's the end of the file or + just the end of what has been read so far, and we wouldn't + want to break in the middle of what's actually a \r\n + sequence. Instead, we will handle the case of a file ending + in a \r later. */ + break; + } + return (*next == '\n' ? next : s); + } + } + return nullptr; +} + /* Read a new line from file FP, using C as a cache for the data coming from the file. Upon successful completion, *LINE is set to the beginning of the line found. *LINE points directly in the @@ -671,17 +702,16 @@ file_cache_slot::get_next_line (char **line, ssize_t *line_len) char *next_line_start = NULL; size_t len = 0; - char *line_end = (char *) memchr (line_start, '\n', remaining_size); + char *line_end = find_end_of_line (line_start, remaining_size); if (line_end == NULL) { - /* We haven't found the end-of-line delimiter in the cache. - Fill the cache with more data from the file and look for the - '\n'. */ + /* We haven't found an end-of-line delimiter in the cache. + Fill the cache with more data from the file and look again. */ while (maybe_read_data ()) { line_start = m_data + m_line_start_idx; remaining_size = m_nb_read - m_line_start_idx; - line_end = (char *) memchr (line_start, '\n', remaining_size); + line_end = find_end_of_line (line_start, remaining_size); if (line_end != NULL) { next_line_start = line_end + 1; @@ -690,14 +720,22 @@ file_cache_slot::get_next_line (char **line, ssize_t *line_len) } if (line_end == NULL) { - /* We've loadded all the file into the cache and still no - '\n'. Let's say the line ends up at one byte passed the + /* We've loaded all the file into the cache and still no + terminator. Let's say the line ends up at one byte past the end of the file. This is to stay consistent with the case - of when the line ends up with a '\n' and line_end points to - that terminal '\n'. That consistency is useful below in - the len calculation. */ - line_end = m_data + m_nb_read ; - m_missing_trailing_newline = true; + of when the line ends up with a terminator and line_end points to + that. That consistency is useful below in the len calculation. + + If the file ends in a \r, we didn't identify it as a line + terminator above, so do that now instead. */ + line_end = m_data + m_nb_read; + if (m_nb_read && line_end[-1] == '\r') + { + --line_end; + m_missing_trailing_newline = false; + } + else + m_missing_trailing_newline = true; } else m_missing_trailing_newline = false; @@ -711,9 +749,8 @@ file_cache_slot::get_next_line (char **line, ssize_t *line_len) if (m_fp && ferror (m_fp)) return false; - /* At this point, we've found the end of the of line. It either - points to the '\n' or to one byte after the last byte of the - file. */ + /* At this point, we've found the end of the of line. It either points to + the line terminator or to one byte after the last byte of the file. */ gcc_assert (line_end != NULL); len = line_end - line_start; diff --git a/gcc/testsuite/c-c++-common/pr91733.c b/gcc/testsuite/c-c++-common/pr91733.c new file mode 100644 index 00000000000..1539bb4f386 --- /dev/null +++ b/gcc/testsuite/c-c++-common/pr91733.c @@ -0,0 +1,17 @@ +/* { dg-do preprocess } */ +/* { dg-additional-options "-fdiagnostics-show-caret" } */ + +const char *s = " "; + +/* { dg-warning "missing terminating \"" "test1" { target *-*-* } 4 } */ +/* { dg-warning "missing terminating \"" "test2" { target *-*-* } 5 } */ + +/* { dg-begin-multiline-output "test3" } + const char *s = " + ^ +{ dg-end-multiline-output "test3" } */ + +/* { dg-begin-multiline-output "test4" } + "; + ^