[0/6] diagnostics: libcpp: Overhaul locations for _Pragma tokens

Message ID cover.1667514153.git.lhyatt@gmail.com
Headers
Series diagnostics: libcpp: Overhaul locations for _Pragma tokens |

Message

Lewis Hyatt Nov. 4, 2022, 1:44 p.m. UTC
  Hello-

In the past couple years there has been a ton of progress in fixing bugs
related to _Pragma, especially its use in the type of macros that many
projects like to implement for manipulating GCC diagnostic pragmas more
easily. For GCC 13 I have been going through the remaining open PRs, fixing a
couple and adding testcases for several that were already fixed. I felt that
made it a good time to overhaul one of the last remaining issues with _Pragma
processing, which is that we do not currently assign good locations to the
tokens involved. The locations are very important, however, because that is
how GCC diagnostic pragmas will ultimately determine whether a given warning
should or should not apply at a given point. Currently, the tokens inside a
_Pragma string are all assigned the same location as the _Pragma token itself,
which is sufficient to make diagnostic pragmas work correctly. It does produce
somewhat inferior diagnostics, though, since we do not point the user to which
part of the _Pragma string caused the problem; and if the _Pragma string was
expanded from a macro, we do not even point them to the string at all.

Further, the assignment of the fake location to the tokens inside the _Pragma
string takes place after all the tokens have been lexed -- consequently, if a
diagnostic is issued by libcpp during that process, it doesn't benefit from the
patched-up location and instead uses a bogus location. As a quick example,
compiling:

    =====
    _Pragma("GCC diagnostic ignored \"oops")
    =====

produces:

    =====
    file:1:24: warning: missing terminating " character
        1 | _Pragma("GCC diagnostic ignored \"oops")
          |                        ^
    =====

It is surprisingly involved to make that caret point to something
reasonable. The reason it points to the middle of nowhere is that the current
implementation of _Pragma in directives.cc:destringize_and_run() does not
touch the line_maps instance at all, and so does not inform it where the
tokens are coming from. But the line_maps API in fact does not provide any way
to handle this case, so this needs to be added first. With all the changes in
this patch set, we would output instead:

    ======
    In buffer generated from file:1:
    <generated>:1:24: warning: missing terminating " character
        1 | GCC diagnostic ignored "oops
          |                        ^
    file:1:1: note: in <_Pragma directive>
        1 | _Pragma("GCC diagnostic ignored \"oops")
          | ^~~~~~~
    ======

Treating the _Pragma like a macro expansion makes everything consistent and
solves a ton of problems; all the locations involved will just make sense from
the user's point of view.

Patches 1-3 are tiny bug fixes that I came across while working on the new
testcases. I was a bit surprised that #1 and #3 especially did not have PRs
open, but I guess these small glitches have gone unnoticed so far.

Patch 4 is the largest one. It adds a new reason=LC_GEN for ordinary line
maps. These maps are just like normal ones, except the file name pointer
points not to a file name, but to the actual data in memory instead. This is
how we can issue diagnostics for code that did not appear in the user's input,
such as the de-stringized _Pragma string. The changes needed in libcpp to
support this concept are pretty small and straightforward. Most of the changes
outside of libcpp are in input.cc and diagnostic-show-locus.cc, which need to
learn how to obtain code from LC_GEN maps, and also a lot of the changes are
in selftests that are pretty sensitive to the internal implementation.

Patch 5 is a continuation of 4 that supports LC_GEN maps in less commonly used
places, such as the new SARIF output format, that also need to know how to
read source back from in-memory buffers in addition to files.

Patch 6 updates the implementation of _Pragma handling to use LC_GEN maps and
to create virtual locations for the tokens as in the example above. I have
also added support for the argument of the _Pragma to be a raw string, as
requested by PR83473, since this was easy to do while I was there.

1/6: diagnostics: Fix macro tracking for ad-hoc locations
2/6: diagnostics: Use an inline function rather than hardcoding <built-in>
     string
3/6: libcpp: Fix paste error with unknown pragma after macro expansion
4/6: diagnostics: libcpp: Add LC_GEN linemaps to support in-memory buffers
5/6: diagnostics: Support generated data in additional contexts
6/6: diagnostics: libcpp: Assign real locations to the tokens inside
     _Pragma strings

Bootstrap and regtest all languages on x86-64 Linux looks good.

I realize it's near the end of stage 1 now. It would still be great and I
would appreciate very much if this patch could get reviewed please? For GCC 13,
there have been several _Pragma-related bugs fixed (especially PR53431), and
addressing this location issue would tie it together nicely. Thanks very much!

-Lewis