[0/3] Add diagram support to gcc diagnostics

Message ID 20230531180630.3127108-1-dmalcolm@redhat.com

David Malcolm May 31, 2023, 6:06 p.m. UTC
Existing diagnostic text output in GCC has to be implemented by writing
sequentially to a pretty_printer instance.  This makes it hard to
implement some kinds of diagnostic output (see e.g.
diagnostic-show-locus.cc, which is reaching the limits of
maintainability).

I've posted various experimental patches over the years that add other
kinds of output to GCC, such as ASCII art:
- "rich vectorization hints":
  - https://gcc.gnu.org/legacy-ml/gcc-patches/2018-07/msg01576.html
- visualizations of -Wformat-overflow:
  - https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77696 comment 9 onwards
  - https://gcc.gnu.org/legacy-ml/gcc-patches/2018-09/msg00771.html

This patch kit combines the above ideas.  It:
- adds more flexible ways to create diagnostic output:
  - a canvas class, which can be "painted" to via random-access (rather
    than sequentially), and then printed when the painting is complete.
    A formatted pretty_printer can be roundtripped to a canvas and back,
    preserving formatting data (colors and URLs)
  - a table class for 2D grid layout, supporting items that span multiple
    rows/columns
  - a widget class for organizing diagrams hierarchically and painting
    them to a canvas
- expands GCC's diagnostics subsystem so that diagnostics can have
  "text art" diagrams - think ASCII art, but potentially including some
  Unicode characters, such as box-drawing chars (by using the canvas
  class)
- uses this to implement visualizations of -Wanalyzer-out-of-bounds so
  that, where possible, it will emit a text art diagram visualizing the
  spatial relationship between (a) the memory region that the analyzer
  predicts would be accessed, versus (b) the range of memory that is
  valid to access - whether they overlap, are touching, are close or far
  apart; which one is before or after in memory, the relative sizes
  involved, the direction of the access (read vs write), and, in some
  cases, the values of data involved.
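
To make the "random-access rather than sequential" point about the canvas
concrete, here is a minimal sketch.  It is a hypothetical toy, not the
text_art::canvas class from the patch: the names toy_canvas, paint,
paint_text, to_string and draw_box_out_of_order are invented for
illustration, and it handles plain ASCII only (no styles, colors or URLs).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical toy canvas: a fixed-size grid of cells that can be
// written in any order, then serialized once painting is complete.
// (This is NOT the text_art::canvas class from the patch kit.)
class toy_canvas
{
public:
  toy_canvas (std::size_t width, std::size_t height)
  : m_width (width), m_height (height), m_cells (width * height, ' ')
  {
  }

  // Random-access write of a single cell.
  void paint (std::size_t x, std::size_t y, char ch)
  {
    assert (x < m_width);
    assert (y < m_height);
    m_cells[y * m_width + x] = ch;
  }

  // Paint a run of text starting at (x, y).
  void paint_text (std::size_t x, std::size_t y, const std::string &text)
  {
    for (std::size_t i = 0; i < text.size (); ++i)
      paint (x + i, y, text[i]);
  }

  // Emit the grid row by row, trimming trailing spaces on each row.
  std::string to_string () const
  {
    std::string result;
    for (std::size_t y = 0; y < m_height; ++y)
      {
        std::string row = m_cells.substr (y * m_width, m_width);
        row.erase (row.find_last_not_of (' ') + 1);
        result += row + '\n';
      }
    return result;
  }

private:
  std::size_t m_width;
  std::size_t m_height;
  std::string m_cells;
};

// Paint a labelled box, deliberately out of order: bottom edge first,
// then the top edge, then the sides, then the label.
std::string draw_box_out_of_order ()
{
  toy_canvas c (5, 3);
  c.paint_text (0, 2, "+---+");
  c.paint_text (0, 0, "+---+");
  c.paint (0, 1, '|');
  c.paint (4, 1, '|');
  c.paint_text (1, 1, "arr");
  return c.to_string ();
}
```

With a sequential pretty_printer the bottom edge could not be emitted
before the top one; with a grid the order of painting does not matter,
which is what makes layouts such as tables and nested widgets easier to
express.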

The new code is in a new "gcc/text-art" subdirectory and "text_art"
namespace.

Many examples of the visualizations can be seen in patch 3 of the kit.
Here are two of them.  Given:

  int32_t arr[10];

  int32_t int_arr_read_element_before_start_far(void)
  {
    return arr[-100];
  }

it emits:

demo-1.c: In function ‘int_arr_read_element_before_start_far’:
demo-1.c:7:13: warning: buffer under-read [CWE-127] [-Wanalyzer-out-of-bounds]
    7 |   return arr[-100];
      |          ~~~^~~~~~
  ‘int_arr_read_element_before_start_far’: event 1
    |
    |    7 |   return arr[-100];
    |      |          ~~~^~~~~~
    |      |             |
    |      |             (1) out-of-bounds read from byte -400 till byte -397 but ‘arr’ starts at byte 0
    |
demo-1.c:7:13: note: valid subscripts for ‘arr’ are ‘[0]’ to ‘[9]’

  ┌───────────────────────────┐
  │read of ‘int32_t’ (4 bytes)│
  └───────────────────────────┘
                ^
                │
                │
  ┌───────────────────────────┐              ┌────────┬────────┬─────────┐
  │                           │              │  [0]   │  ...   │   [9]   │
  │    before valid range     │              ├────────┴────────┴─────────┤
  │                           │              │‘arr’ (type: ‘int32_t[10]’)│
  └───────────────────────────┘              └───────────────────────────┘
  ├─────────────┬─────────────┤├─────┬──────┤├─────────────┬─────────────┤
                │                    │                     │
   ╭────────────┴───────────╮   ╭────┴────╮        ╭───────┴──────╮
   │⚠️  under-read of 4 bytes│   │396 bytes│        │size: 40 bytes│
   ╰────────────────────────╯   ╰─────────╯        ╰──────────────╯
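
The numbers in the diagram above follow from simple offset arithmetic,
which can be checked with a short sketch.  The byte_range type and the
element_bytes and gap_before_start helpers are hypothetical names used
only for this illustration; they are not taken from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type for illustration; not from the patch.
struct byte_range
{
  std::int64_t first; // offset of first byte, relative to start of array
  std::int64_t last;  // offset of last byte (inclusive)
};

// Byte range touched by accessing element INDEX of an array whose
// elements are ELEM_SIZE bytes each.
byte_range element_bytes (std::int64_t index, std::int64_t elem_size)
{
  assert (elem_size > 0);
  byte_range r;
  r.first = index * elem_size;
  r.last = r.first + elem_size - 1;
  return r;
}

// Number of bytes lying strictly between an access that ends before
// offset 0 and the start of the array at offset 0.
std::int64_t gap_before_start (byte_range access)
{
  assert (access.last < 0);
  return -access.last - 1;
}
```

For arr[-100] with 4-byte elements, element_bytes (-100, 4) gives bytes
-400 to -397, matching event (1), and gap_before_start gives the 396 bytes
shown between the two boxes; the 40-byte size is 10 elements of 4 bytes.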

and given:

  #include <string.h>

  void
  test_non_ascii ()
  {
    char buf[5];
    strcpy (buf, "文字化け");
  }

it emits:

demo-2.c: In function ‘test_non_ascii’:
demo-2.c:7:3: warning: stack-based buffer overflow [CWE-121] [-Wanalyzer-out-of-bounds]
    7 |   strcpy (buf, "文字化け");
      |   ^~~~~~~~~~~~~~~~~~~~~~~~
  ‘test_non_ascii’: events 1-2
    |
    |    6 |   char buf[5];
    |      |        ^~~
    |      |        |
    |      |        (1) capacity: 5 bytes
    |    7 |   strcpy (buf, "文字化け");
    |      |   ~~~~~~~~~~~~~~~~~~~~~~~~
    |      |   |
    |      |   (2) out-of-bounds write from byte 5 till byte 12 but ‘buf’ ends at byte 5
    |
demo-2.c:7:3: note: write of 8 bytes to beyond the end of ‘buf’
    7 |   strcpy (buf, "文字化け");
      |   ^~~~~~~~~~~~~~~~~~~~~~~~
demo-2.c:7:3: note: valid subscripts for ‘buf’ are ‘[0]’ to ‘[4]’

  ┌─────┬─────┬─────┬────┬────┐┌────┬────┬────┬────┬────┬────┬────┬──────┐
  │ [0] │ [1] │ [2] │[3] │[4] ││[5] │[6] │[7] │[8] │[9] │[10]│[11]│ [12] │
  ├─────┼─────┼─────┼────┼────┤├────┼────┼────┼────┼────┼────┼────┼──────┤
  │0xe6 │0x96 │0x87 │0xe5│0xad││0x97│0xe5│0x8c│0x96│0xe3│0x81│0x91│ 0x00 │
  ├─────┴─────┴─────┼────┴────┴┴────┼────┴────┴────┼────┴────┴────┼──────┤
  │     U+6587      │    U+5b57     │    U+5316    │    U+3051    │U+0000│
  ├─────────────────┼───────────────┼──────────────┼──────────────┼──────┤
  │       文        │      字       │      化      │      け      │ NUL  │
  ├─────────────────┴───────────────┴──────────────┴──────────────┴──────┤
  │                  string literal (type: ‘char[13]’)                   │
  └──────────────────────────────────────────────────────────────────────┘
     │     │     │    │    │     │    │    │    │    │    │    │     │
     │     │     │    │    │     │    │    │    │    │    │    │     │
     v     v     v    v    v     v    v    v    v    v    v    v     v
  ┌─────┬────────────────┬────┐┌─────────────────────────────────────────┐
  │ [0] │      ...       │[4] ││                                         │
  ├─────┴────────────────┴────┤│            after valid range            │
  │  ‘buf’ (type: ‘char[5]’)  ││                                         │
  └───────────────────────────┘└─────────────────────────────────────────┘
  ├─────────────┬─────────────┤├────────────────────┬────────────────────┤
                │                                   │
       ╭────────┴────────╮              ╭───────────┴──────────╮
       │capacity: 5 bytes│              │⚠️  overflow of 8 bytes│
       ╰─────────────────╯              ╰──────────────────────╯

showing that the overflow occurs partway through the UTF-8 encoding of
the U+5b57 code point.
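
As a cross-check of that claim, the following sketch finds which code
point a given byte offset falls in.  It assumes well-formed UTF-8 input,
and the names utf8_hit, utf8_sequence_length, code_point_at_byte and
demo_strcpy_overflow are hypothetical; this is not how the analyzer
itself does it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical result type for illustration; not from the patch.
struct utf8_hit
{
  std::uint32_t code_point; // decoded code point containing the byte
  std::size_t byte_within;  // 0-based index of the byte in its sequence
  std::size_t length;       // total bytes in that sequence
};

// Length of a UTF-8 sequence given its lead byte; 0 if LEAD is a
// continuation byte or otherwise not a valid lead byte.
static std::size_t utf8_sequence_length (unsigned char lead)
{
  if (lead < 0x80) return 1;
  if ((lead & 0xe0) == 0xc0) return 2;
  if ((lead & 0xf0) == 0xe0) return 3;
  if ((lead & 0xf8) == 0xf0) return 4;
  return 0;
}

// Find the code point containing byte OFFSET of BYTES.
// Assumes BYTES is well-formed UTF-8 and OFFSET is in range.
utf8_hit code_point_at_byte (const std::string &bytes, std::size_t offset)
{
  assert (offset < bytes.size ());
  std::size_t start = 0;
  for (;;)
    {
      unsigned char lead = static_cast<unsigned char> (bytes[start]);
      std::size_t len = utf8_sequence_length (lead);
      assert (len != 0 && start + len <= bytes.size ());
      if (offset < start + len)
        {
          // Decode: payload bits of the lead byte, then 6 bits per
          // continuation byte.
          std::uint32_t cp = (len == 1) ? lead : (lead & (0xff >> (len + 1)));
          for (std::size_t i = 1; i < len; ++i)
            {
              unsigned char c = static_cast<unsigned char> (bytes[start + i]);
              cp = (cp << 6) | (c & 0x3f);
            }
          utf8_hit hit = { cp, offset - start, len };
          return hit;
        }
      start += len;
    }
}

// The string literal from demo-2.c, spelled out as UTF-8 bytes, queried
// at offset 5: the first byte written past the end of 'char buf[5]'.
utf8_hit demo_strcpy_overflow ()
{
  const std::string literal
    = "\xe6\x96\x87\xe5\xad\x97\xe5\x8c\x96\xe3\x81\x91";
  return code_point_at_byte (literal, 5);
}
```

Byte 5 is the third and last byte (0x97) of the three-byte encoding of
U+5b57, so the first two bytes of that character land inside buf and the
third is the first to overflow.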

It doesn't show up in this email, but the above diagrams are colorized
to contrast the valid and invalid access ranges.

There are lots more examples in the test suites of patches 2 and 3,
including symbolic expressions.

I can self-approve most of this but:
- patch 1 touches the testsuite for handling newlines in multiline
  strings in DejaGnu tests
- patches 2 and 3 add non-ASCII string literals, encoded in UTF-8,
  for use in selftests.  Is this OK?

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu, both with
gcc 4.8.5 and with gcc 10.3.1.

Lightly tested with valgrind.

OK for trunk?


David Malcolm (3):
  testsuite: move handle-multiline-outputs to before check for blank
    lines
  diagnostics: add support for "text art" diagrams
  analyzer: add text-art visualizations of out-of-bounds accesses
    [PR106626]

 contrib/unicode/gen-box-drawing-chars.py      |   94 +
 contrib/unicode/gen-combining-chars.py        |   75 +
 contrib/unicode/gen-printable-chars.py        |   77 +
 gcc/Makefile.in                               |   12 +-
 gcc/analyzer/access-diagram.cc                | 2405 +++++++++++++++++
 gcc/analyzer/access-diagram.h                 |  165 ++
 gcc/analyzer/analyzer.h                       |   30 +
 gcc/analyzer/analyzer.opt                     |   20 +
 gcc/analyzer/bounds-checking.cc               |  270 +-
 gcc/analyzer/diagnostic-manager.cc            |    2 +-
 gcc/analyzer/engine.cc                        |    4 +-
 gcc/analyzer/infinite-recursion.cc            |    2 +-
 gcc/analyzer/kf-analyzer.cc                   |    2 +-
 gcc/analyzer/kf.cc                            |    6 +-
 gcc/analyzer/pending-diagnostic.h             |    2 +-
 gcc/analyzer/region-model-manager.cc          |   32 +-
 gcc/analyzer/region-model-manager.h           |    2 +-
 gcc/analyzer/region-model.cc                  |   52 +-
 gcc/analyzer/region-model.h                   |    4 +
 gcc/analyzer/region.cc                        |  369 ++-
 gcc/analyzer/region.h                         |    1 +
 gcc/analyzer/sm-fd.cc                         |   14 +-
 gcc/analyzer/sm-file.cc                       |    4 +-
 gcc/analyzer/sm-malloc.cc                     |   20 +-
 gcc/analyzer/sm-pattern-test.cc               |    2 +-
 gcc/analyzer/sm-sensitive.cc                  |    3 +-
 gcc/analyzer/sm-signal.cc                     |    2 +-
 gcc/analyzer/sm-taint.cc                      |   16 +-
 gcc/analyzer/store.cc                         |   11 +-
 gcc/analyzer/store.h                          |    9 +
 gcc/analyzer/varargs.cc                       |    8 +-
 gcc/color-macros.h                            |   16 +
 gcc/common.opt                                |   23 +
 gcc/configure                                 |    2 +-
 gcc/configure.ac                              |    2 +-
 gcc/diagnostic-diagram.h                      |   51 +
 gcc/diagnostic-format-json.cc                 |   10 +
 gcc/diagnostic-format-sarif.cc                |  106 +-
 gcc/diagnostic-text-art.h                     |   49 +
 gcc/diagnostic.cc                             |   72 +
 gcc/diagnostic.h                              |   21 +
 gcc/doc/invoke.texi                           |   40 +-
 gcc/gcc.cc                                    |    6 +
 gcc/opts-common.cc                            |    1 +
 gcc/opts.cc                                   |    6 +
 gcc/pretty-print.cc                           |   29 +
 gcc/pretty-print.h                            |    1 +
 gcc/selftest-run-tests.cc                     |    3 +
 .../c-c++-common/Wlogical-not-parentheses-2.c |    2 +
 gcc/testsuite/gcc.dg/analyzer/data-model-1.c  |    4 +-
 .../analyzer/malloc-macro-inline-events.c     |    5 -
 .../analyzer/out-of-bounds-diagram-1-ascii.c  |   55 +
 .../analyzer/out-of-bounds-diagram-1-debug.c  |   40 +
 .../analyzer/out-of-bounds-diagram-1-emoji.c  |   55 +
 .../analyzer/out-of-bounds-diagram-1-json.c   |   13 +
 .../analyzer/out-of-bounds-diagram-1-sarif.c  |   24 +
 .../out-of-bounds-diagram-1-unicode.c         |   55 +
 .../analyzer/out-of-bounds-diagram-10.c       |   29 +
 .../analyzer/out-of-bounds-diagram-11.c       |   82 +
 .../analyzer/out-of-bounds-diagram-12.c       |   54 +
 .../analyzer/out-of-bounds-diagram-13.c       |   43 +
 .../analyzer/out-of-bounds-diagram-14.c       |  110 +
 .../analyzer/out-of-bounds-diagram-15.c       |   42 +
 .../gcc.dg/analyzer/out-of-bounds-diagram-2.c |   30 +
 .../gcc.dg/analyzer/out-of-bounds-diagram-3.c |   45 +
 .../gcc.dg/analyzer/out-of-bounds-diagram-4.c |   45 +
 .../analyzer/out-of-bounds-diagram-5-ascii.c  |   40 +
 .../out-of-bounds-diagram-5-unicode.c         |   42 +
 .../gcc.dg/analyzer/out-of-bounds-diagram-6.c |  125 +
 .../gcc.dg/analyzer/out-of-bounds-diagram-7.c |   36 +
 .../gcc.dg/analyzer/out-of-bounds-diagram-8.c |   34 +
 .../gcc.dg/analyzer/out-of-bounds-diagram-9.c |   42 +
 .../gcc.dg/analyzer/pattern-test-2.c          |    4 +-
 gcc/testsuite/gcc.dg/missing-header-fixit-5.c |   10 +-
 .../gcc.dg/plugin/analyzer_gil_plugin.c       |    6 +-
 .../diagnostic-test-text-art-ascii-bw.c       |   57 +
 .../diagnostic-test-text-art-ascii-color.c    |   58 +
 .../plugin/diagnostic-test-text-art-none.c    |    5 +
 .../diagnostic-test-text-art-unicode-bw.c     |   58 +
 .../diagnostic-test-text-art-unicode-color.c  |   59 +
 .../plugin/diagnostic_plugin_test_text_art.c  |  257 ++
 gcc/testsuite/gcc.dg/plugin/plugin.exp        |    6 +
 gcc/testsuite/lib/gcc-dg.exp                  |    5 +
 gcc/testsuite/lib/multiline.exp               |    7 +-
 gcc/testsuite/lib/prune.exp                   |    7 -
 gcc/text-art/box-drawing-chars.inc            |   18 +
 gcc/text-art/box-drawing.cc                   |   72 +
 gcc/text-art/box-drawing.h                    |   32 +
 gcc/text-art/canvas.cc                        |  437 +++
 gcc/text-art/canvas.h                         |   74 +
 gcc/text-art/ruler.cc                         |  723 +++++
 gcc/text-art/ruler.h                          |  125 +
 gcc/text-art/selftests.cc                     |   77 +
 gcc/text-art/selftests.h                      |   60 +
 gcc/text-art/style.cc                         |  632 +++++
 gcc/text-art/styled-string.cc                 | 1107 ++++++++
 gcc/text-art/table.cc                         | 1272 +++++++++
 gcc/text-art/table.h                          |  262 ++
 gcc/text-art/theme.cc                         |  183 ++
 gcc/text-art/theme.h                          |  123 +
 gcc/text-art/types.h                          |  504 ++++
 gcc/text-art/widget.cc                        |  275 ++
 gcc/text-art/widget.h                         |  246 ++
 libcpp/charset.cc                             |   89 +-
 libcpp/combining-chars.inc                    |   68 +
 libcpp/include/cpplib.h                       |    3 +
 libcpp/printable-chars.inc                    |  231 ++
 107 files changed, 12163 insertions(+), 194 deletions(-)
 create mode 100755 contrib/unicode/gen-box-drawing-chars.py
 create mode 100755 contrib/unicode/gen-combining-chars.py
 create mode 100755 contrib/unicode/gen-printable-chars.py
 create mode 100644 gcc/analyzer/access-diagram.cc
 create mode 100644 gcc/analyzer/access-diagram.h
 create mode 100644 gcc/diagnostic-diagram.h
 create mode 100644 gcc/diagnostic-text-art.h
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-1-ascii.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-1-debug.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-1-emoji.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-1-json.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-1-sarif.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-1-unicode.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-10.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-11.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-12.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-13.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-14.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-15.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-2.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-3.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-4.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-5-ascii.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-5-unicode.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-6.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-7.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-8.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-9.c
 create mode 100644 gcc/testsuite/gcc.dg/plugin/diagnostic-test-text-art-ascii-bw.c
 create mode 100644 gcc/testsuite/gcc.dg/plugin/diagnostic-test-text-art-ascii-color.c
 create mode 100644 gcc/testsuite/gcc.dg/plugin/diagnostic-test-text-art-none.c
 create mode 100644 gcc/testsuite/gcc.dg/plugin/diagnostic-test-text-art-unicode-bw.c
 create mode 100644 gcc/testsuite/gcc.dg/plugin/diagnostic-test-text-art-unicode-color.c
 create mode 100644 gcc/testsuite/gcc.dg/plugin/diagnostic_plugin_test_text_art.c
 create mode 100644 gcc/text-art/box-drawing-chars.inc
 create mode 100644 gcc/text-art/box-drawing.cc
 create mode 100644 gcc/text-art/box-drawing.h
 create mode 100644 gcc/text-art/canvas.cc
 create mode 100644 gcc/text-art/canvas.h
 create mode 100644 gcc/text-art/ruler.cc
 create mode 100644 gcc/text-art/ruler.h
 create mode 100644 gcc/text-art/selftests.cc
 create mode 100644 gcc/text-art/selftests.h
 create mode 100644 gcc/text-art/style.cc
 create mode 100644 gcc/text-art/styled-string.cc
 create mode 100644 gcc/text-art/table.cc
 create mode 100644 gcc/text-art/table.h
 create mode 100644 gcc/text-art/theme.cc
 create mode 100644 gcc/text-art/theme.h
 create mode 100644 gcc/text-art/types.h
 create mode 100644 gcc/text-art/widget.cc
 create mode 100644 gcc/text-art/widget.h
 create mode 100644 libcpp/combining-chars.inc
 create mode 100644 libcpp/printable-chars.inc