mbox

[0/6] RFC: adding support to GCC for detecting trust boundaries

Message ID 20211113203732.2098220-1-dmalcolm@redhat.com
Headers

Message

David Malcolm Nov. 13, 2021, 8:37 p.m. UTC
  [Crossposting between gcc-patches@gcc.gnu.org and
linux-toolchains@vger.kernel.org; sorry about my lack of kernel
knowledge, in case of the following seems bogus]

I've been trying to turn my prototype from the LPC2021 session on
"Adding kernel-specific test coverage to GCC's -fanalyzer option"
( https://linuxplumbersconf.org/event/11/contributions/1076/ ) into
something that can go into GCC upstream without adding kernel-specific
special cases, or requiring a GCC plugin.  The prototype simply
specialcased "copy_from_user" and "copy_to_user" in GCC, which is
clearly not OK.

This GCC patch kit implements detection of "trust boundaries", aimed at
detection of "infoleaks" and of use of unsanitized attacker-controlled
values ("taint") in the Linux kernel.

For example, here's an infoleak diagnostic (using notes to
express what fields and padding within a struct have not been
initialized):

infoleak-CVE-2011-1078-2.c: In function ‘test_1’:
infoleak-CVE-2011-1078-2.c:28:9: warning: potential exposure of sensitive
  information by copying uninitialized data from stack across trust
  boundary [CWE-200] [-Wanalyzer-exposure-through-uninit-copy]
   28 |         copy_to_user(optval, &cinfo, sizeof(cinfo));
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  ‘test_1’: events 1-3
    |
    |   21 |         struct sco_conninfo cinfo;
    |      |                             ^~~~~
    |      |                             |
    |      |                             (1) region created on stack here
    |      |                             (2) capacity: 6 bytes
    |......
    |   28 |         copy_to_user(optval, &cinfo, sizeof(cinfo));
    |      |         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    |      |         |
    |      |         (3) uninitialized data copied from stack here
    |
infoleak-CVE-2011-1078-2.c:28:9: note: 1 byte is uninitialized
   28 |         copy_to_user(optval, &cinfo, sizeof(cinfo));
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
infoleak-CVE-2011-1078-2.c:14:15: note: padding after field ‘dev_class’ is uninitialized (1 byte)
   14 |         __u8  dev_class[3];
      |               ^~~~~~~~~
infoleak-CVE-2011-1078-2.c:21:29: note: suggest forcing zero-initialization by providing a ‘{0}’ initializer
   21 |         struct sco_conninfo cinfo;
      |                             ^~~~~
      |                                   = {0}

I have to come up with a way of expressing trust boundaries in a way
that will be:
- acceptable to the GCC community (not be too kernel-specific), and
- useful to the Linux kernel community.

At LPC it was pointed out that the kernel already has various
annotations e.g. "__user" for different kinds of pointers, and that it
would be best to reuse those.


Approach 1: Custom Address Spaces
=================================

GCC's C frontend supports target-specific address spaces; see:
  https://gcc.gnu.org/onlinedocs/gcc/Named-Address-Spaces.html
Quoting the N1275 draft of ISO/IEC DTR 18037:
  "Address space names are ordinary identifiers, sharing the same name
  space as variables and typedef names.  Any such names follow the same
  rules for scope as other ordinary identifiers (such as typedef names).
  An implementation may provide an implementation-defined set of
  intrinsic address spaces that are, in effect, predefined at the start
  of every translation unit.  The names of intrinsic address spaces must
  be reserved identifiers (beginning with an underscore and an uppercase
  letter or with two underscores).  An implementation may also
  optionally support a means for new address space names to be defined
  within a translation unit."

Patch 1a in the following patch kit for GCC implements such a means to
define new address spaces names in a translation unit, via a pragma:
  #prgama GCC custom_address_space(NAME_OF_ADDRESS_SPACE)

For example, the Linux kernel could perhaps write:

  #define __kernel
  #pragma GCC custom_address_space(__user)
  #pragma GCC custom_address_space(__iomem)
  #pragma GCC custom_address_space(__percpu)
  #pragma GCC custom_address_space(__rcu)

and thus the C frontend can complain about code that mismatches __user
and kernel pointers, e.g.:

custom-address-space-1.c: In function ‘test_argpass_to_p’:
custom-address-space-1.c:29:14: error: passing argument 1 of ‘accepts_p’
from pointer to non-enclosed address space
   29 |   accepts_p (p_user);
      |              ^~~~~~
custom-address-space-1.c:21:24: note: expected ‘void *’ but argument is
of type ‘__user void *’
   21 | extern void accepts_p (void *);
      |                        ^~~~~~
custom-address-space-1.c: In function ‘test_cast_k_to_u’:
custom-address-space-1.c:135:12: warning: cast to ‘__user’ address space
pointer from disjoint generic address space pointer
  135 |   p_user = (void __user *)p_kernel;
      |            ^

The patch doesn't yet maintain a good distinction between implicit
target-specific address spaces and user-defined address spaces, has at
least one known major bug, and has only been lightly tested.  I can
fix these issues, but was hoping for feedback that this approach is the
right direction from both the GCC and Linux development communities.

Implementation status: doesn't yet bootstrap; am running into stage2
vs stage3 comparison issues.


Approach 2: An "untrusted" attribute
====================================

Alternatively, patch 1b in the kit implements:

  __attribute__((untrusted))

which can be applied to types as a qualifier (similarly to const,
volatile, etc) to mark a trust boundary, hence the kernel could have:

  #define __user __attribute__((untrusted))

where my patched GCC treats
  T *
vs 
  T __attribute__((untrusted)) *
as being different types and thus the C frontend can complain (even without
-fanalyzer) about e.g.:

extern void accepts_p(void *);

void test_argpass_to_p(void __user *p_user)
{
  accepts_p(p_user);
}

untrusted-pointer-1.c: In function ‘test_argpass_to_p’:
untrusted-pointer-1.c:22:13: error: passing argument 1 of ‘accepts_p’
from pointer with different trust level
   22 |   accepts_p(p_user);
      |              ^~~~~~
untrusted-pointer-1.c:14:23: note: expected ‘void *’ but argument is of
type ‘__attribute__((untrusted)) void *’
   14 | extern void accepts_p(void *);
      |                        ^~~~~~

So you'd get enforcement of __user vs non-__user pointers as part of
GCC's regular type-checking.  (You need an explicit cast to convert
between the untrusted vs trusted types).

This approach is much less expressive that the custom addres space
approach; it would only cover the trust boundary aspect; it wouldn't
cover any differences between generic pointers and __user, vs __iomem,
__percpu, and __rcu which I admit I only dimly understand.

Implementation status: bootstraps and passes regression testing.
Builds most of the kernel, but am running into various conversion
issues.  It would be good to have some clarity on what conversions
the compiler ought to warn about, and what conversions should be OK.


Approach 3: some kind of custom qualifier
=========================================

Approach 1 extends the existing "named address space" machinery to add
new values; approach 2 adds a new flag to cv-qualifiers.  Both of these
approaches work in terms of cv-qualifiers.  We have some spare bits
available for these; perhaps a third approach could be to add a new
kind of user-defined qualifier, like named address spaces, but othogonal
to them.   I haven't attempted to implement this.


Other attributes
================

Patch 2 in the kit adds:
  __attribute__((returns_zero_on_success))
and
  __attribute__((returns_nonzero_on_success))
as hints to the analyzer that it's worth bifurcating the analysis of
such functions (to explore failure vs success, and thus to better
explore error-handling paths).  It's also a hint to the human reader of
the source code.

Given the above, the kernel could then have:

extern int copy_from_user(void *to, const void __user *from, long n)
  __attribute__((access (write_only, 1, 3),
		 access (read_only, 2, 3),
		 returns_zero_on_success));

extern long copy_to_user(void __user *to, const void *from, unsigned long n)
  __attribute__((access (write_only, 1, 3),
		 access (read_only, 2, 3),
		 returns_zero_on_success));

with suitable macros in compiler.h or whatnot.

("access" is an existing GCC attribute; see
 https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html )

My patched GCC add a heuristic to -fanalyzer that a 3-argument function
with a read_only buffer, a write_only buffer and a shared size argument
is a "copy function", and treats it as a copy from *from to *to of up to
n bytes that succeeds, or, given one of the above attributes can succeed
or fail.  I'm wiring things up so that values read from *untrusted_ptr
are tracked as tainted, and values written to *untrusted_ptr are treated
as possible infoleaks (e.g. uninitialized values written to
*untrusted_ptr are specifically called out).  This gets the extra
checking for infoleaks and taint that my earlier prototype had, but is
thus expressed via attributes, without having to have kernel-specific
special cases.

Patch 3 of the kit adds infoleak detection to GCC's -fanalyzer (as
in the example above).

Possibly silly question: is it always a bug for the value of a kernel
pointer to leak into user space?  i.e. should I be complaining about an
infoleak if the value of a trusted_ptr itself is written to
*untrusted_ptr?  e.g.

  s.p = some_kernel_ptr;
  copy_to_user(user_p, &s, sizeof (s));
     /* value of some_kernel_ptr is written to user space;
        is this something we should warn for?  */

Patch 4a/4b wire up the different implementations of "untrusted" into
GCC's -fanalyzer, which is used by...

Patch 5 uses this so that "untrusted" values are used in taint detection
in the analyzer, so that it can complain about attacker-controlled
values being used without sanitization.

Patch 6 adds a new __attribute__ ((tainted)) allowing for further
taint detection (e.g. identifying syscalls), with minimal patching of
the kernel, and without requiring a lot of link-time interprocedural
analysis.  I believe that some of this could work independently of
the trust boundary marking from the rest of the patch kit.

The combined patch kit (using approach 2 i.e. the "b" patches)
successfully bootstraps and passes regression testing on
x86_64-pc-linux-gnu.


Which of the 3 approaches looks best to:
- the GCC community?
- the Linux kernel community?

Does clang/LLVM have anything similar?

There are many examples in the patches, some of which are taken from
historical kernel vulnerabilities, and others from my "antipatterns.ko"
project ( https://github.com/davidmalcolm/antipatterns.ko ).

Thoughts?

Dave


David Malcolm (6 or 8, depending how you count):
  1a: RFC: Implement "#pragma GCC custom_address_space"
  1b: Add __attribute__((untrusted))
  2: Add returns_zero_on_success/failure attributes
  3: analyzer: implement infoleak detection
  4a: analyzer: implemention of region::untrusted_p in terms of custom
    address spaces
  4b: analyzer: implement region::untrusted_p in terms of
    __attribute__((untrusted))
  5: analyzer: use region::untrusted_p in taint detection
  6: Add __attribute__ ((tainted))

 gcc/Makefile.in                               |   3 +-
 gcc/analyzer/analyzer.opt                     |  20 +
 gcc/analyzer/checker-path.cc                  | 104 +++
 gcc/analyzer/checker-path.h                   |  47 +
 gcc/analyzer/diagnostic-manager.cc            |  75 +-
 gcc/analyzer/diagnostic-manager.h             |   3 +-
 gcc/analyzer/engine.cc                        | 342 ++++++-
 gcc/analyzer/exploded-graph.h                 |   3 +
 gcc/analyzer/pending-diagnostic.cc            |  30 +
 gcc/analyzer/pending-diagnostic.h             |  24 +
 gcc/analyzer/program-state.cc                 |  26 +-
 gcc/analyzer/region-model-impl-calls.cc       |  26 +-
 gcc/analyzer/region-model.cc                  | 504 ++++++++++-
 gcc/analyzer/region-model.h                   |  46 +-
 gcc/analyzer/region.cc                        |  52 ++
 gcc/analyzer/region.h                         |   4 +
 gcc/analyzer/sm-taint.cc                      | 839 ++++++++++++++++--
 gcc/analyzer/sm.h                             |   9 +
 gcc/analyzer/store.h                          |   1 +
 gcc/analyzer/trust-boundaries.cc              | 615 +++++++++++++
 gcc/c-family/c-attribs.c                      | 132 +++
 gcc/c-family/c-pretty-print.c                 |   2 +
 gcc/c/c-typeck.c                              |  64 ++
 gcc/doc/extend.texi                           |  63 +-
 gcc/doc/invoke.texi                           |  80 +-
 gcc/print-tree.c                              |   3 +
 .../c-c++-common/attr-returns-zero-on-1.c     |  68 ++
 gcc/testsuite/c-c++-common/attr-untrusted-1.c | 165 ++++
 .../gcc.dg/analyzer/attr-tainted-1.c          |  88 ++
 .../gcc.dg/analyzer/attr-tainted-misuses.c    |   6 +
 .../gcc.dg/analyzer/copy-function-1.c         |  98 ++
 .../gcc.dg/analyzer/copy_from_user-1.c        |  45 +
 gcc/testsuite/gcc.dg/analyzer/infoleak-1.c    | 181 ++++
 gcc/testsuite/gcc.dg/analyzer/infoleak-2.c    |  29 +
 gcc/testsuite/gcc.dg/analyzer/infoleak-3.c    | 141 +++
 gcc/testsuite/gcc.dg/analyzer/infoleak-5.c    |  35 +
 .../analyzer/infoleak-CVE-2011-1078-1.c       | 134 +++
 .../analyzer/infoleak-CVE-2011-1078-2.c       |  42 +
 .../analyzer/infoleak-CVE-2014-1446-1.c       | 117 +++
 .../analyzer/infoleak-CVE-2017-18549-1.c      | 101 +++
 .../analyzer/infoleak-CVE-2017-18550-1.c      | 171 ++++
 .../gcc.dg/analyzer/infoleak-antipatterns-1.c | 162 ++++
 .../gcc.dg/analyzer/infoleak-fixit-1.c        |  22 +
 gcc/testsuite/gcc.dg/analyzer/pr93382.c       |   2 +-
 .../analyzer/taint-CVE-2011-0521-1-fixed.c    | 113 +++
 .../gcc.dg/analyzer/taint-CVE-2011-0521-1.c   | 113 +++
 .../analyzer/taint-CVE-2011-0521-2-fixed.c    |  93 ++
 .../gcc.dg/analyzer/taint-CVE-2011-0521-2.c   |  93 ++
 .../analyzer/taint-CVE-2011-0521-3-fixed.c    |  56 ++
 .../gcc.dg/analyzer/taint-CVE-2011-0521-3.c   |  57 ++
 .../gcc.dg/analyzer/taint-CVE-2011-0521-4.c   |  40 +
 .../gcc.dg/analyzer/taint-CVE-2011-0521-5.c   |  42 +
 .../gcc.dg/analyzer/taint-CVE-2011-0521-6.c   |  37 +
 .../gcc.dg/analyzer/taint-CVE-2011-0521.h     | 136 +++
 .../gcc.dg/analyzer/taint-CVE-2011-2210-1.c   |  93 ++
 .../gcc.dg/analyzer/taint-CVE-2020-13143-1.c  |  38 +
 .../gcc.dg/analyzer/taint-CVE-2020-13143-2.c  |  32 +
 .../gcc.dg/analyzer/taint-CVE-2020-13143.h    |  91 ++
 gcc/testsuite/gcc.dg/analyzer/taint-alloc-1.c |  64 ++
 gcc/testsuite/gcc.dg/analyzer/taint-alloc-2.c |  27 +
 gcc/testsuite/gcc.dg/analyzer/taint-alloc-3.c |  21 +
 gcc/testsuite/gcc.dg/analyzer/taint-alloc-4.c |  31 +
 .../gcc.dg/analyzer/taint-antipatterns-1.c    | 137 +++
 .../gcc.dg/analyzer/taint-divisor-1.c         |  26 +
 .../{taint-1.c => taint-read-index-1.c}       |  19 +-
 .../gcc.dg/analyzer/taint-read-offset-1.c     | 128 +++
 .../taint-read-through-untrusted-ptr-1.c      |  37 +
 gcc/testsuite/gcc.dg/analyzer/taint-size-1.c  |  32 +
 .../gcc.dg/analyzer/taint-write-index-1.c     | 132 +++
 .../gcc.dg/analyzer/taint-write-offset-1.c    | 132 +++
 gcc/testsuite/gcc.dg/analyzer/test-uaccess.h  |  19 +
 .../torture/infoleak-net-ethtool-ioctl.c      |  78 ++
 .../torture/infoleak-vfio_iommu_type1.c       |  39 +
 gcc/tree-core.h                               |   6 +-
 gcc/tree.c                                    |   1 +
 gcc/tree.h                                    |  11 +-
 76 files changed, 6558 insertions(+), 140 deletions(-)
 create mode 100644 gcc/analyzer/trust-boundaries.cc
 create mode 100644 gcc/testsuite/c-c++-common/attr-returns-zero-on-1.c
 create mode 100644 gcc/testsuite/c-c++-common/attr-untrusted-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/attr-tainted-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/attr-tainted-misuses.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/copy-function-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/copy_from_user-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/infoleak-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/infoleak-2.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/infoleak-3.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/infoleak-5.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/infoleak-CVE-2011-1078-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/infoleak-CVE-2011-1078-2.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/infoleak-CVE-2014-1446-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/infoleak-CVE-2017-18549-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/infoleak-CVE-2017-18550-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/infoleak-antipatterns-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/infoleak-fixit-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521-1-fixed.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521-2-fixed.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521-2.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521-3-fixed.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521-3.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521-4.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521-5.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521-6.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521.h
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-2210-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143-2.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143.h
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-alloc-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-alloc-2.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-alloc-3.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-alloc-4.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-antipatterns-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-divisor-1.c
 rename gcc/testsuite/gcc.dg/analyzer/{taint-1.c => taint-read-index-1.c} (72%)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-read-offset-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-read-through-untrusted-ptr-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-size-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-write-index-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-write-offset-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/test-uaccess.h
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/torture/infoleak-net-ethtool-ioctl.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/torture/infoleak-vfio_iommu_type1.c