[00/10] __builtin_dynamic_object_size

Message ID 20211109190137.1107736-1-siddhesh@gotplt.org
Headers
Series __builtin_dynamic_object_size |

Message

Siddhesh Poyarekar Nov. 9, 2021, 7:01 p.m. UTC
  This patchset implements the __builtin_dynamic_object_size builtin for
gcc.  The primary motivation to have this builtin in gcc is to enable
_FORTIFY_SOURCE=3 support with gcc, thus allowing greater fortification
in use cases where the potential performance tradeoff is acceptable.

Semantics:
----------

__builtin_dynamic_object_size has the same signature as
__builtin_object_size; it accepts a pointer and type ranging from 0 to 3
and it returns an object size estimate for the pointer based on an
analysis of which objects the pointer could point to.  The actual
properties of the object size estimate are different:

- In the best case __builtin_dynamic_object_size evaluates to an
  expression that represents a precise size of the object being pointed
  to.

- In case a precise object size expression cannot be evaluated,
  __builtin_dynamic_object_size attempts to evaluate an estimate size
  expression based on the object size type.

- In what situations the builtin returns an estimate vs a precise
  expression is an implementation detail and may change in future.
  Users must always assume, as in the case of __builtin_object_size, that
  the returned value is the maximum or minimum based on the object size
  type they have provided.

- In the worst case of failure, __builtin_dynamic_object_size returns a
  constant (size_t)-1 or (size_t)0.

Implementation:
---------------

- The __builtin_dynamic_object_size support is implemented in
  tree-object-size.  In most cases the first pass (early_objsz) is able
  to evaluate object size expressions.  The second phase mainly ends up
  simplifying the __builtin_dynamic_object_size to
  __builtin_object_size.

- The patchset begins with structural modification of the
  tree-object-size pass, followed by enhancement to return size
  expressions.  I have split the implementation into one feature per
  patch (calls, function parameters, PHI, etc.) to hopefully ease
  review.

Performance:
------------

Expressions generated by this pass in theory could be arbitrarily
complex.  I have not made an attempt to limit nesting of objects since
it seemed too early to do that.  In practice based on the few
applications I built, most of the complexity of the expressions got
folded away.  Even so, the performance overhead is likely to be
non-zero.  If we find performance degradation to be significant, we
could later add nesting limits to bail out if a size expression gets too
complex.

I have also not implemented simplification of __*_chk to their normal
variants if we can determine at compile time that it is safe, which
still depends on the object size to be constant.  I hope to do this as a
minor performance improvement in stage 3.

Build time performance doesn't seem to be affected much based on an
unscientific check to time
`make check-gcc RUNTESTFLAGS="dg.exp=builtin*"`.  It only increases by
about a couple of seconds when the dynamic tests are added and remains
more or less in the same ballpark otherwise.

Testing:
--------

I have added tests for dynamic object sizes as well as wrappers for all
__builtin_object_size tests to provide wide coverage.  With that in
place I have run full bootstrap builds on Fedora rawhide by backporting the
patches to the gcc11 branch and x86_64 and i686 have no failures in any
of the builtin-*object-size* tests and no new failures.

I have also built bash, cmake, zfs-fuse and systemtap with
_FORTIFY_SOURCE=3 (with a hacked up glibc to make sure it works) and saw
no issues in any of those builds.  I did some rudimentary analysis of
the generated binaries to see if there was any difference in coverage
and found that there was.  In terms of pure numbers, there were far more
_chk variants of calls than the regular ones due to _FORTIFY_SOURCE
(from about 4% to 70% in bash), but that could well be due to the _chk
variants not being folded into regular variants when safe.  However, on
manual inspection of some of these sites, it was clear that coverage was
increasing significantly where __builtin_object_size was previously
bailing out.

Specifically for bash, the coverage went from 30.81% to 86.87% (it was
84.5% with the v1 patch).  I actually hope to reduce this a bit with
folding improvements for __builtin___memcpy_chk, etc.

A bootstrap build is in progress on x86_64.

Limitations/Future work:
------------------------

- The most immediate work is to fold _chk variants of builtins into
  regular ones when it can be proven at compile time that the object
  size will alwasy be less than the length of the write.  I am working
  on it right now.

- I need to enable _FORTIFY_SOURCE=3 for gcc in glibc; currently it is
  llvm-only.  I've started working on these patches too on the side.

- Instead of bailing out on non-constant sizes with
  __builtin_object_size, it should be possible to use ranger to
  get an upper and lower bound on the size expression and use that to
  implement __builtin_object_size.

- More work could to be done to reduce the performance impact of the
  computation.  One way could be to add a heuristic where the pass keeps
  track of nesting in the expression and either bail out or compute an
  estimate if nesting crosses a threshold.  I'll take this up once we
  have more data on the nature of the bottlenecks.


Siddhesh Poyarekar (10):
  tree-object-size: Replace magic numbers with enums
  tree-object-size: Abstract object_sizes array
  tree-object-size: Use tree instead of HOST_WIDE_INT
  tree-object-size: Single pass dependency loop resolution
  __builtin_dynamic_object_size: Recognize builtin
  tree-object-size: Support dynamic sizes in conditions
  tree-object-size: Handle function parameters
  tree-object-size: Handle GIMPLE_CALL
  tree-object-size: Dynamic sizes for ADDR_EXPR
  tree-object-size: Handle dynamic offsets

 gcc/builtins.c                                |   22 +-
 gcc/builtins.def                              |    1 +
 gcc/doc/extend.texi                           |   13 +
 gcc/gimple-fold.c                             |    9 +-
 .../g++.dg/ext/builtin-dynamic-object-size1.C |    5 +
 .../g++.dg/ext/builtin-dynamic-object-size2.C |    5 +
 .../gcc.dg/builtin-dynamic-alloc-size.c       |    7 +
 .../gcc.dg/builtin-dynamic-object-size-0.c    |  463 +++++
 .../gcc.dg/builtin-dynamic-object-size-1.c    |    7 +
 .../gcc.dg/builtin-dynamic-object-size-10.c   |    9 +
 .../gcc.dg/builtin-dynamic-object-size-11.c   |    7 +
 .../gcc.dg/builtin-dynamic-object-size-12.c   |    5 +
 .../gcc.dg/builtin-dynamic-object-size-13.c   |    5 +
 .../gcc.dg/builtin-dynamic-object-size-14.c   |    5 +
 .../gcc.dg/builtin-dynamic-object-size-15.c   |    5 +
 .../gcc.dg/builtin-dynamic-object-size-16.c   |    7 +
 .../gcc.dg/builtin-dynamic-object-size-17.c   |    8 +
 .../gcc.dg/builtin-dynamic-object-size-18.c   |    8 +
 .../gcc.dg/builtin-dynamic-object-size-19.c   |  104 +
 .../gcc.dg/builtin-dynamic-object-size-2.c    |    7 +
 .../gcc.dg/builtin-dynamic-object-size-3.c    |    7 +
 .../gcc.dg/builtin-dynamic-object-size-4.c    |    7 +
 .../gcc.dg/builtin-dynamic-object-size-5.c    |    8 +
 .../gcc.dg/builtin-dynamic-object-size-6.c    |    5 +
 .../gcc.dg/builtin-dynamic-object-size-7.c    |    5 +
 .../gcc.dg/builtin-dynamic-object-size-8.c    |    5 +
 .../gcc.dg/builtin-dynamic-object-size-9.c    |    5 +
 gcc/testsuite/gcc.dg/builtin-object-size-1.c  |  160 +-
 gcc/testsuite/gcc.dg/builtin-object-size-16.c |    2 +
 gcc/testsuite/gcc.dg/builtin-object-size-17.c |    2 +
 gcc/testsuite/gcc.dg/builtin-object-size-2.c  |  134 ++
 gcc/testsuite/gcc.dg/builtin-object-size-3.c  |  151 ++
 gcc/testsuite/gcc.dg/builtin-object-size-4.c  |   99 +
 gcc/testsuite/gcc.dg/builtin-object-size-5.c  |   12 +
 gcc/tree-object-size.c                        | 1766 +++++++++++------
 gcc/tree-object-size.h                        |    3 +-
 gcc/ubsan.c                                   |   46 +-
 37 files changed, 2499 insertions(+), 620 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/ext/builtin-dynamic-object-size1.C
 create mode 100644 gcc/testsuite/g++.dg/ext/builtin-dynamic-object-size2.C
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-alloc-size.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-1.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-10.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-11.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-12.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-13.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-14.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-15.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-16.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-17.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-18.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-19.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-2.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-3.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-4.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-5.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-6.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-7.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-8.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-9.c
  

Comments

Jakub Jelinek Nov. 19, 2021, 3:56 p.m. UTC | #1
On Wed, Nov 10, 2021 at 12:31:26AM +0530, Siddhesh Poyarekar wrote:
> - Instead of bailing out on non-constant sizes with
>   __builtin_object_size, it should be possible to use ranger to
>   get an upper and lower bound on the size expression and use that to
>   implement __builtin_object_size.

I'd prefer not to.  One thing is that VRP heavily relies on UB not happening
in the program, while __bos is typically used to catch UB in those programs.
And, I'm afraid fairly often VRP would result in runtime tests for limits
that aren't useful for security at all.  Say VRP figures out that some
integer isn't negative or doesn't have MSB set etc., suddenly we have range
of [0, INT_MAX] or similar and making that imply __builtin_object_size
INT_MAX rather than ~(size_t) 0 would mean we need to use __*_chk and
compare at runtime, even when it is very unlikely an object would be that
big...
The compiler computes some range, but there is not information on how much
the range actually maps to the actual range the program is using, or when it
is some much larger superset of the actual range (same problem is with
Martin's warnings BTW).  Some unrelated inlined function can perform some
comparison just in case, perhaps some jump threading is done and all of sudden
there is non-VARYING range.

	Jakub