[0/8] __builtin_dynamic_object_size and more

Message ID 20211007221432.1029249-1-siddhesh@gotplt.org
Headers
Series __builtin_dynamic_object_size and more |

Message

Siddhesh Poyarekar Oct. 7, 2021, 10:14 p.m. UTC
  This patchset implements the __builtin_dynamic_object_size builtin for
gcc.  The primary motivation to have this builtin in gcc is to enable
_FORTIFY_SOURCE=3 support with gcc, thus allowing greater fortification
in use cases where the potential performance tradeoff is acceptable.

Semantics:
----------

__builtin_dynamic_object_size has the same signature as
__builtin_object_size; it accepts a pointer and type ranging from 0 to 3
and it returns an object size estimate for the pointer based on an
analysis of which objects the pointer could point to.  The actual
properties of the object size estimate are different:

- In the best case __builtin_dynamic_object_size evaluates to an
  expression that represents a precise size of the object being pointed
  to.

- In case a precise object size expression cannot be evaluated,
  __builtin_dynamic_object_size attempts to evaluate an estimate size
  expression based on the object size type.

- In what situations the builtin returns an estimate vs a precise
  expression is an implementation detail and may change in future.
  Users must always assume, as in the case of __builtin_object_size, that
  the returned value is the maximum or minimum based on the object size
  type they have provided.

- In the worst case of failure, __builtin_dynamic_object_size returns a
  constant (size_t)-1 or (size_t)0.

- If __builtin_dynamic_object_size returns a non-constant expression,
  the value of that expression is never (size_t)-1.

Implementation:
---------------

- The __builtin_dynamic_object_size support is implemented in
  tree-dynamic-object-size.c as two passes, just like tree-object-size.
  In most cases the first pass (early_dynobjsz) is able to evaluate
  object size expressions.  The second phase mainly ends up simplifying
  the __builtin_dynamic_object_size to __builtin_object_size.  Speaking
  of which...

- Currently if __builtin_dynamic_object_size fails to evaluate a precise
  size expression, it replaces the call with __builtin_object_size and
  punts to the last tree-object-size pass.  In future, this could become
  more sophisticated.

- Size expressions returned directly by __builtin_dynamic_object_size
  are bounded in the range of [0, SIZE_MAX - 1] so that equality tests
  with (size_t)-1 fail.  This is necessary to eliminate unnecessary
  branches in _FORTIFY_SOURCE=3.  This could be tightened further in
  future to SIZE_MAX / 2 since there are already assumptions built in
  that prevent object sizes from crossing that limit.

- I have split the implementation into one feature per patchset (calls,
  function parameters, PHI, etc.) to hopefully ease review.

- Some code duplication from tree-object-size is intentional.  The
  immediate need for that is to reduce impact on existing passes.  Over
  the medium/long term though, the intention is to fully replace
  tree-object-size.  See Future work for more information.

- I have marked the new files as Copyright (C) me (since I'm
  contributing under DCO), but I've based some of the code on the
  existing tree-object-size.c, so I need advice on the correct copyright
  notice.

Performance:
------------

Expressions generated by this pass in theory could be arbitrarily
complex.  I have not made an attempt to limit nesting of objects since
it seemed too early to do that.  In practice based on the few
applications I built, most of the complexity of the expressions got
folded away.  Even so, the performance overhead is likely to be
non-zero.  If we find performance degradation to be significant, we
could later add nesting limits to bail out if a size expression gets too
complex.

I have also not implemented simplification of __*_chk to their normal
variants if we can determine at compile time that it is safe.  I am
working on that right now and hope to have it ready before stage 1
closes.

Testing:
--------

I have added tests for dynamic object sizes as well as wrappers for all
__builtin_object_size tests to provide wide coverage.  With that in
place I have run full bootstrap builds on Fedora rawhide by backporting the
patches to the gcc11 branch and x86_64 and i686 have no failures in any
of the builtin-*object-size* tests and no new failures.

I have also built bash, cmake, zfs-fuse and systemtap with
_FORTIFY_SOURCE=3 (with a hacked up glibc to make sure it works) and saw
no issues in any of those builds.  I did some rudimentary analysis of
the generated binaries to see if there was any difference in coverage
and found that there was.  In terms of pure numbers, there were far more
_chk variants of calls than the regular ones due to _FORTIFY_SOURCE
(from about 4% to 70% in bash), but that could well be due to the _chk
variants not being folded into regular variants when safe.  However, on
manual inspection of some of these sites, it was clear that coverage was
increasing significantly where __builtin_object_size was previously
bailing out.

I'm running more tests with _FORTIFY_SOURCE=3 and will do more analysis
to quantify the coverage improvement more clearly.

Limitations/Future work:

- The most immediate work is to fold _chk variants of builtins into
  regular ones when it can be proven at compile time that the object
  size will alwasy be less than the length of the write.  I am working
  on it right now.

- I need to enable _FORTIFY_SOURCE=3 for gcc in glibc; currently it is
  llvm-only.  I've started working on these patches too on the side.

- AFAICT, there are no cases where __builtin_dynamic_object_size misses
  and __builtin_object_size gets a non-fail estimate.  I have still kept
  the backstop in place since this is a new pass and it's very likely
  that I've missed something.  It should be possible to use ranger to
  get an upper and lower bound on the size expression computed by
  __builtin_dynamic_object_size and use that to implement
  __builtin_object_size.  In practice though, ranger currently seems to
  have some difficulty evaluating ranges for some PHI objects.  I intend
  to look into that a bit more once this pass has settled in.

- More work could to be done to reduce the performance impact of the
  computation.  One way could be to add a heuristic where the pass keeps
  track of nesting in the expression and either bail out or compute an
  estimate if nesting crosses a threshold.  I'll take this up once we
  have more data on the nature of the bottlenecks.

Siddhesh Poyarekar (8):
  __builtin_dynamic_object_size: Recognize builtin name
  tree-dynamic-object-size: New pass
  tree-dynamic-object-size: Handle GIMPLE_PHI
  tree-dynamic-object-size: Support ADDR_EXPR
  tree-dynamic-object-size: Handle GIMPLE_ASSIGN
  tree-dynamic-object-size: Handle function parameters
  tree-dynamic-object-size: Get subobject sizes
  tree-dynamic-object-size: Add test wrappers to extend testing

 gcc/Makefile.in                               |   19 +-
 gcc/builtins.c                                |   71 +-
 gcc/builtins.def                              |    1 +
 gcc/doc/extend.texi                           |   11 +
 gcc/passes.def                                |    3 +
 .../g++.dg/ext/builtin-dynamic-object-size1.C |    5 +
 .../g++.dg/ext/builtin-dynamic-object-size2.C |    5 +
 .../gcc.dg/builtin-dynamic-alloc-size.c       |    7 +
 .../gcc.dg/builtin-dynamic-object-size-0.c    |  401 +++++
 .../gcc.dg/builtin-dynamic-object-size-1.c    |    7 +
 .../gcc.dg/builtin-dynamic-object-size-10.c   |    9 +
 .../gcc.dg/builtin-dynamic-object-size-11.c   |    7 +
 .../gcc.dg/builtin-dynamic-object-size-12.c   |    5 +
 .../gcc.dg/builtin-dynamic-object-size-13.c   |    5 +
 .../gcc.dg/builtin-dynamic-object-size-14.c   |    5 +
 .../gcc.dg/builtin-dynamic-object-size-15.c   |    6 +
 .../gcc.dg/builtin-dynamic-object-size-16.c   |    7 +
 .../gcc.dg/builtin-dynamic-object-size-17.c   |    8 +
 .../gcc.dg/builtin-dynamic-object-size-18.c   |    8 +
 .../gcc.dg/builtin-dynamic-object-size-19.c   |  104 ++
 .../gcc.dg/builtin-dynamic-object-size-2.c    |    7 +
 .../gcc.dg/builtin-dynamic-object-size-3.c    |    7 +
 .../gcc.dg/builtin-dynamic-object-size-4.c    |    7 +
 .../gcc.dg/builtin-dynamic-object-size-6.c    |    5 +
 .../gcc.dg/builtin-dynamic-object-size-7.c    |    5 +
 .../gcc.dg/builtin-dynamic-object-size-8.c    |    5 +
 .../gcc.dg/builtin-dynamic-object-size-9.c    |    5 +
 gcc/testsuite/gcc.dg/builtin-object-size-1.c  |  110 ++
 gcc/testsuite/gcc.dg/builtin-object-size-16.c |   10 +
 gcc/testsuite/gcc.dg/builtin-object-size-17.c |    2 +
 gcc/testsuite/gcc.dg/builtin-object-size-2.c  |  126 ++
 gcc/testsuite/gcc.dg/builtin-object-size-3.c  |  148 ++
 gcc/testsuite/gcc.dg/builtin-object-size-4.c  |  117 ++
 gcc/tree-dynamic-object-size.c                | 1513 +++++++++++++++++
 gcc/tree-dynamic-object-size.h                |   26 +
 gcc/tree-pass.h                               |    2 +
 36 files changed, 2769 insertions(+), 20 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/ext/builtin-dynamic-object-size1.C
 create mode 100644 gcc/testsuite/g++.dg/ext/builtin-dynamic-object-size2.C
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-alloc-size.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-1.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-10.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-11.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-12.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-13.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-14.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-15.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-16.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-17.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-18.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-19.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-2.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-3.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-4.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-6.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-7.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-8.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-9.c
 create mode 100644 gcc/tree-dynamic-object-size.c
 create mode 100644 gcc/tree-dynamic-object-size.h
  

Comments

Siddhesh Poyarekar Oct. 8, 2021, 4:50 a.m. UTC | #1
On 10/8/21 03:44, Siddhesh Poyarekar wrote:
> (from about 4% to 70% in bash), but that could well be due to the _chk

I should also clarify that this is for memcpy.  For all fortifiable 
functions, the coverage percentage went from 30.81% to 84.5% for bash. 
Below is the full table.  Please note that this is only based on symbols 
emitted in the end as I didn't want to rebuild the _FORTIFIED_SOURCE=2 
binaries, so it does not take into account the fact that _chk could get 
folded to regular calls if we know at compile time that it's safe to do so.

No more posting patches at 4am; it only leads to more clarification 
follow-ups :/

   Func  	F(2)	U(2)	Cov(2)	F(3)	U(3)	Cov(3)
----------------------------------------------------------------
  asprintf	   1	   0	100.00%	   1	   0	100.00%
*confstr	   0	   2	  0.00%	   1	   1	 50.00%
  fdelt	          10	   0	100.00%	  10	   0	100.00%
*fgets	           0	   1	  0.00%	   1	   0	100.00%
  fprintf	  82	   0	100.00%	  82	   0	100.00%
  getcwd	           0	   3	  0.00%	   0	   3	  0.00%
*getgroups	   0	   1	  0.00%	   1	   0	100.00%
  gethostname	   0	   1	  0.00%	   0	   1	  0.00%
  longjmp	  23	   0	100.00%	  23	   0	100.00%
  mbsnrtowcs	   0	   2	  0.00%	   0	   2	  0.00%
*mbsrtowcs	   0	   1	  0.00%	   1	   0	100.00%
*mbstowcs	   0	  10	  0.00%	   5	   5	 50.00%
*memcpy	           7	 170	  3.95%	 116	  40	 74.36%
*memmove	   4	  21	 16.00%	  12	  17	 41.38%
*memset	           0	  24	  0.00%	   3	  21	 12.50%
  printf	         150	   0	100.00%	 150	   0	100.00%
*read	           0	  19	  0.00%	   8	  11	 42.11%
*readlink	   0	   3	  0.00%	   0	   3	  0.00%
  snprintf	  11	   0	100.00%	  11	   0	100.00%
  sprintf	  28	   0	100.00%	  28	   0	100.00%
*strcat 	   0	   5	  0.00%	   8	   0	100.00%
*strcpy 	   6	 440	  1.35%	 413	  36	 91.98%
*strncpy	   2	  52	  3.70%	  33	  19	 63.46%
  vfprintf	  14	   0	100.00%	  14	   0	100.00%
  vsnprintf	   4	   0	100.00%	   4	   0	100.00%
  wcrtomb	   0	   7	  0.00%	   0	   7	  0.00%
*wcsrtombs	   0	   3	  0.00%	   2	   1	 66.67%
  wctomb 	   0	   3	  0.00%	   0	   3	  0.00%
-----------------------------------------------------------------
Total   	 342	 768	 30.81%	927	 170	 84.50%
  
Jeff Law Oct. 17, 2021, 7:57 p.m. UTC | #2
On 10/7/2021 10:50 PM, Siddhesh Poyarekar wrote:
> On 10/8/21 03:44, Siddhesh Poyarekar wrote:
>> (from about 4% to 70% in bash), but that could well be due to the _chk
>
> I should also clarify that this is for memcpy.  For all fortifiable 
> functions, the coverage percentage went from 30.81% to 84.5% for bash. 
> Below is the full table.  Please note that this is only based on 
> symbols emitted in the end as I didn't want to rebuild the 
> _FORTIFIED_SOURCE=2 binaries, so it does not take into account the 
> fact that _chk could get folded to regular calls if we know at compile 
> time that it's safe to do so.
>
> No more posting patches at 4am; it only leads to more clarification 
> follow-ups :/
FWIW, that 30% number is roughly in-line with the data we saw from a Red 
Hat partner a year or so ago.  Bringing that up to 80%+ would be a 
notable win, even if folks have to explicitly opt-in, as I expect some 
projects would without hesitation.

I'd really like it if Jakub could take the lead on this.  While I'm a 
big proponent of the workn Jakub knows the relevant code far better than 
I and it'll affect the Red Hat team far more than I'll affect me these 
days :-)


Jeff