From patchwork Fri Aug 11 09:53:04 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Florian Weimer <fweimer@redhat.com>
X-Patchwork-Id: 22071
Received: (qmail 105900 invoked by alias); 11 Aug 2017 09:53:14 -0000
Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <libc-alpha.sourceware.org>
List-Unsubscribe: <mailto:libc-alpha-unsubscribe-##L=##H@sourceware.org>
List-Subscribe: <mailto:libc-alpha-subscribe@sourceware.org>
List-Archive: <http://sourceware.org/ml/libc-alpha/>
List-Post: <mailto:libc-alpha@sourceware.org>
List-Help: <mailto:libc-alpha-help@sourceware.org>,
	<http://sourceware.org/ml/#faqs>
Sender: libc-alpha-owner@sourceware.org
Delivered-To: mailing list libc-alpha@sourceware.org
Received: (qmail 105860 invoked by uid 89); 11 Aug 2017 09:53:11 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-25.9 required=5.0 tests=BAYES_00, GIT_PATCH_0,
	GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3,
	KAM_LAZY_DOMAIN_SECURITY, RP_MATCHES_RCVD,
	SPF_HELO_PASS autolearn=ham version=3.3.2 spammy=exhaust,
	suffers, Compared, resizing
X-HELO: mx1.redhat.com
DMARC-Filter: OpenDMARC Filter v1.3.2 mx1.redhat.com 437F6147619
Authentication-Results: ext-mx01.extmail.prod.ext.phx2.redhat.com;
	dmarc=none (p=none dis=none) header.from=redhat.com
Authentication-Results: ext-mx01.extmail.prod.ext.phx2.redhat.com;
	spf=fail smtp.mailfrom=fweimer@redhat.com
To: GNU C Library <libc-alpha@sourceware.org>
From: Florian Weimer <fweimer@redhat.com>
Subject: [PATCH] manual: Update alloca and variable length array
	documentation
Message-ID: <63922f93-860a-249a-c14c-7efea3a3189b@redhat.com>
Date: Fri, 11 Aug 2017 11:53:04 +0200
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
	Thunderbird/52.2.1
MIME-Version: 1.0

I'm resubmitting my old documentation patch (from early 2016) for
reconsideration.

I reinstated the mention of GNU compiler compatibility for strdupa and
strndupa.

Thanks,
Florian

manual: Update alloca and variable length array documentation

2017-08-11  Florian Weimer  <fweimer@redhat.com>

	* manual/memory.texi (Variable Size Automatic): Document
	interaction between alloca and variable length arrays.  Mention
	function inlining.  Remove obsolete warning about alloca in
	function parameter lists.
	(Advantages of Alloca): Note that alloca is async-signal-safe.
	Mention C++ destructors and lack of length checking in open2.
	(Disadvantages of Alloca): Clarify consequences of the lack of
	error checking.  Do no mention the non-existing alloca emulation.
	(GNU C Variable-Size Arrays): Switch terminology from GNU C
	variable-sized arrays to ISO C variable length arrays.  Mention
	security aspect and aliasing violations.  Clarify loop behavior.
	Remove NB, now part of the alloca documentation.

	* manual/string.texi (Copying Strings and Arrays): Add warning
	about alloca and length checking to strdupa.  Rephrase restriction
	to GNU CC.
	(Truncating Strings): Add warning to strndupa.  Rephrase
	restriction to GNU CC.

diff --git a/manual/memory.texi b/manual/memory.texi
index 82f473806c..a5707e1e91 100644
--- a/manual/memory.texi
+++ b/manual/memory.texi
@@ -2802,13 +2802,28 @@ The function @code{alloca} supports a kind of half-dynamic allocation in
 which blocks are allocated dynamically but freed automatically.
 
 Allocating a block with @code{alloca} is an explicit action; you can
-allocate as many blocks as you wish, and compute the size at run time.  But
-all the blocks are freed when you exit the function that @code{alloca} was
-called from, just as if they were automatic variables declared in that
-function.  There is no way to free the space explicitly.
+allocate as many blocks as you wish, and compute the size at run time.
+Memory allocated this way is freed automatically, at some point after
+the scope which contains the @code{alloca} call is left:
+
+@itemize @bullet
+@item
+@cindex variable length arrays
+If the scope calling @code{alloca} contains a variable length array, or
+is nested in such a scope, then the object allocated with @code{alloca}
+is deallocated when the closest enclosing scope which defines a variable
+length array is left.
+
+@item
+If no enclosing scope with a variable length array exist, the allocated
+object is deallocated when the function is exited, either normally or
+abnormally (for example, by throwing a C++ exception).  The life time of
+such objects is not extended by function inlining.
+@end itemize
 
 The prototype for @code{alloca} is in @file{stdlib.h}.  This function is
-a BSD extension.
+a BSD extension.  It requires special support from the compiler, but
+most compilers (including the GNU compilers) support it.
 @pindex stdlib.h
 
 @deftypefun {void *} alloca (size_t @var{size})
@@ -2819,21 +2834,11 @@ The return value of @code{alloca} is the address of a block of @var{size}
 bytes of memory, allocated in the stack frame of the calling function.
 @end deftypefun
 
-Do not use @code{alloca} inside the arguments of a function call---you
-will get unpredictable results, because the stack space for the
-@code{alloca} would appear on the stack in the middle of the space for
-the function arguments.  An example of what to avoid is @code{foo (x,
-alloca (4), y)}.
-@c This might get fixed in future versions of GCC, but that won't make
-@c it safe with compilers generally.
-
 @menu
 * Alloca Example::              Example of using @code{alloca}.
 * Advantages of Alloca::        Reasons to use @code{alloca}.
 * Disadvantages of Alloca::     Reasons to avoid @code{alloca}.
-* GNU C Variable-Size Arrays::  Only in GNU C, here is an alternative
-				 method of allocating dynamically and
-				 freeing automatically.
+* GNU C Variable-Size Arrays::  On-stack dynamic allocation in ISO C.
 @end menu
 
 @node Alloca Example
@@ -2891,6 +2896,14 @@ blocks, space used for any size block can be reused for any other size.
 @code{alloca} does not cause memory fragmentation.
 
 @item
+@cindex mmap
+The @code{alloca} function can be safely called from a signal handler.
+But signal handlers may run with little stack space available, so
+it is unclear how much memory can be safely allocted with @code{alloca}.
+This means that robust code may have to use @code{mmap} instead.
+@xref{Memory-mapped I/O}.
+
+@item
 @cindex longjmp
 Nonlocal exits done with @code{longjmp} (@pxref{Non-Local Exits})
 automatically free the space allocated with @code{alloca} when they exit
@@ -2922,7 +2935,13 @@ freed even when an error occurs, with no special effort required.
 By contrast, the previous definition of @code{open2} (which uses
 @code{malloc} and @code{free}) would develop a memory leak if it were
 changed in this way.  Even if you are willing to make more changes to
-fix it, there is no easy way to do so.
+fix it, there is no easy way to do so (except to switch to C++ and
+destructors).
+
+Note that the @code{open2} example with @code{alloca} is incorrect if
+@code{str1} and @code{str2} can be very long strings because
+@code{alloca} does not fail gracefully in case too many bytes are
+requested (see below).
 @end itemize
 
 @node Disadvantages of Alloca
@@ -2936,22 +2955,38 @@ These are the disadvantages of @code{alloca} in comparison with
 @itemize @bullet
 @item
 If you try to allocate more memory than the machine can provide, you
-don't get a clean error message.  Instead you get a fatal signal like
-the one you would get from an infinite recursion; probably a
-segmentation violation (@pxref{Program Error Signals}).
+don't get a clean error message.  Instead, you end up with undefined
+behavior.  In many cases, the program will just crash (which can still
+result in a denial-of-service vulnerability), but sometimes, it is
+possible to abuse an unbounded @code{alloca} to cause other security
+vulnerabilities such as information disclosure or arbitrary code
+execution.
 
 @item
 Some @nongnusystems{} fail to support @code{alloca}, so it is less
-portable.  However, a slower emulation of @code{alloca} written in C
-is available for use on systems with this deficiency.
+portable.
 @end itemize
 
+Due to lack of error checking, security-sensitive code must ensure that
+no large objects are allocated with @code{alloca}.  In general this
+means that the size argument is checked against an arbitrary limit (say,
+4096), and an error is returned if it is exceeded, or fallback to
+@code{malloc} is performed.
+
+Extra care is required when @code{alloca} is called from within the loop
+or from a function called recursively.  In these cases, depending on the
+loop iteration count or the depth of the recursion, smaller allocation
+sizes can exhaust the stack and trigger undefined behavior.  This
+problem exists with callback functions as well.
+
+@c Node name preserved for backwards compatibility; the correct
+@c terminology is ``variable length array''.
 @node GNU C Variable-Size Arrays
-@subsubsection GNU C Variable-Size Arrays
-@cindex variable-sized arrays
+@subsubsection ISO C Variable Length Arrays
+@cindex variable length arrays
 
-In GNU C, you can replace most uses of @code{alloca} with an array of
-variable size.  Here is how @code{open2} would look then:
+In ISO C, you can replace most uses of @code{alloca} with an array of
+variable length.  Here is how @code{open2} would look then:
 
 @smallexample
 int open2 (char *str1, char *str2, int flags, int mode)
@@ -2962,26 +2997,40 @@ int open2 (char *str1, char *str2, int flags, int mode)
 @}
 @end smallexample
 
+Compared to @code{malloc}, variable length arrays share the same
+advantages and disadvantages as @code{alloca}.  In particular, there is
+no error checking (and security vulnerabilities can result from large
+allocation requests), and some @nongnusystems{} do not support variable
+length arrays because they only support earlier versions of ISO C which
+do not include variable length arrays.
+
+The variable length array version of @code{open2}, as shown above, still
+suffers from the same problem as the @code{alloca}-based variant: It
+does not check that the strings are short enough, to avoid undefined
+behavior which are the result of large allocation requests.
+
 But @code{alloca} is not always equivalent to a variable-sized array, for
 several reasons:
 
 @itemize @bullet
 @item
-A variable size array's space is freed at the end of the scope of the
-name of the array.  The space allocated with @code{alloca}
-remains until the end of the function.
+Memory returned by @code{alloca} is untyped.  A variable length array
+has always a specific type (even if it is an array of characters), and
+using it with another type can introduce aliasing violations into the
+program.
 
 @item
-It is possible to use @code{alloca} within a loop, allocating an
-additional block on each iteration.  This is impossible with
-variable-sized arrays.
+A variable length array is deallocated at the end of the scope of the
+name of the array.  The space allocated with @code{alloca} remains until
+the end of the function or the closest enclosing scope which defines any
+variable length array.
 @end itemize
 
-@strong{NB:} If you mix use of @code{alloca} and variable-sized arrays
-within one function, exiting a scope in which a variable-sized array was
-declared frees all blocks allocated with @code{alloca} during the
-execution of that scope.
-
+The second difference is most pronounced in loops: With @code{alloca},
+the allocated object can be referenced from later iterations and after
+the loop body has been exited.  But a loop with a variable length array
+can execute an arbitrary number of times, without exhausting the
+available stack, as long as the individual arrays are short enough.
 
 @node Resizing the Data Segment
 @section Resizing the Data Segment
diff --git a/manual/string.texi b/manual/string.texi
index ac02c6d85e..d50527f585 100644
--- a/manual/string.texi
+++ b/manual/string.texi
@@ -626,24 +626,20 @@ The behavior of @code{wcpcpy} is undefined if the strings overlap.
 This macro is similar to @code{strdup} but allocates the new string
 using @code{alloca} instead of @code{malloc} (@pxref{Variable Size
 Automatic}).  This means of course the returned string has the same
-limitations as any block of memory allocated using @code{alloca}.
+limitations as any block of memory allocated using @code{alloca}, and
+@code{strdupa} can introduce security vulnerabilities due to the lack of
+failure checking.
 
-For obvious reasons @code{strdupa} is implemented only as a macro;
-you cannot get the address of this function.  Despite this limitation
-it is a useful function.  The following code shows a situation where
-using @code{malloc} would be a lot more expensive.
+For obvious reasons @code{strdupa} is implemented only as a macro; you
+cannot get the address of this function.  The following code shows an
+example of its use:
 
 @smallexample
 @include strdupa.c.texi
 @end smallexample
 
-Please note that calling @code{strtok} using @var{path} directly is
-invalid.  It is also not allowed to call @code{strdupa} in the argument
-list of @code{strtok} since @code{strdupa} uses @code{alloca}
-(@pxref{Variable Size Automatic}) can interfere with the parameter
-passing.
-
-This function is only available if GNU CC is used.
+The @code{strdupa} macro is only available with GNU-compatible
+compilers.
 @end deftypefn
 
 @deftypefun void bcopy (const void *@var{from}, void *@var{to}, size_t @var{size})
@@ -934,16 +930,16 @@ processing text.
 This function is similar to @code{strndup} but like @code{strdupa} it
 allocates the new string using @code{alloca} @pxref{Variable Size
 Automatic}.  The same advantages and limitations of @code{strdupa} are
-valid for @code{strndupa}, too.
+valid for @code{strndupa}.  In particular, @code{strndupa} can introduce
+security vulnerabilities due to the lack of error checking.
 
 This function is implemented only as a macro, just like @code{strdupa}.
-Just as @code{strdupa} this macro also must not be used inside the
-parameter list in a function call.
 
 As noted below, this function is generally a poor choice for
 processing text.
 
-@code{strndupa} is only available if GNU CC is used.
+The @code{strndupa} macro is only available with GNU-compatible
+compilers.
 @end deftypefn
 
 @deftypefun {char *} stpncpy (char *restrict @var{to}, const char *restrict @var{from}, size_t @var{size})