From patchwork Fri Aug 11 09:53:04 2017
X-Patchwork-Submitter: Florian Weimer
X-Patchwork-Id: 22071
To: GNU C Library
From: Florian Weimer
Subject: [PATCH] manual: Update alloca and variable length array documentation
Message-ID: <63922f93-860a-249a-c14c-7efea3a3189b@redhat.com>
Date: Fri, 11 Aug 2017 11:53:04 +0200

I'm resubmitting my old documentation patch (from early 2016) for
reconsideration. I reinstated the mention of GNU compiler compatibility
for strdupa and strndupa.

Thanks,
Florian

manual: Update alloca and variable length array documentation

2017-08-11  Florian Weimer

	* manual/memory.texi (Variable Size Automatic): Document
	interaction between alloca and variable length arrays. Mention
	function inlining.
	Remove obsolete warning about alloca in function parameter lists.
	(Advantages of Alloca): Note that alloca is async-signal-safe.
	Mention C++ destructors and lack of length checking in open2.
	(Disadvantages of Alloca): Clarify consequences of the lack of
	error checking. Do not mention the non-existing alloca emulation.
	(GNU C Variable-Size Arrays): Switch terminology from GNU C
	variable-sized arrays to ISO C variable length arrays. Mention
	security aspect and aliasing violations. Clarify loop behavior.
	Remove NB, now part of the alloca documentation.
	* manual/string.texi (Copying Strings and Arrays): Add warning
	about alloca and length checking to strdupa. Rephrase restriction
	to GNU CC.
	(Truncating Strings): Add warning to strndupa. Rephrase
	restriction to GNU CC.

diff --git a/manual/memory.texi b/manual/memory.texi
index 82f473806c..a5707e1e91 100644
--- a/manual/memory.texi
+++ b/manual/memory.texi
@@ -2802,13 +2802,28 @@ The function @code{alloca} supports a kind of half-dynamic allocation
 in which blocks are allocated dynamically but freed automatically.
 
 Allocating a block with @code{alloca} is an explicit action; you can
-allocate as many blocks as you wish, and compute the size at run time. But
-all the blocks are freed when you exit the function that @code{alloca} was
-called from, just as if they were automatic variables declared in that
-function. There is no way to free the space explicitly.
+allocate as many blocks as you wish, and compute the size at run time.
+Memory allocated this way is freed automatically, at some point after
+the scope which contains the @code{alloca} call is left:
+
+@itemize @bullet
+@item
+@cindex variable length arrays
+If the scope calling @code{alloca} contains a variable length array, or
+is nested in such a scope, then the object allocated with @code{alloca}
+is deallocated when the closest enclosing scope which defines a variable
+length array is left.
+
+@item
+If no enclosing scope with a variable length array exists, the allocated
+object is deallocated when the function is exited, either normally or
+abnormally (for example, by throwing a C++ exception). The lifetime of
+such objects is not extended by function inlining.
+@end itemize
 
 The prototype for @code{alloca} is in @file{stdlib.h}. This function is
-a BSD extension.
+a BSD extension. It requires special support from the compiler, but
+most compilers (including the GNU compilers) support it.
 @pindex stdlib.h
 
 @deftypefun {void *} alloca (size_t @var{size})
@@ -2819,21 +2834,11 @@ The return value of @code{alloca} is the address of a block of @var{size}
 bytes of memory, allocated in the stack frame of the calling function.
 @end deftypefun
 
-Do not use @code{alloca} inside the arguments of a function call---you
-will get unpredictable results, because the stack space for the
-@code{alloca} would appear on the stack in the middle of the space for
-the function arguments. An example of what to avoid is @code{foo (x,
-alloca (4), y)}.
-@c This might get fixed in future versions of GCC, but that won't make
-@c it safe with compilers generally.
-
 @menu
 * Alloca Example::              Example of using @code{alloca}.
 * Advantages of Alloca::        Reasons to use @code{alloca}.
 * Disadvantages of Alloca::     Reasons to avoid @code{alloca}.
-* GNU C Variable-Size Arrays::  Only in GNU C, here is an alternative
-                                 method of allocating dynamically and
-                                 freeing automatically.
+* GNU C Variable-Size Arrays::  On-stack dynamic allocation in ISO C.
 @end menu
 
 @node Alloca Example
@@ -2891,6 +2896,14 @@ blocks, space used for any size block can be reused for any other size.
 @code{alloca} does not cause memory fragmentation.
 
 @item
+@cindex mmap
+The @code{alloca} function can be safely called from a signal handler.
+But signal handlers may run with little stack space available, so
+it is unclear how much memory can be safely allocated with @code{alloca}.
+This means that robust code may have to use @code{mmap} instead.
+@xref{Memory-mapped I/O}.
+
+@item
 @cindex longjmp
 Nonlocal exits done with @code{longjmp} (@pxref{Non-Local Exits})
 automatically free the space allocated with @code{alloca} when they exit
@@ -2922,7 +2935,13 @@ freed even when an error occurs, with no special effort required.
 By contrast, the previous definition of @code{open2} (which uses
 @code{malloc} and @code{free}) would develop a memory leak if it were
 changed in this way. Even if you are willing to make more changes to
-fix it, there is no easy way to do so.
+fix it, there is no easy way to do so (except to switch to C++ and
+destructors).
+
+Note that the @code{open2} example with @code{alloca} is incorrect if
+@code{str1} and @code{str2} can be very long strings because
+@code{alloca} does not fail gracefully in case too many bytes are
+requested (see below).
 @end itemize
 
 @node Disadvantages of Alloca
@@ -2936,22 +2955,38 @@ These are the disadvantages of @code{alloca} in comparison with
 @itemize @bullet
 @item
 If you try to allocate more memory than the machine can provide, you
-don't get a clean error message. Instead you get a fatal signal like
-the one you would get from an infinite recursion; probably a
-segmentation violation (@pxref{Program Error Signals}).
+don't get a clean error message. Instead, you end up with undefined
+behavior. In many cases, the program will just crash (which can still
+result in a denial-of-service vulnerability), but sometimes, it is
+possible to abuse an unbounded @code{alloca} to cause other security
+vulnerabilities such as information disclosure or arbitrary code
+execution.
 
 @item
 Some @nongnusystems{} fail to support @code{alloca}, so it is less
-portable. However, a slower emulation of @code{alloca} written in C
-is available for use on systems with this deficiency.
+portable.
 @end itemize
 
+Due to lack of error checking, security-sensitive code must ensure that
+no large objects are allocated with @code{alloca}. In general this
+means that the size argument is checked against an arbitrary limit (say,
+4096), and an error is returned if it is exceeded, or fallback to
+@code{malloc} is performed.
+
+Extra care is required when @code{alloca} is called from within a loop
+or from a function called recursively. In these cases, depending on the
+loop iteration count or the depth of the recursion, smaller allocation
+sizes can exhaust the stack and trigger undefined behavior. This
+problem exists with callback functions as well.
+
+@c Node name preserved for backwards compatibility; the correct
+@c terminology is ``variable length array''.
 @node GNU C Variable-Size Arrays
-@subsubsection GNU C Variable-Size Arrays
-@cindex variable-sized arrays
+@subsubsection ISO C Variable Length Arrays
+@cindex variable length arrays
 
-In GNU C, you can replace most uses of @code{alloca} with an array of
-variable size. Here is how @code{open2} would look then:
+In ISO C, you can replace most uses of @code{alloca} with an array of
+variable length. Here is how @code{open2} would look then:
 
 @smallexample
 int open2 (char *str1, char *str2, int flags, int mode)
@@ -2962,26 +2997,40 @@ int open2 (char *str1, char *str2, int flags, int mode)
 @}
 @end smallexample
 
+Compared to @code{malloc}, variable length arrays share the same
+advantages and disadvantages as @code{alloca}. In particular, there is
+no error checking (and security vulnerabilities can result from large
+allocation requests), and some @nongnusystems{} do not support variable
+length arrays because they only support earlier versions of ISO C which
+do not include variable length arrays.
+
+The variable length array version of @code{open2}, as shown above, still
+suffers from the same problem as the @code{alloca}-based variant: It
+does not check that the strings are short enough to avoid the undefined
+behavior that results from large allocation requests.
+
 But @code{alloca} is not always equivalent to a variable-sized array, for
 several reasons:
 
 @itemize @bullet
 @item
-A variable size array's space is freed at the end of the scope of the
-name of the array. The space allocated with @code{alloca}
-remains until the end of the function.
+Memory returned by @code{alloca} is untyped. A variable length array
+always has a specific type (even if it is an array of characters), and
+using it with another type can introduce aliasing violations into the
+program.
 
 @item
-It is possible to use @code{alloca} within a loop, allocating an
-additional block on each iteration. This is impossible with
-variable-sized arrays.
+A variable length array is deallocated at the end of the scope of the
+name of the array. The space allocated with @code{alloca} remains until
+the end of the function or the closest enclosing scope which defines any
+variable length array.
 @end itemize
 
-@strong{NB:} If you mix use of @code{alloca} and variable-sized arrays
-within one function, exiting a scope in which a variable-sized array was
-declared frees all blocks allocated with @code{alloca} during the
-execution of that scope.
-
+The second difference is most pronounced in loops: With @code{alloca},
+the allocated object can be referenced from later iterations and after
+the loop body has been exited. But a loop with a variable length array
+can execute an arbitrary number of times, without exhausting the
+available stack, as long as the individual arrays are short enough.
 
 @node Resizing the Data Segment
 @section Resizing the Data Segment
diff --git a/manual/string.texi b/manual/string.texi
index ac02c6d85e..d50527f585 100644
--- a/manual/string.texi
+++ b/manual/string.texi
@@ -626,24 +626,20 @@ The behavior of @code{wcpcpy} is undefined if the strings overlap.
 This macro is similar to @code{strdup} but allocates the new string
 using @code{alloca} instead of @code{malloc} (@pxref{Variable Size
 Automatic}). This means of course the returned string has the same
-limitations as any block of memory allocated using @code{alloca}.
+limitations as any block of memory allocated using @code{alloca}, and
+@code{strdupa} can introduce security vulnerabilities due to the lack of
+failure checking.
 
-For obvious reasons @code{strdupa} is implemented only as a macro;
-you cannot get the address of this function. Despite this limitation
-it is a useful function. The following code shows a situation where
-using @code{malloc} would be a lot more expensive.
+For obvious reasons @code{strdupa} is implemented only as a macro; you
+cannot get the address of this function. The following code shows an
+example of its use:
 
 @smallexample
 @include strdupa.c.texi
 @end smallexample
 
-Please note that calling @code{strtok} using @var{path} directly is
-invalid. It is also not allowed to call @code{strdupa} in the argument
-list of @code{strtok} since @code{strdupa} uses @code{alloca}
-(@pxref{Variable Size Automatic}) can interfere with the parameter
-passing.
-
-This function is only available if GNU CC is used.
+The @code{strdupa} macro is only available with GNU-compatible
+compilers.
 @end deftypefn
 
 @deftypefun void bcopy (const void *@var{from}, void *@var{to}, size_t @var{size})
@@ -934,16 +930,16 @@ processing text.
 This function is similar to @code{strndup} but like @code{strdupa} it
 allocates the new string using @code{alloca} @pxref{Variable Size
 Automatic}. The same advantages and limitations of @code{strdupa} are
-valid for @code{strndupa}, too.
+valid for @code{strndupa}. In particular, @code{strndupa} can introduce
+security vulnerabilities due to the lack of error checking.
 
 This function is implemented only as a macro, just like @code{strdupa}.
-Just as @code{strdupa} this macro also must not be used inside the
-parameter list in a function call.
 
 As noted below, this function is generally a poor choice for
 processing text.
 
-@code{strndupa} is only available if GNU CC is used.
+The @code{strndupa} macro is only available with GNU-compatible
+compilers.
 @end deftypefn
 
 @deftypefun {char *} stpncpy (char *restrict @var{to}, const char *restrict @var{from}, size_t @var{size})
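
For illustration only, and not part of the patch: the new text in
Disadvantages of Alloca recommends checking the size argument against a
limit and either returning an error or falling back to malloc. A
minimal sketch of open2 written that way might look as follows. It
assumes a glibc-style <alloca.h>. The OPEN2_STACK_LIMIT name is
hypothetical, its value is just the "say, 4096" from the patch text,
and the const qualifiers on the string parameters are an addition over
the manual's version.

```c
#include <alloca.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical limit for on-stack allocation; the value is arbitrary.  */
#define OPEN2_STACK_LIMIT 4096

int
open2 (const char *str1, const char *str2, int flags, int mode)
{
  assert (str1 != NULL && str2 != NULL);

  size_t len1 = strlen (str1);
  size_t len2 = strlen (str2);

  /* Reject lengths whose sum (plus the terminator) would overflow.  */
  if (len2 >= SIZE_MAX - len1)
    {
      errno = ENAMETOOLONG;
      return -1;
    }
  size_t size = len1 + len2 + 1;

  /* Small requests use the stack; large ones fall back to malloc.  */
  char *heap = NULL;
  char *name;
  if (size <= OPEN2_STACK_LIMIT)
    name = alloca (size);
  else
    {
      heap = malloc (size);
      if (heap == NULL)
        return -1;
      name = heap;
    }

  /* Concatenate; the second copy includes the null terminator.  */
  memcpy (name, str1, len1);
  memcpy (name + len1, str2, len2 + 1);

  int fd = open (name, flags, mode);

  /* free (NULL) is a no-op; preserve errno from open across free.  */
  int saved_errno = errno;
  free (heap);
  errno = saved_errno;
  return fd;
}
```

Because the function contains no variable length array, the alloca
block stays valid until open2 returns, so allocating it inside the if
branch is safe. The cost of the fallback is that the malloc path
brings back the explicit free that alloca otherwise avoids.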