Patchwork manual: Update alloca and variable length array documentation

Submitter Florian Weimer
Date Aug. 11, 2017, 9:53 a.m.
Message ID <63922f93-860a-249a-c14c-7efea3a3189b@redhat.com>
Permalink /patch/22071/
State New

Comments

Florian Weimer - Aug. 11, 2017, 9:53 a.m.
I'm resubmitting my old documentation patch (from early 2016) for
reconsideration.

I reinstated the mention of GNU compiler compatibility for strdupa and
strndupa.

Thanks,
Florian
Carlos O'Donell - Aug. 11, 2017, 12:41 p.m.
On 08/11/2017 05:53 AM, Florian Weimer wrote:
> I'm resubmitting my old documentation patch (from early 2016) for
> reconsideration.
> 
> I reinstated the mention of GNU compiler compatibility for strdupa and
> strndupa.

Can you please provide a reference to previous reviews and discussions so
I can look over what was covered or rejected?

Thanks.

c.
Florian Weimer - Aug. 11, 2017, 12:52 p.m.
On 08/11/2017 02:41 PM, Carlos O'Donell wrote:
> On 08/11/2017 05:53 AM, Florian Weimer wrote:
>> I'm resubmitting my old documentation patch (from early 2016) for
>> reconsideration.
>>
>> I reinstated the mention of GNU compiler compatibility for strdupa and
>> strndupa.
> 
> Can you please provide a reference to previous reviews and discussions so
> I can look over what was covered or rejected?

I think this was the old thread:

  <https://sourceware.org/ml/libc-alpha/2016-01/msg00019.html>

Florian
Carlos O'Donell - Aug. 11, 2017, 1:11 p.m.
On 08/11/2017 05:53 AM, Florian Weimer wrote:
> I'm resubmitting my old documentation patch (from early 2016) for
> reconsideration.
> 
> I reinstated the mention of GNU compiler compatibility for strdupa and
> strndupa.

Thanks for the updated reference to the previous conversation with Paul Eggert.
I read that as a reference for this review.

The changes look good to me, they are better than what we had before.

At a high level we are removing the limitation about use in parameter lists,
and adding warnings about security, all good to me.

Cheers,
Carlos.
Florian Weimer - Aug. 11, 2017, 1:28 p.m.
On 08/11/2017 03:11 PM, Carlos O'Donell wrote:
> On 08/11/2017 05:53 AM, Florian Weimer wrote:
>> I'm resubmitting my old documentation patch (from early 2016) for
>> reconsideration.
>>
>> I reinstated the mention of GNU compiler compatibility for strdupa and
>> strndupa.
> 
> Thanks for the updated reference to the previous conversation with Paul Eggert.
> I read that as a reference for this review.
> 
> The changes look good to me, they are better than what we had before.
> 
> At a high level we are removing the limitation about use in parameter lists,
> and adding warnings about security, all good to me.

What about this node name?

@c Node name preserved for backwards compatibility; the correct
@c terminology is ``variable length array''.
@node GNU C Variable-Size Arrays
@subsubsection ISO C Variable Length Arrays
@cindex variable length arrays

Should I change it, potentially invalidating hyperlinks?

The node name is quite visible even in the HTML version.

Thanks,
Florian
Paul Eggert - Aug. 11, 2017, 6:28 p.m.
Thanks, this looks good, with one quibble:
> +Compared to @code{malloc}, variable length arrays share the same
> +advantages and disadvantages as @code{alloca}.  In particular, there is
> +no error checking (and security vulnerabilities can result from large
> +allocation requests), and some @nongnusystems{} do not support variable
> +length arrays because they only support earlier versions of ISO C which
> +do not include variable length arrays.

The last 2 lines are obsolete. Although C99 required VLAs, a standard 
C11 implementation that defines __STDC_NO_VLA__ need not support VLAs. I 
suggest removing " because they only support earlier versions of ISO C 
which do not include variable length arrays".
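
To illustrate the point about __STDC_NO_VLA__, here is a minimal sketch of a feature test with a malloc fallback. The names HAVE_VLA, SCRATCH_LIMIT, reverse_via and reverse_copy are hypothetical and appear in neither glibc nor the patch; the 4096-byte bound is an arbitrary assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* VLAs are usable only on C99 or later, and only if the
   implementation does not define __STDC_NO_VLA__ (C11).  */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L \
    && !defined(__STDC_NO_VLA__)
# define HAVE_VLA 1
#else
# define HAVE_VLA 0
#endif

/* Arbitrary bound on on-stack scratch space (an assumption).  */
#define SCRATCH_LIMIT 4096

/* Hypothetical helper: copy SRC into TMP, then write it reversed to DST.  */
static void
reverse_via (char *dst, const char *src, char *tmp, size_t n)
{
  size_t i;

  assert (dst != NULL && src != NULL && tmp != NULL);
  memcpy (tmp, src, n);
  for (i = 0; i < n; i++)
    dst[i] = tmp[n - 1 - i];
}

/* Reverse N bytes of SRC into DST (the two may overlap).
   Returns 0 on success, -1 if scratch memory cannot be allocated.  */
int
reverse_copy (char *dst, const char *src, size_t n)
{
  char *heap;

  if (n == 0)
    return 0;
#if HAVE_VLA
  if (n <= SCRATCH_LIMIT)
    {
      char tmp[n];	/* Bounded, so the stack use is bounded too.  */
      reverse_via (dst, src, tmp, n);
      return 0;
    }
#endif
  /* No VLA support, or the request is too large for the stack.  */
  heap = malloc (n);
  if (heap == NULL)
    return -1;
  reverse_via (dst, src, heap, n);
  free (heap);
  return 0;
}
```

The size check comes before the array declaration because a variable length array gives no way to detect a failed allocation afterwards.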

Patch

manual: Update alloca and variable length array documentation

2017-08-11  Florian Weimer  <fweimer@redhat.com>

	* manual/memory.texi (Variable Size Automatic): Document
	interaction between alloca and variable length arrays.  Mention
	function inlining.  Remove obsolete warning about alloca in
	function parameter lists.
	(Advantages of Alloca): Note that alloca is async-signal-safe.
	Mention C++ destructors and lack of length checking in open2.
	(Disadvantages of Alloca): Clarify consequences of the lack of
	error checking.  Do not mention the non-existing alloca emulation.
	(GNU C Variable-Size Arrays): Switch terminology from GNU C
	variable-sized arrays to ISO C variable length arrays.  Mention
	security aspect and aliasing violations.  Clarify loop behavior.
	Remove NB, now part of the alloca documentation.

	* manual/string.texi (Copying Strings and Arrays): Add warning
	about alloca and length checking to strdupa.  Rephrase restriction
	to GNU CC.
	(Truncating Strings): Add warning to strndupa.  Rephrase
	restriction to GNU CC.

diff --git a/manual/memory.texi b/manual/memory.texi
index 82f473806c..a5707e1e91 100644
--- a/manual/memory.texi
+++ b/manual/memory.texi
@@ -2802,13 +2802,28 @@  The function @code{alloca} supports a kind of half-dynamic allocation in
 which blocks are allocated dynamically but freed automatically.
 
 Allocating a block with @code{alloca} is an explicit action; you can
-allocate as many blocks as you wish, and compute the size at run time.  But
-all the blocks are freed when you exit the function that @code{alloca} was
-called from, just as if they were automatic variables declared in that
-function.  There is no way to free the space explicitly.
+allocate as many blocks as you wish, and compute the size at run time.
+Memory allocated this way is freed automatically, at some point after
+the scope which contains the @code{alloca} call is left:
+
+@itemize @bullet
+@item
+@cindex variable length arrays
+If the scope calling @code{alloca} contains a variable length array, or
+is nested in such a scope, then the object allocated with @code{alloca}
+is deallocated when the closest enclosing scope which defines a variable
+length array is left.
+
+@item
+If no enclosing scope with a variable length array exists, the allocated
+object is deallocated when the function is exited, either normally or
+abnormally (for example, by throwing a C++ exception).  The lifetime of
+such objects is not extended by function inlining.
+@end itemize
 
 The prototype for @code{alloca} is in @file{stdlib.h}.  This function is
-a BSD extension.
+a BSD extension.  It requires special support from the compiler, but
+most compilers (including the GNU compilers) support it.
 @pindex stdlib.h
 
 @deftypefun {void *} alloca (size_t @var{size})
@@ -2819,21 +2834,11 @@  The return value of @code{alloca} is the address of a block of @var{size}
 bytes of memory, allocated in the stack frame of the calling function.
 @end deftypefun
 
-Do not use @code{alloca} inside the arguments of a function call---you
-will get unpredictable results, because the stack space for the
-@code{alloca} would appear on the stack in the middle of the space for
-the function arguments.  An example of what to avoid is @code{foo (x,
-alloca (4), y)}.
-@c This might get fixed in future versions of GCC, but that won't make
-@c it safe with compilers generally.
-
 @menu
 * Alloca Example::              Example of using @code{alloca}.
 * Advantages of Alloca::        Reasons to use @code{alloca}.
 * Disadvantages of Alloca::     Reasons to avoid @code{alloca}.
-* GNU C Variable-Size Arrays::  Only in GNU C, here is an alternative
-				 method of allocating dynamically and
-				 freeing automatically.
+* GNU C Variable-Size Arrays::  On-stack dynamic allocation in ISO C.
 @end menu
 
 @node Alloca Example
@@ -2891,6 +2896,14 @@  blocks, space used for any size block can be reused for any other size.
 @code{alloca} does not cause memory fragmentation.
 
 @item
+@cindex mmap
+The @code{alloca} function can be safely called from a signal handler.
+But signal handlers may run with little stack space available, so
+it is unclear how much memory can be safely allocated with @code{alloca}.
+This means that robust code may have to use @code{mmap} instead.
+@xref{Memory-mapped I/O}.
+
+@item
 @cindex longjmp
 Nonlocal exits done with @code{longjmp} (@pxref{Non-Local Exits})
 automatically free the space allocated with @code{alloca} when they exit
@@ -2922,7 +2935,13 @@  freed even when an error occurs, with no special effort required.
 By contrast, the previous definition of @code{open2} (which uses
 @code{malloc} and @code{free}) would develop a memory leak if it were
 changed in this way.  Even if you are willing to make more changes to
-fix it, there is no easy way to do so.
+fix it, there is no easy way to do so (except to switch to C++ and
+destructors).
+
+Note that the @code{open2} example with @code{alloca} is incorrect if
+@code{str1} and @code{str2} can be very long strings because
+@code{alloca} does not fail gracefully in case too many bytes are
+requested (see below).
 @end itemize
 
 @node Disadvantages of Alloca
@@ -2936,22 +2955,38 @@  These are the disadvantages of @code{alloca} in comparison with
 @itemize @bullet
 @item
 If you try to allocate more memory than the machine can provide, you
-don't get a clean error message.  Instead you get a fatal signal like
-the one you would get from an infinite recursion; probably a
-segmentation violation (@pxref{Program Error Signals}).
+don't get a clean error message.  Instead, you end up with undefined
+behavior.  In many cases, the program will just crash (which can still
+result in a denial-of-service vulnerability), but sometimes, it is
+possible to abuse an unbounded @code{alloca} to cause other security
+vulnerabilities such as information disclosure or arbitrary code
+execution.
 
 @item
 Some @nongnusystems{} fail to support @code{alloca}, so it is less
-portable.  However, a slower emulation of @code{alloca} written in C
-is available for use on systems with this deficiency.
+portable.
 @end itemize
 
+Due to lack of error checking, security-sensitive code must ensure that
+no large objects are allocated with @code{alloca}.  In general this
+means that the size argument is checked against an arbitrary limit (say,
+4096), and an error is returned if it is exceeded, or fallback to
+@code{malloc} is performed.
+
+Extra care is required when @code{alloca} is called from within a loop
+or from a function called recursively.  In these cases, depending on the
+loop iteration count or the depth of the recursion, smaller allocation
+sizes can exhaust the stack and trigger undefined behavior.  This
+problem exists with callback functions as well.
+
+@c Node name preserved for backwards compatibility; the correct
+@c terminology is ``variable length array''.
 @node GNU C Variable-Size Arrays
-@subsubsection GNU C Variable-Size Arrays
-@cindex variable-sized arrays
+@subsubsection ISO C Variable Length Arrays
+@cindex variable length arrays
 
-In GNU C, you can replace most uses of @code{alloca} with an array of
-variable size.  Here is how @code{open2} would look then:
+In ISO C, you can replace most uses of @code{alloca} with an array of
+variable length.  Here is how @code{open2} would look then:
 
 @smallexample
 int open2 (char *str1, char *str2, int flags, int mode)
@@ -2962,26 +2997,40 @@  int open2 (char *str1, char *str2, int flags, int mode)
 @}
 @end smallexample
 
+Compared to @code{malloc}, variable length arrays share the same
+advantages and disadvantages as @code{alloca}.  In particular, there is
+no error checking (and security vulnerabilities can result from large
+allocation requests), and some @nongnusystems{} do not support variable
+length arrays because they only support earlier versions of ISO C which
+do not include variable length arrays.
+
+The variable length array version of @code{open2}, as shown above, still
+suffers from the same problem as the @code{alloca}-based variant: It
+does not check that the strings are short enough to avoid the undefined
+behavior that results from large allocation requests.
+
 But @code{alloca} is not always equivalent to a variable-sized array, for
 several reasons:
 
 @itemize @bullet
 @item
-A variable size array's space is freed at the end of the scope of the
-name of the array.  The space allocated with @code{alloca}
-remains until the end of the function.
+Memory returned by @code{alloca} is untyped.  A variable length array
+always has a specific type (even if it is an array of characters), and
+using it with another type can introduce aliasing violations into the
+program.
 
 @item
-It is possible to use @code{alloca} within a loop, allocating an
-additional block on each iteration.  This is impossible with
-variable-sized arrays.
+A variable length array is deallocated at the end of the scope of the
+name of the array.  The space allocated with @code{alloca} remains until
+the end of the function or the closest enclosing scope which defines any
+variable length array.
 @end itemize
 
-@strong{NB:} If you mix use of @code{alloca} and variable-sized arrays
-within one function, exiting a scope in which a variable-sized array was
-declared frees all blocks allocated with @code{alloca} during the
-execution of that scope.
-
+The second difference is most pronounced in loops: With @code{alloca},
+the allocated object can be referenced from later iterations and after
+the loop body has been exited.  But a loop with a variable length array
+can execute an arbitrary number of times, without exhausting the
+available stack, as long as the individual arrays are short enough.
 
 @node Resizing the Data Segment
 @section Resizing the Data Segment
diff --git a/manual/string.texi b/manual/string.texi
index ac02c6d85e..d50527f585 100644
--- a/manual/string.texi
+++ b/manual/string.texi
@@ -626,24 +626,20 @@  The behavior of @code{wcpcpy} is undefined if the strings overlap.
 This macro is similar to @code{strdup} but allocates the new string
 using @code{alloca} instead of @code{malloc} (@pxref{Variable Size
 Automatic}).  This means of course the returned string has the same
-limitations as any block of memory allocated using @code{alloca}.
+limitations as any block of memory allocated using @code{alloca}, and
+@code{strdupa} can introduce security vulnerabilities due to the lack of
+failure checking.
 
-For obvious reasons @code{strdupa} is implemented only as a macro;
-you cannot get the address of this function.  Despite this limitation
-it is a useful function.  The following code shows a situation where
-using @code{malloc} would be a lot more expensive.
+For obvious reasons @code{strdupa} is implemented only as a macro; you
+cannot get the address of this function.  The following code shows an
+example of its use:
 
 @smallexample
 @include strdupa.c.texi
 @end smallexample
 
-Please note that calling @code{strtok} using @var{path} directly is
-invalid.  It is also not allowed to call @code{strdupa} in the argument
-list of @code{strtok} since @code{strdupa} uses @code{alloca}
-(@pxref{Variable Size Automatic}) can interfere with the parameter
-passing.
-
-This function is only available if GNU CC is used.
+The @code{strdupa} macro is only available with GNU-compatible
+compilers.
 @end deftypefn
 
 @deftypefun void bcopy (const void *@var{from}, void *@var{to}, size_t @var{size})
@@ -934,16 +930,16 @@  processing text.
 This function is similar to @code{strndup} but like @code{strdupa} it
 allocates the new string using @code{alloca} @pxref{Variable Size
 Automatic}.  The same advantages and limitations of @code{strdupa} are
-valid for @code{strndupa}, too.
+valid for @code{strndupa}.  In particular, @code{strndupa} can introduce
+security vulnerabilities due to the lack of error checking.
 
 This function is implemented only as a macro, just like @code{strdupa}.
-Just as @code{strdupa} this macro also must not be used inside the
-parameter list in a function call.
 
 As noted below, this function is generally a poor choice for
 processing text.
 
-@code{strndupa} is only available if GNU CC is used.
+The @code{strndupa} macro is only available with GNU-compatible
+compilers.
 @end deftypefn
 
 @deftypefun {char *} stpncpy (char *restrict @var{to}, const char *restrict @var{from}, size_t @var{size})
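
The size check that the patch describes under Disadvantages of Alloca (compare the size against an arbitrary limit and fall back to malloc) could be sketched roughly as follows for the open2 example. This is only an illustration under stated assumptions: open2_bounded and ALLOCA_LIMIT are hypothetical names, the 4096 limit is the arbitrary value the patch text suggests, and <alloca.h> is assumed to be available as on GNU systems.

```c
#include <alloca.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Arbitrary limit for on-stack allocation, as suggested in the patch.  */
#define ALLOCA_LIMIT 4096

/* Hypothetical variant of the manual's open2: concatenate STR1 and STR2
   and open the result.  Short names use alloca, long names use malloc.
   Returns a file descriptor, or -1 with errno set on failure.  */
int
open2_bounded (const char *str1, const char *str2, int flags, int mode)
{
  size_t len1, len2, total;
  char *name;
  int use_heap, fd;

  assert (str1 != NULL && str2 != NULL);
  len1 = strlen (str1);
  len2 = strlen (str2);

  /* Reject lengths whose sum (plus the terminator) would overflow.  */
  if (len2 >= SIZE_MAX - len1)
    {
      errno = ENAMETOOLONG;
      return -1;
    }
  total = len1 + len2 + 1;

  use_heap = total > ALLOCA_LIMIT;
  if (use_heap)
    {
      name = malloc (total);
      if (name == NULL)
	return -1;		/* malloc has set errno.  */
    }
  else
    name = alloca (total);	/* Bounded by ALLOCA_LIMIT.  */

  memcpy (name, str1, len1);
  memcpy (name + len1, str2, len2 + 1);	/* Includes the terminator.  */

  fd = open (name, flags, mode);
  if (use_heap)
    {
      int saved_errno = errno;	/* free may clobber errno.  */
      free (name);
      errno = saved_errno;
    }
  return fd;
}
```

Only the heap path needs cleanup; the alloca block is released when the function returns. Because both the limit and the function are called only once per invocation here, the additional concerns about loops and recursion raised in the patch do not arise in this sketch.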