From patchwork Wed Dec 30 08:31:52 2015
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Paul Eggert <eggert@cs.ucla.edu>
X-Patchwork-Id: 10173
Received: (qmail 119112 invoked by alias); 30 Dec 2015 08:32:07 -0000
Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <libc-alpha.sourceware.org>
List-Unsubscribe: <mailto:libc-alpha-unsubscribe-##L=##H@sourceware.org>
List-Subscribe: <mailto:libc-alpha-subscribe@sourceware.org>
List-Archive: <http://sourceware.org/ml/libc-alpha/>
List-Post: <mailto:libc-alpha@sourceware.org>
List-Help: <mailto:libc-alpha-help@sourceware.org>,
	<http://sourceware.org/ml/#faqs>
Sender: libc-alpha-owner@sourceware.org
Delivered-To: mailing list libc-alpha@sourceware.org
Received: (qmail 119028 invoked by uid 89); 30 Dec 2015 08:32:06 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-0.3 required=5.0 tests=AWL, BAYES_50,
	RP_MATCHES_RCVD,
	SPF_PASS autolearn=ham version=3.3.2 spammy=SIZE,
	documentations, occasions, textual
X-HELO: zimbra.cs.ucla.edu
Subject: Re: [PATCH v7] Implement strlcpy, strlcat [BZ #178]
To: Florian Weimer <fweimer@redhat.com>,
	GNU C Library <libc-alpha@sourceware.org>
References: <5682DD7E.6000301@redhat.com>
From: Paul Eggert <eggert@cs.ucla.edu>
Message-ID: <56839678.8040304@cs.ucla.edu>
Date: Wed, 30 Dec 2015 00:31:52 -0800
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101
	Thunderbird/38.4.0
MIME-Version: 1.0
In-Reply-To: <5682DD7E.6000301@redhat.com>

Florian Weimer wrote:

> I got conflicting advice (on-list and off-list)

It might be helpful to see a summary of the off-list advice, as it appears to 
have influenced the proposed patch without leaving much trace of what was 
discussed. Did the off-list advisors consider the points raised on libc-alpha?

> One suggestion is to make a non-terminated buffer undefined, but that breaks
> the snprintf analogy for size 0 inputs.

Sorry, what analogy is that? snprintf does not concatenate to a buffer directly. 
What is the practical use case here? How does the use case ignore the principle 
that strlcpy should null-terminate its output?

> Another option is to leave the
> array untouched (which is what I implemented and documented).

It is OK to *implement* it that way. But *documenting* this weird corner case 
would raise a can of worms. For example, it would mean strlcat's destination 
might not be a string, which would mean that the proposed documentation's talk 
about behavior being "undefined if the strings overlap" is not worded correctly. 
These problems are discussed in more detail below.

> I did not pick up
> the full criticism because glibc has repeatedly demonstrated that its
> recommended approach to string processing is rather error-prone, so we
> should not judge other approaches too harshly.

Fair enough. However, the wording in the latest proposal has the opposite 
problem: when documenting strncpy it inserts a seemingly-partisan suggestion to 
use strlcpy instead. Furthermore, its strlcpy and strlcat sections both promote 
these functions as ways to avoid buffer overruns even though strlcpy and strlcat 
are not on their face more effective at that than the other truncation 
functions. The strlcpy/strlcat documentation should be like the other truncation 
functions, and defer to the general wording about buffer overruns that is 
already in that section of the manual, rather than contain special advocacy for 
that particular approach.

Here are some other (in some cases, related) issues with the documentation 
change just proposed:

* It starts off by saying strcpy copies data that is "including a terminating 
null byte" but this is not true if SIZE is zero.

* The paragraph that specifies strlcpy semantics "If @var{size} is less than or 
equal to the the length of the string" is incorrect when SIZE is zero. The 
wording should be something more like "If @var{size} is nonzero and less than or 
equal to the the length of the string".

* It says strlcpy's behavior is undefined if the strings overlap, but if SIZE is 
zero there is no destination string. It should say the behavior is undefined if 
SIZE is nonzero and the strings overflap.

* When giving the differences between strncpy and strlcpy, it does not mention 
that strlcpy requires the source string to be null-terminated, nor that it 
computes the source string's length. Similarly when giving the differences 
between strncat and strlcat.

* The initial sentence describing strlcat assumes that the destination is a 
string, which is not necessarily the case.

* Typo: "If array @var{to}" should be "If the array @var{to}".

* The phrase "the length of the untruncated string" does not apply if the 
destination is not a string.

* On multiple occasions, the description for strlcat talks about "the length of 
the string @var{to}" without making it clear whether it is talking about the old 
length of TO, or about the new length.

* It does not mention that truncation in strlcat occurs when the result is 
greater than or equal to the size.

* It says that strlcat's behavior is undefined if the strings overlap, but the 
destination is not necessarily a string, and the description of overlap does not 
make it clear whether it's talking about the old destination string or array, 
the new destination string or array, or the modified part of the destination.

* When giving the differences between strncat and strlcat, it does not mention 
that strlcat ensures that an already-null-terminated destination stays 
null-terminated.

* The documentation does not state that it is OK for the destination pointer to 
be NULL if SIZE is zero.

* In some places the documentation says "length" (not counting the null 
terminator) when it is referring to size (counting the null terminator).


Fixing these problems would take some work and would complicate the 
documentation. Instead, I'll update my earlier patch (which leaves these weird 
corner case behavior undefined; this is considerably simpler), by removing the 
"full criticisms" that you mentioned and by adopting many of the other textual 
changes you made. This revised doc patch is shorter than before, and is 
compatible with the most-recently-proposed implementation.

As before I hope that the strlcpy/strlcat code changes do not go in, but if they 
do go in they will need proper documentation.

The first attachment is a patch relative to master; the second is a diff of 
string.texi compared to the text you just proposed.

diff --git a/manual/string.texi b/manual/string.texi
index ef83bdf..65166b8 100644
--- a/manual/string.texi
+++ b/manual/string.texi
@@ -904,10 +904,6 @@ non-null bytes followed by zero or more null bytes.  It needs to set
 all @var{size} bytes of the destination, even when @var{size} is much
 greater than the length of @var{from}.  As noted below, this function
 is generally a poor choice for processing text.
-
-See @code{strlcpy} below for an alternative which null-terminates the
-destination string as long as the destination buffer does not have
-length zero.
 @end deftypefun
 
 @comment wchar.h
@@ -1111,15 +1107,13 @@ This function is similar to @code{strcpy}, but copies at most
 @var{size} bytes from the string @var{from} into the destination
 array @var{to}, including a terminating null byte.
 
-If @var{size} is zero, nothing is written to @var{to}.
-
 If @var{size} is greater than the length of the string @var{from},
 this function copies all of the string @var{from} to the destination
 array @var{to}, including the terminating null byte.  Like other
 string functions such as @code{strcpy}, but unlike @code{strncpy}, any
 remaining bytes in the destination array remain unchanged.
 
-If @var{size} is less than or equal to the the length of the string
+If @var{size} is nonzero and less than or equal to the the length of the string
 @var{from}, this function copies only the first @samp{@var{size} - 1}
 bytes to the destination array @var{to}, and writes a terminating null
 byte to the last byte of the array.
@@ -1128,14 +1122,15 @@ The return value @var{result} of @code{strlcpy} is the length of the
 string @var{from}.  This means that @samp{@var{result} >= @var{size}} is
 true whenever truncation occurs.
 
-The behavior of @code{strlcpy} is undefined if the strings overlap.
+The behavior of @code{strlcpy} is undefined if @var{size} is zero, or
+if the source and destination strings overlap.
 
 As noted below, this function is generally a poor choice for processing
-text.  Using @code{strlcpy} instead of @code{strcpy} is a way to avoid
-bugs caused by writing past the end of the allocated space for @var{to}.
-Unlike @code{strncpy}, @code{strlcpy} ensures that the destination
-string is always null-terminated (unless the array size is zero), and it
-does not fill the remaining part of the array with null bytes.
+text.  Unlike @code{strncpy}, @code{strlcpy} requires @var{size} to be
+nonzero and the source string to be null-terminated, computes the
+source string's length, ensures that the destination is
+null-terminated, and does not fill the remaining part of the destination
+with null bytes.
 
 This function is derived from BSD.
 @end deftypefun
@@ -1145,38 +1140,33 @@ This function is derived from BSD.
 @deftypefun size_t strlcat (char *restrict @var{to}, const char *restrict @var{from}, size_t @var{size})
 @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
 This function appends the string @var{from} to the
-string @var{to}, limiting the total length of the result string at
+string @var{to}, limiting the total size of the result string at
 @var{to} (including the null terminator) to @var{size}.
 
-If array @var{to} contains a null terminator among the first @var{size}
-bytes, this function copies as much as possible of the string @var{from}
-into the array at @var{to} of @var{size} bytes, starting at the
-terminating null byte of the original string @var{to}.  In effect, this
-appends the string @var{from} to the string @var{to}.  In this case, the
-resulting array at @var{to} will contain a null terminator, but the
-result string can be truncated (not all bytes in @var{from} are copied).
-
-If the array @var{to} does not contain a null terminator among the first
-@var{size} bytes, the array @var{to} is not modified, and it will not
-contain a null terminator after this function returns.  This also covers
-the case when @var{size} is zero.
+This function copies as much as possible of the string @var{from} into
+the array at @var{to} of @var{size} bytes, starting at the terminating
+null byte of the original string @var{to}.  In effect, this appends
+the string @var{from} to the string @var{to}.  Although the resulting
+string will contain a null terminator, it can be truncated (not all
+bytes in @var{from} are copied).
 
-This function returns the length of the untruncated string: the length
-of @var{from} plus the length of the string @var{to} (if there is a null
-terminator within the first @var{size} bytes of the array @var{to}), or
-the length of @var{from} plus @var{size} (if the array @var{to} of
-@var{size} bytes does not contain a null terminator).
+This function returns the sum of the original length of @var{to} and
+the length of @var{from}.  This means that truncation occurs unless
+the returned value is less than @var{size}.
 
-The behavior of @code{strlcat} is undefined if the strings overlap.
+The behavior is undefined if @var{to} does not contain a null byte in
+its first @var{size} bytes, or if the source and resulting destination
+strings overlap.
 
 As noted below, this function is generally a poor choice for processing
 text.  Also, this function has significant performance issues.
-@xref{Concatenating Strings}.  Using @code{strlcat} instead of
-@code{strcat} is a way to avoid bugs caused by writing past the end of
-the allocated space for @var{to}.  Unlike @code{strncat}, @var{size}
-specifies the maximum total length of the result string (including its
+@xref{Concatenating Strings}.  Unlike @code{strncat}, @var{size}
+specifies the maximum total size of the result string (including its
 null terminator), not the number of bytes copied from the source string
 @var{from}.
+Also, unlike @code{strncat} this function requires the source and
+destination to be null-terminated, computes the source string's
+length, and keeps the destination null-terminated.
 
 This function is derived from BSD.
 @end deftypefun