From patchwork Wed Dec 30 08:31:52 2015 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Paul Eggert X-Patchwork-Id: 10173 Received: (qmail 119112 invoked by alias); 30 Dec 2015 08:32:07 -0000 Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: libc-alpha-owner@sourceware.org Delivered-To: mailing list libc-alpha@sourceware.org Received: (qmail 119028 invoked by uid 89); 30 Dec 2015 08:32:06 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-0.3 required=5.0 tests=AWL, BAYES_50, RP_MATCHES_RCVD, SPF_PASS autolearn=ham version=3.3.2 spammy=SIZE, documentations, occasions, textual X-HELO: zimbra.cs.ucla.edu Subject: Re: [PATCH v7] Implement strlcpy, strlcat [BZ #178] To: Florian Weimer , GNU C Library References: <5682DD7E.6000301@redhat.com> From: Paul Eggert Message-ID: <56839678.8040304@cs.ucla.edu> Date: Wed, 30 Dec 2015 00:31:52 -0800 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.4.0 MIME-Version: 1.0 In-Reply-To: <5682DD7E.6000301@redhat.com> Florian Weimer wrote: > I got conflicting advice (on-list and off-list) It might be helpful to see a summary of the off-list advice, as it appears to have influenced the proposed patch without leaving much trace of what was discussed. Did the off-list advisors consider the points raised on libc-alpha? > One suggestion is to make a non-terminated buffer undefined, but that breaks > the snprintf analogy for size 0 inputs. Sorry, what analogy is that? snprintf does not concatenate to a buffer directly. What is the practical use case here? How does the use case ignore the principle that strlcpy should null-terminate its output? > Another option is to leave the > array untouched (which is what I implemented and documented). It is OK to *implement* it that way. But *documenting* this weird corner case would raise a can of worms. For example, it would mean strlcat's destination might not be a string, which would mean that the proposed documentation's talk about behavior being "undefined if the strings overlap" is not worded correctly. These problems are discussed in more detail below. > I did not pick up > the full criticism because glibc has repeatedly demonstrated that its > recommended approach to string processing is rather error-prone, so we > should not judge other approaches too harshly. Fair enough. However, the wording in the latest proposal has the opposite problem: when documenting strncpy it inserts a seemingly-partisan suggestion to use strlcpy instead. Furthermore, its strlcpy and strlcat sections both promote these functions as ways to avoid buffer overruns even though strlcpy and strlcat are not on their face more effective at that than the other truncation functions. The strlcpy/strlcat documentation should be like the other truncation functions, and defer to the general wording about buffer overruns that is already in that section of the manual, rather than contain special advocacy for that particular approach. Here are some other (in some cases, related) issues with the documentation change just proposed: * It starts off by saying strcpy copies data that is "including a terminating null byte" but this is not true if SIZE is zero. * The paragraph that specifies strlcpy semantics "If @var{size} is less than or equal to the the length of the string" is incorrect when SIZE is zero. The wording should be something more like "If @var{size} is nonzero and less than or equal to the the length of the string". * It says strlcpy's behavior is undefined if the strings overlap, but if SIZE is zero there is no destination string. It should say the behavior is undefined if SIZE is nonzero and the strings overflap. * When giving the differences between strncpy and strlcpy, it does not mention that strlcpy requires the source string to be null-terminated, nor that it computes the source string's length. Similarly when giving the differences between strncat and strlcat. * The initial sentence describing strlcat assumes that the destination is a string, which is not necessarily the case. * Typo: "If array @var{to}" should be "If the array @var{to}". * The phrase "the length of the untruncated string" does not apply if the destination is not a string. * On multiple occasions, the description for strlcat talks about "the length of the string @var{to}" without making it clear whether it is talking about the old length of TO, or about the new length. * It does not mention that truncation in strlcat occurs when the result is greater than or equal to the size. * It says that strlcat's behavior is undefined if the strings overlap, but the destination is not necessarily a string, and the description of overlap does not make it clear whether it's talking about the old destination string or array, the new destination string or array, or the modified part of the destination. * When giving the differences between strncat and strlcat, it does not mention that strlcat ensures that an already-null-terminated destination stays null-terminated. * The documentation does not state that it is OK for the destination pointer to be NULL if SIZE is zero. * In some places the documentation says "length" (not counting the null terminator) when it is referring to size (counting the null terminator). Fixing these problems would take some work and would complicate the documentation. Instead, I'll update my earlier patch (which leaves these weird corner case behavior undefined; this is considerably simpler), by removing the "full criticisms" that you mentioned and by adopting many of the other textual changes you made. This revised doc patch is shorter than before, and is compatible with the most-recently-proposed implementation. As before I hope that the strlcpy/strlcat code changes do not go in, but if they do go in they will need proper documentation. The first attachment is a patch relative to master; the second is a diff of string.texi compared to the text you just proposed. diff --git a/manual/string.texi b/manual/string.texi index ef83bdf..65166b8 100644 --- a/manual/string.texi +++ b/manual/string.texi @@ -904,10 +904,6 @@ non-null bytes followed by zero or more null bytes. It needs to set all @var{size} bytes of the destination, even when @var{size} is much greater than the length of @var{from}. As noted below, this function is generally a poor choice for processing text. - -See @code{strlcpy} below for an alternative which null-terminates the -destination string as long as the destination buffer does not have -length zero. @end deftypefun @comment wchar.h @@ -1111,15 +1107,13 @@ This function is similar to @code{strcpy}, but copies at most @var{size} bytes from the string @var{from} into the destination array @var{to}, including a terminating null byte. -If @var{size} is zero, nothing is written to @var{to}. - If @var{size} is greater than the length of the string @var{from}, this function copies all of the string @var{from} to the destination array @var{to}, including the terminating null byte. Like other string functions such as @code{strcpy}, but unlike @code{strncpy}, any remaining bytes in the destination array remain unchanged. -If @var{size} is less than or equal to the the length of the string +If @var{size} is nonzero and less than or equal to the the length of the string @var{from}, this function copies only the first @samp{@var{size} - 1} bytes to the destination array @var{to}, and writes a terminating null byte to the last byte of the array. @@ -1128,14 +1122,15 @@ The return value @var{result} of @code{strlcpy} is the length of the string @var{from}. This means that @samp{@var{result} >= @var{size}} is true whenever truncation occurs. -The behavior of @code{strlcpy} is undefined if the strings overlap. +The behavior of @code{strlcpy} is undefined if @var{size} is zero, or +if the source and destination strings overlap. As noted below, this function is generally a poor choice for processing -text. Using @code{strlcpy} instead of @code{strcpy} is a way to avoid -bugs caused by writing past the end of the allocated space for @var{to}. -Unlike @code{strncpy}, @code{strlcpy} ensures that the destination -string is always null-terminated (unless the array size is zero), and it -does not fill the remaining part of the array with null bytes. +text. Unlike @code{strncpy}, @code{strlcpy} requires @var{size} to be +nonzero and the source string to be null-terminated, computes the +source string's length, ensures that the destination is +null-terminated, and does not fill the remaining part of the destination +with null bytes. This function is derived from BSD. @end deftypefun @@ -1145,38 +1140,33 @@ This function is derived from BSD. @deftypefun size_t strlcat (char *restrict @var{to}, const char *restrict @var{from}, size_t @var{size}) @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} This function appends the string @var{from} to the -string @var{to}, limiting the total length of the result string at +string @var{to}, limiting the total size of the result string at @var{to} (including the null terminator) to @var{size}. -If array @var{to} contains a null terminator among the first @var{size} -bytes, this function copies as much as possible of the string @var{from} -into the array at @var{to} of @var{size} bytes, starting at the -terminating null byte of the original string @var{to}. In effect, this -appends the string @var{from} to the string @var{to}. In this case, the -resulting array at @var{to} will contain a null terminator, but the -result string can be truncated (not all bytes in @var{from} are copied). - -If the array @var{to} does not contain a null terminator among the first -@var{size} bytes, the array @var{to} is not modified, and it will not -contain a null terminator after this function returns. This also covers -the case when @var{size} is zero. +This function copies as much as possible of the string @var{from} into +the array at @var{to} of @var{size} bytes, starting at the terminating +null byte of the original string @var{to}. In effect, this appends +the string @var{from} to the string @var{to}. Although the resulting +string will contain a null terminator, it can be truncated (not all +bytes in @var{from} are copied). -This function returns the length of the untruncated string: the length -of @var{from} plus the length of the string @var{to} (if there is a null -terminator within the first @var{size} bytes of the array @var{to}), or -the length of @var{from} plus @var{size} (if the array @var{to} of -@var{size} bytes does not contain a null terminator). +This function returns the sum of the original length of @var{to} and +the length of @var{from}. This means that truncation occurs unless +the returned value is less than @var{size}. -The behavior of @code{strlcat} is undefined if the strings overlap. +The behavior is undefined if @var{to} does not contain a null byte in +its first @var{size} bytes, or if the source and resulting destination +strings overlap. As noted below, this function is generally a poor choice for processing text. Also, this function has significant performance issues. -@xref{Concatenating Strings}. Using @code{strlcat} instead of -@code{strcat} is a way to avoid bugs caused by writing past the end of -the allocated space for @var{to}. Unlike @code{strncat}, @var{size} -specifies the maximum total length of the result string (including its +@xref{Concatenating Strings}. Unlike @code{strncat}, @var{size} +specifies the maximum total size of the result string (including its null terminator), not the number of bytes copied from the source string @var{from}. +Also, unlike @code{strncat} this function requires the source and +destination to be null-terminated, computes the source string's +length, and keeps the destination null-terminated. This function is derived from BSD. @end deftypefun