From patchwork Thu Dec 3 17:33:05 2015 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Paul Eggert X-Patchwork-Id: 9870 Received: (qmail 97131 invoked by alias); 3 Dec 2015 17:33:29 -0000 Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: libc-alpha-owner@sourceware.org Delivered-To: mailing list libc-alpha@sourceware.org Received: (qmail 97114 invoked by uid 89); 3 Dec 2015 17:33:29 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-0.5 required=5.0 tests=AWL, BAYES_50, LIKELY_SPAM_BODY, SPF_PASS, T_RP_MATCHES_RCVD autolearn=no version=3.3.2 X-HELO: zimbra.cs.ucla.edu From: Paul Eggert Subject: Re: [PATCH] Implement strlcat [BZ#178] To: Florian Weimer References: <56547472.3010302@redhat.com> <5654B1FE.5020100@cs.ucla.edu> <5654B796.7070302@redhat.com> <5656E018.5020608@cs.ucla.edu> <565F211A.2030909@redhat.com> Cc: libc-alpha@sourceware.org Message-ID: <56607CD1.3050209@cs.ucla.edu> Date: Thu, 3 Dec 2015 09:33:05 -0800 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.1.0 MIME-Version: 1.0 In-Reply-To: <565F211A.2030909@redhat.com> Thanks for the review. Revised patchset attached. At this point I'm inclined to install the first two patches (which don't change semantics) to make it easier to review and maintain the third one (which adds strlcpy+strlcat), but I'll hold off a bit longer on that to get more feedback. Replying to your comments: > I don't understand the “and as for strings usually pointers … are used” > part. Changed to "A wide-string variable is usually declared to be a pointer of type @code{wchar_t *}, by analogy with string variables and @code{char *}." Hope this makes it clear. > Standard uses “wide string” by the way, not “wide character string”. C11 says "wide string", POSIX says "wide-character string". True, the more-concise form is better here, so I've changed it to that. Similarly, I changed "multibyte character string" (POSIX wording) to "multibyte string" (C Standard wording). I wish the standards could standardize the wording... > Please also use “integer constant @code{0}”. Only the integer constant > zero represents a null pointer, an integer variable which stores the > value zero does not. Done. > +literal. Strings can also be formed by @dfn{string concatenation}: > +@code{"a" "b"} is the > > The original had “string literal”. As this only works for string > literals, it's best to keep it. The original was incorrect; it said "string literals can also be formed by @dfn{string concatenation}" but string literals are the input to string concatenation, not the output from it. I changed the wording to "String literals can also contribute to @dfn{string concatenation}:...". > blocks of memory, and functions that are specific to null-terminated > -arrays of characters and wide characters. > +strings and wide strings. > > Should be “specific to strings and wide character strings” (the > “null-terminated” is redundant, OK, though this will be "wide strings" as per the above-described changed. > and “wide strings” has not been defined. > > Most of the remaining “null-terminated” occurrences in string.texi > should be removed, for consistency and clarity. Sure, done. > @@ -309,7 +318,8 @@ returns @var{maxlen}. Therefore this function is > equivalent to > @code{(strlen (@var{s}) < @var{maxlen} ? strlen (@var{s}) : @var{maxlen})} > but it > is more efficient and works even if the string @var{s} is not > -null-terminated. > +null-terminated so long as @var{maxlen} does not exceed the > +size of @var{s}'s array. > > This doesn't make much sense anymore because strings are defined to be > always null-terminated. Reworded to "If the array @var{s} of size @var{maxlen} contains a null byte, the @code{strnlen} function returns the length of the string @var{s} in bytes. Otherwise it returns @var{maxlen}. Therefore this function is equivalent to @code{(strlen (@var{s}) < @var{maxlen} ? strlen (@var{s}) : @var{maxlen})} but it is more efficient and works even if @var{s} is not null-terminated so long as @var{maxlen} does not exceed the size of @var{s}'s array." > >> PS. This time around I noticed that in some cases strlcat is required >> to not null-terminate its output, even when there's room for a null byte >> in the destination buffer. Eeeuuuw. Do we really want to document that >> particular misfeature? Does user code really require it? > It's part of the specification, as far as I can tell. Yes and no. The FreeBSD man page says that "for strlcat() both src and dst must be NUL-terminated"; see . Admittedly the FreeBSD spec is confused here, as the second part of its RETURN VALUES section describes what happens when the destination is not null-terminated! But that section also says "this should not happen", and obviously user code should not depend on behavior that "should not happen" and in practice I expect user code follows this, so let's go with the slightly-tighter spec. Come to think of it, this is related to the confusion between one of the main goals of strlcpy (namely, the result is always null-terminated), and the weird special case where the destination size is zero (where strlcpy cannot null-terminate the destination). In practice user code does not and should not depend on this weird special case. We can fix this confusion by making strlcpy have undefined behavior if the destination size is zero. This simplifies the spec, and gives us an opportunity to add one more runtime sanity check that the destination size is nonzero in our debugging implementation, if we want to do that. I've added this idea to the third patch in the attached patchset (which changes only the documentation). From 69684f3044af8b1554696054e2f8d6f20bab79c2 Mon Sep 17 00:00:00 2001 From: Paul Eggert Date: Thu, 3 Dec 2015 09:14:57 -0800 Subject: [PATCH 3/3] Add strlcpy, strlcat [BZ #178] This patch was derived from text by Florian Weimer in: https://sourceware.org/ml/libc-alpha/2015-11/msg00558.html * manual/string.texi (Truncating Strings): New functions from BSD. --- ChangeLog | 6 ++++ manual/string.texi | 83 +++++++++++++++++++++++++++++++++++++++++++++++++++++- 2 files changed, 88 insertions(+), 1 deletion(-) diff --git a/ChangeLog b/ChangeLog index d02f7ad..fc2f093 100644 --- a/ChangeLog +++ b/ChangeLog @@ -1,5 +1,11 @@ 2015-12-03 Paul Eggert + Add strlcpy, strlcat + [BZ #178] + This patch was derived from text by Florian Weimer in: + https://sourceware.org/ml/libc-alpha/2015-11/msg00558.html + * manual/string.texi (Truncating Strings): New functions from BSD. + Split large string section; add truncation advice * manual/string.texi (Copying Strings and Arrays): Split into three sections Copying Strings and Arrays, Concatenating Strings, diff --git a/manual/string.texi b/manual/string.texi index eec17f7..d32e956 100644 --- a/manual/string.texi +++ b/manual/string.texi @@ -745,7 +745,7 @@ As noted below, this function has significant performance issues. @end deftypefun Programmers using the @code{strcat} or @code{wcscat} function (or the -@code{strncat} or @code{wcsncat} functions defined in +@code{strlcat}, @code{strncat}, or @code{wcsncat} functions defined in a later section, for that matter) can easily be recognized as lazy and reckless. In almost all situations the lengths of the participating strings are known (it better should be @@ -906,6 +906,48 @@ greater than the length of @var{from}. As noted below, this function is generally a poor choice for processing text. @end deftypefun +@comment string.h +@comment BSD +@deftypefun size_t strlcpy (char *restrict @var{to}, const char *restrict @var{from}, size_t @var{size}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +This function is similar to @code{strcpy}, but copies at most +@var{size} bytes from the string @var{from} into the destination +buffer @var{to}, including a terminating null byte if @var{size} is +nonzero. + +If @var{size} is greater than the length of @var{from}, @code{strlcpy} +copies all of the string @var{from} to the destination buffer +@var{to}, including the terminating null byte. Like other string +functions such as @code{strcpy}, but unlike @code{strncpy}, any +remaining bytes in the destination buffer remain unchanged. + +If @var{size} is nonzero and is not greater than the length of the +string @var{from}, @code{strlcpy} copies only the first +@samp{@var{size} - 1} bytes to the destination buffer @var{to}, and +writes a terminating null byte to the last byte in the buffer. + +This function returns the length of @var{from}. This means that +truncation occurs whenever the returned value is not less than +@var{size}. + +The behavior of @code{strlcpy} is undefined if @var{size} is zero, if +the strings overlap, or if the source or destination are null +pointers. + +Unlike @code{strncpy}, @code{strcpy} always null-terminates the +destination buffer, does not zero-fill the destination buffer, +requires @var{size} to be nonzero, requires @var{from} to be a +null-terminated string, and always computes @var{from}'s length even +when this length is greater than @var{size}. + +This function was designed as a stopgap for quickly retrofitting +possibly-unsafe uses of @code{strcpy} on platforms lacking +buffer-overrun protection. As noted below, this function is generally +a poor choice for processing text. + +@code{strlcpy} is derived from BSD. +@end deftypefun + @comment wchar.h @comment ISO @deftypefun {wchar_t *} wcsncpy (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom}, size_t @var{size}) @@ -1064,6 +1106,45 @@ choice for processing text. Also, this function has significant performance issues. @xref{Concatenating Strings}. @end deftypefun +@comment string.h +@comment BSD +@deftypefun size_t strlcat (char *restrict @var{to}, const char *restrict @var{from}, size_t @var{size}) +@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} +This function is similar to @code{strcat}, except that it truncates +the result to at most @var{size} bytes (including the terminating null +byte) when appending the string @var{from} to the string @var{to}. + +The @code{strlcat} function appends a prefix of the string @var{from} +to the string @var{to}. The prefix contains all of @var{from} if that +can be appended to @var{to} without requiring more than @var{size} +bytes total. Otherwise, it contains only as many leading bytes of +@var{from} as will fit. The resulting string in @var{to} is always +null-terminated, and any excess trailing bytes of @var{from} are not +copied. + +@code{strlcat} returns the sum of the original length of @var{to} and +the length of @var{from}. This means that truncation occurs whenever +the returned value is not less than @var{size}. + +The behavior of @code{strlcat} is undefined if @var{from} does not +contain a null byte in its first @var{size} bytes, if the strings +overlap, or if the source or destination are null pointers. + +Unlike @code{strncat}, @code{strlcat} requires @var{to} and @var{from} +to be null-terminated, ensures that @var{to} stays null-terminated, +and always computes @var{from}'s length even when this length is +greater than that of the appended string. + +As a companion to @code{strlcpy}, @code{strlcat} was designed as a +stopgap for quickly retrofitting possibly-unsafe uses of @code{strcat} +on platforms lacking buffer-overrun protection. As noted below, this +function is generally a poor choice for processing text. Also, this +function has significant performance issues. @xref{Concatenating +Strings}. + +@code{strlcat} is derived from BSD. +@end deftypefun + @comment wchar.h @comment ISO @deftypefun {wchar_t *} wcsncat (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom}, size_t @var{size}) -- 2.1.0