Message ID | 568B1587.4030905@cs.ucla.edu |
---|---|
State | New, archived |
Headers |
Received: (qmail 84702 invoked by alias); 5 Jan 2016 00:59:55 -0000 Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: <libc-alpha.sourceware.org> List-Unsubscribe: <mailto:libc-alpha-unsubscribe-##L=##H@sourceware.org> List-Subscribe: <mailto:libc-alpha-subscribe@sourceware.org> List-Archive: <http://sourceware.org/ml/libc-alpha/> List-Post: <mailto:libc-alpha@sourceware.org> List-Help: <mailto:libc-alpha-help@sourceware.org>, <http://sourceware.org/ml/#faqs> Sender: libc-alpha-owner@sourceware.org Delivered-To: mailing list libc-alpha@sourceware.org Received: (qmail 84200 invoked by uid 89); 5 Jan 2016 00:59:55 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.8 required=5.0 tests=AWL, BAYES_00, RP_MATCHES_RCVD, SPF_PASS autolearn=ham version=3.3.2 spammy=appeal, dubious, disagree, null-terminated X-HELO: zimbra.cs.ucla.edu Subject: Re: [PATCH v7] Implement strlcpy, strlcat [BZ #178] To: Florian Weimer <fweimer@redhat.com> References: <5682DD7E.6000301@redhat.com> <56839678.8040304@cs.ucla.edu> <568ADC5F.5010608@redhat.com> Cc: GNU C Library <libc-alpha@sourceware.org> From: Paul Eggert <eggert@cs.ucla.edu> Message-ID: <568B1587.4030905@cs.ucla.edu> Date: Mon, 4 Jan 2016 16:59:51 -0800 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.4.0 MIME-Version: 1.0 In-Reply-To: <568ADC5F.5010608@redhat.com> Content-Type: multipart/mixed; boundary="------------090803040704080202010708" |
Commit Message
Paul Eggert
Jan. 5, 2016, 12:59 a.m. UTC
On 01/04/2016 12:55 PM, Florian Weimer wrote: > I think it should say “if the source string and destination array > overlap”. That sounds good. Any such usage is likely an application bug. > We can probably ditch the size-0 documented special case for strlcat > (where it is just extremely confusing and not very helpful), but not for > strlcpy, where it is part of the specification. The phrase "part of the specification" begs the question, no? We are discussing what should be in the glibc spec if we add strlcpy+strlcat. There is no standard spec to appeal to, as size-zero and NULL strlcpy is an area where the BSD implementations and documentation are confused and in some cases disagree, and likewise for strlcat. As any application usage of these weird corner cases for either strlcpy or strlcat likely indicates a bug, it'd be good to make it undefined in the glibc spec. Besides, it would be strange to define size-zero strlcpy while leaving size-zero strlcat undefined. They're supposed to be companion functions, typically used on the same output buffer, so why should one work while the other has undefined behavior? The same issue that makes size-zero strlcat dubious (namely, the destination is not a string) also makes size-zero strlcpy dubious. > +The behavior of @code{strlcpy} is undefined if @var{size} is nonzero and > +the source string and the first @var{size} bytes of the destination > +array overlap. The phrase "@var{size} is nonzero and" is unnecessary, since a zero-length array cannot overlap anything. The phrase should be removed, but better yet the spec should simply disallow size-zero destinations. I'm attaching a diff against the diff you sent, to highlight this remaining issue in the spec. (I prefer the shorter and simpler version. :-)
Comments
On 01/05/2016 01:59 AM, Paul Eggert wrote: > On 01/04/2016 12:55 PM, Florian Weimer wrote: > >> I think it should say “if the source string and destination array >> overlap”. > > That sounds good. Any such usage is likely an application bug. > >> We can probably ditch the size-0 documented special case for strlcat >> (where it is just extremely confusing and not very helpful), but not for >> strlcpy, where it is part of the specification. > > The phrase "part of the specification" begs the question, no? We are > discussing what should be in the glibc spec if we add strlcpy+strlcat. We need to compatible with as many of the implementation we replace as possible. This is the burden we have because we are so late to provide this function. > There is no standard spec to appeal to, as size-zero and NULL strlcpy is > an area where the BSD implementations and documentation are confused and > in some cases disagree, and likewise for strlcat. I think the size-zero buffer is quite clear from the original description of strlcpy. Things are more messy with strlcat, I will grant that. > Besides, it would be strange to define size-zero strlcpy while leaving > size-zero strlcat undefined. They're supposed to be companion functions, > typically used on the same output buffer, so why should one work while > the other has undefined behavior? The same issue that makes size-zero > strlcat dubious (namely, the destination is not a string) also makes > size-zero strlcpy dubious. I think the last part is not actually true. For strlcpy, we don't care if the destination is a string or not because we overwrite what's in the destination array. >> +The behavior of @code{strlcpy} is undefined if @var{size} is nonzero and >> +the source string and the first @var{size} bytes of the destination >> +array overlap. > > The phrase "@var{size} is nonzero and" is unnecessary, since a > zero-length array cannot overlap anything. It's okay as far as the formal description of restrict is concerned, but it's not clear with the informal “overlap”. Florian
On 01/05/2016 10:18 AM, Florian Weimer wrote: > We need to compatible with as many of the implementation we replace as > possible. Sure, but any need for bug-for-bug compatibility would not be absolute. A more-important need would be to support applications that use glibc. If glibc applications would benefit from better checking for weird corner cases that BSD man pages say should not happen, the glibc spec should allow such checking even if the BSD implementations don't happen to do such checking now, or do it sometimes but not others. > the size-zero buffer is quite clear from the original description of > strlcpy. No, the original description of strlcpy says "Note that you should include a byte for the NUL in 'size'.", which means that 'size' should be nonzero. The original description also says "The strlcpy function copies up to size - 1 characters from the NUL-terminated string src to dst, NUL-terminating the result.", which means that the result is null-terminated. See: http://cvsweb.openbsd.org/cgi-bin/cvsweb/~checkout~/src/lib/libc/string/strlcpy.3?rev=1.1 The original description also contains some contrary wording elsewhere "(as long as size is larger than 0)", but the point is that the spec has *always* been confusing and contradictory in this area. Really, the only safe way for an application to follow the original spec, is to avoid calling these functions with zero sizes. Granted, the description of strlcpy has mutated and there are now versions floating around that talk about size-zero more consistently, but it is not at all clear that this was prompted by application needs or that applications depend on it or should depend on it. It looks more like a case of library implementers trying to clean up a messy spec without thinking things through carefully enough. > For strlcpy, we don't care if the destination is a string or not > because we overwrite what's in the destination array. No, applications that call strlcpy expect it to create a string. For example, the design intent is for applications to use strlcpy followed by strlcat, and because strlcat (S, ...) assumes that S is a string, strlcpy must create a string in S; otherwise the two functions would not be the companion functions that they're clearly supposed to be. >>> +The behavior of @code{strlcpy} is undefined if @var{size} is nonzero and >>> +the source string and the first @var{size} bytes of the destination >>> +array overlap. >> The phrase "@var{size} is nonzero and" is unnecessary, since a >> zero-length array cannot overlap anything. > It's okay as far as the formal description of restrict is concerned, but > it's not clear with the informal “overlap”. > "overlap" has a standard usage in C11 (one that agrees with the formal description of 'restrict'). The use of the phrase "@var{size} is nonzero and" implies that our use of "overlap" has a different meaning. I suppose we could fix the problem by mentioning somehow that the wording's nonzero restriction is not intended to imply that size-zero objects can overlap anything. However, it'd be better to simply disallow size-zero strlcpy, as in my most-recent proposal.
On 01/05/2016 11:08 PM, Paul Eggert wrote: > The original description also contains some contrary wording elsewhere > "(as long as size is larger than 0)", but the point is that the spec has > *always* been confusing and contradictory in this area. Really, the only > safe way for an application to follow the original spec, is to avoid > calling these functions with zero sizes. > > Granted, the description of strlcpy has mutated and there are now > versions floating around that talk about size-zero more consistently, > but it is not at all clear that this was prompted by application needs > or that applications depend on it or should depend on it. It looks more > like a case of library implementers trying to clean up a messy spec > without thinking things through carefully enough. Hmm. So maybe it is better to leave this undefined after all. >> For strlcpy, we don't care if the destination is a string or not >> because we overwrite what's in the destination array. > > No, applications that call strlcpy expect it to create a string. For > example, the design intent is for applications to use strlcpy followed > by strlcat, and because strlcat (S, ...) assumes that S is a string, > strlcpy must create a string in S; otherwise the two functions would not > be the companion functions that they're clearly supposed to be. I see the value as a post-condition, but I felt that your wording was using the post-condition as a way to specify a pre-condition an application has to adhere to, which I think is misleading. >>>> +The behavior of @code{strlcpy} is undefined if @var{size} is >>>> nonzero and >>>> +the source string and the first @var{size} bytes of the destination >>>> +array overlap. >>> The phrase "@var{size} is nonzero and" is unnecessary, since a >>> zero-length array cannot overlap anything. >> It's okay as far as the formal description of restrict is concerned, but >> it's not clear with the informal “overlap”. > "overlap" has a standard usage in C11 (one that agrees with the formal > description of 'restrict'). The use of the phrase "@var{size} is nonzero > and" implies that our use of "overlap" has a different meaning. C11 doesn't have zero-sized objects, and I can't find a clear definition of “overlap” either. That's why I added it. I'm not really wedded to it, I just thought it would make things clearer. Fortunately, we don't have to resolve this dispute if a zero size is undefined. Florian
Florian Weimer wrote: > I see the value as a post-condition, but I felt that your wording was > using the post-condition as a way to specify a pre-condition an > application has to adhere to, which I think is misleading. Perhaps we can think of a way to reword it that is not so misleading. The postcondition that the output is always null-terminated, together with the constraint that the output fits within the size, means that the size is nonzero. Perhaps add "Requiring @var{size} to be nonzero guarantees space to null-terminate the destination."?
On 01/06/2016 02:28 AM, Paul Eggert wrote: > Florian Weimer wrote: >> I see the value as a post-condition, but I felt that your wording was >> using the post-condition as a way to specify a pre-condition an >> application has to adhere to, which I think is misleading. > > Perhaps we can think of a way to reword it that is not so misleading. > The postcondition that the output is always null-terminated, together > with the constraint that the output fits within the size, means that the > size is nonzero. Perhaps add "Requiring @var{size} to be nonzero > guarantees space to null-terminate the destination."? I think you wanted size 0 to be undefined in the documentation? If we do that, we can simplify the description. (It is meaningless to discuss behavior of a function which is called when its preconditions does not hold. If it appears to make sense to add something like this to a specification, it means that the preconditions have not been described correctly.) Florian
On 2016-01-06 01:08, Paul Eggert wrote: >> For strlcpy, we don't care if the destination is a string or not >> because we overwrite what's in the destination array. > > No, applications that call strlcpy expect it to create a string. For > example, the design intent is for applications to use strlcpy followed > by strlcat, and because strlcat (S, ...) assumes that S is a string, > strlcpy must create a string in S; otherwise the two functions would not > be the companion functions that they're clearly supposed to be. Perhaps this is not the only way to use strlcpy? It seems the case of strlcpy with size=0 is special. Let's consider a couple of use cases. 1. strlcpy followed by strlcat. This is what you are talking about, and you want to keep the freedom to redefine these function, e.g., to crash when size is 0. But this use case implies the same size in all calls to these function. Hence you can catch size=0 with strlcat even if strlcpy accepts size=0 without triggering any protections. 2. strlcpy with a retry. If you try to copy a string into a buffer and there is not enough memory you can allocate more memory and retry copying. Something like this: void strlcpy_with_growing(char **dest, const char *src, size_t *size) { size_t len = strlcpy(*dest, src, *size); if (len >= *size) { free(*dest); *dest = xmalloc(len); *size = len; strlcpy(*dest, src, len); } } Then it's handy to permit to start with a null pointer and a zero size. Think of realloc which accepts a null pointer as input. strlcpy with a retry is also one the intended uses of strlcpy: "Thus, if truncation has occurred, the number of bytes needed to store the entire string is now known and the programmer may allocate more space and re-copy the strings if he or she wishes."[1] [1] https://www.sudo.ws/todd/papers/strlcpy.html Unrelatedly, it turned out useful to look into what valgrind does for strlcpy. Apparently, Solaris actively uses strlcpy with size=0. From [2]: /* This is just a fancy way how to write strlen(src). Undocumented but heavily used. */ copied = strlcpy(NULL, src, 0); See also [3]. [2] https://sources.debian.net/src/valgrind/1:3.11.0-1/memcheck/tests/solaris/strlcpy.c/?hl=34:36#L34 [3] https://sources.debian.net/src/valgrind/1:3.11.0-1/shared/vg_replace_strmem.c/#L563 I'm not arguing for any specific way to define strlcpy, just feeling that the current discussion is not sufficiently informed.
It remains my opinion that if strl{cpy,cat} are to appear at all in glibc they should be bug-for-bug compatible with the original OpenBSD *implementation* (not the ambiguously worded manpage), or else they will fail to achieve the desired effect -- applications will continue to use their own implementations.
On 2016-01-08 22:30, Zack Weinberg wrote: > It remains my opinion that if strl{cpy,cat} are to appear at all in > glibc they should be bug-for-bug compatible with the original OpenBSD > *implementation* (not the ambiguously worded manpage), OpenBSD strlcat has undefined behavior when dst=NULL and dsize=0. Should glibc implementation have it too? :-) Another question: OpenBSD strlcpy/strlcat copy memory in a specific order with an explicit loop, AFAIK using memcmp doesn't guarantee the same, should memcmp be ditched here? Actually, everything is not that bad. Discussion is mostly about documentation so it should not affect what is used. But perhaps it's worth stating it explicitly. There are two separate questions: 1) what should ordinary glibc implementation of strlcpy/strlcat do; 2) which cases should be documented as supported so that all other cases could be catched by a hardened implementation. Paraphrasing the famous principle: be conservative in what you document, be liberal in what you implement. So, I guess it's better to keep size==0 check in strlcat but document this case as invalid. > or else they > will fail to achieve the desired effect -- applications will continue > to use their own implementations. Or worse: they will use glibc implementation and fail in unpredictable ways.
Alexander Cherepanov wrote: > it's handy No, because even if that example's silent-truncation bug were fixed (which is yet another bit of evidence against strlcpy!), its use case is already well-addressed by the standard API, and portable programs can (and already do) use something like the following instead. void cpy_with_growing (char **dest, char const *src, size_t *size) { size_t s = strlen (src) + 1; if (*size <= s) { free (*dest); size_t doubled = 2 * *size; *size = s <= doubled ? doubled : s; *dest = xmalloc (*size); } memcpy (*dest, src, s); } > you can catch size=0 with strlcat even if strlcpy accepts size=0 without triggering any protections. No, because often only strlcpy is executed, with strlcat used only in relatively unusual circumstances. There are sound software-engineering reasons for insisting that strlcpy be no less reliable and strict than strlcat, and for the two functions to follow the same rules. > be conservative in what you document, be liberal in what you implement. Yes, that's a goal of my most recently-proposed documentation. Again, I don't think we should add strlcpy+strlcat to glibc; but if we do add them, we should document them conservatively.
On 2016-01-09 01:31, Paul Eggert wrote: > Alexander Cherepanov wrote: > >> it's handy > > No, because even if that example's silent-truncation bug were fixed > (which is yet another bit of evidence against strlcpy!), Hm, strlcpy doesn't really differ from an ordinary strlen in this regard, so I'm not sure to which degree this is evidence against strlcpy and to which -- against C strings. Perhaps the problem is that all official examples of strlcpy are of the form "len = strlcpy(...); if (len >= size) ..." (that is, no "+ 1" here) while for strlen it's quite common to write "strlen(...) + 1" (that is, adding one right away). OTOH such a bug with strlen will lead to buffer overflow instead of truncation. Not really better:-) > its use case is > already well-addressed by the standard API, and portable programs can > (and already do) use something like the following instead. This could be an argument against addition of strlcpy in general but specifically for size=0 it seems to confirm importance of the case:-) >> you can catch size=0 with strlcat even if strlcpy accepts size=0 >> without triggering any protections. > > No, because often only strlcpy is executed, with strlcat used only in > relatively unusual circumstances. Ok, I misunderstood you. > There are sound software-engineering > reasons for insisting that strlcpy be no less reliable and strict than > strlcat, Given that strlcat 'dest' argument is often prepared by strlcpy I can see why strlcpy should be not less reliable than strlcat -- there is just no reason for strlcat to be more robust than strclpy. But why strlcpy should be no less strict than strlcat? > and for the two functions to follow the same rules. The functions are different and it's not entirely clear what you mean exactly. One possible guiding principle: strlcat should be able to accept anything after strlcpy. This would mean that it should accept size=0 if strlcpy does it but not other non-null-terminated dest arrays. But taken in isolation strlcat for non-null-terminated dest arrays (including the case of size=0) is somewhat different from strlcpy for size=0: the result of strlcpy with size=0 is the amount of memory you have to allocate for a successful retry, the result of strlcat with a non-null-terminated dest array is mostly useless because the length of dest is still unknown. OTOH strlcat is sometimes used after snprintf (~40 cases in OpenBSD sources). Does it mean that strlcat should be as reliable and strict as snprintf? >> be conservative in what you document, be liberal in what you implement. > > Yes, that's a goal of my most recently-proposed documentation. Again, I > don't think we should add strlcpy+strlcat to glibc; but if we do add > them, we should document them conservatively. I guess this implies that it should be documented in the sources which guarantees the implementation provides in addition to those in the user-facing manual so that it's not broken later by someone looking only at the manual.
diff --git a/manual/string.texi b/manual/string.texi index f0fe7d3..3d2fff3 100644 --- a/manual/string.texi +++ b/manual/string.texi @@ -1118,14 +1118,11 @@ If @var{size} is nonzero and less than or equal to the the length of the string bytes to the destination array @var{to}, and writes a terminating null byte to the last byte of the array. -If @var{size} is zero, @code{strlcpy} does not modify the destination -array, and @var{to} can be a null pointer. - The return value @var{result} of @code{strlcpy} is the length of the string @var{from}. This means that @samp{@var{result} >= @var{size}} is true whenever truncation occurs. -The behavior of @code{strlcpy} is undefined if @var{size} is nonzero and +The behavior of @code{strlcpy} is undefined if @var{size} is zero, or if the source string and the first @var{size} bytes of the destination array overlap. @@ -1133,8 +1130,8 @@ As noted below, this function is generally a poor choice for processing text. Unlike @code{strncpy}, @code{strlcpy} requires @var{size} to be nonzero and the source string to be null-terminated, computes the source string's length, ensures that the destination is -null-terminated (assuming that @var{size} is nonzero), and does not -fill the remaining part of the destination with null bytes. +null-terminated, and does not fill the remaining part of the destination +with null bytes. This function is derived from BSD. @end deftypefun