Patchwork Implement strlcat [BZ#178]

login
register
mail settings
Submitter Paul Eggert
Date Dec. 3, 2015, 5:33 p.m.
Message ID <56607CD1.3050209@cs.ucla.edu>
Download mbox | patch
Permalink /patch/9870/
State New
Headers show

Comments

Paul Eggert - Dec. 3, 2015, 5:33 p.m.
Thanks for the review.  Revised patchset attached.  At this point I'm 
inclined to install the first two patches (which don't change semantics) 
to make it easier to review and maintain the third one (which adds 
strlcpy+strlcat), but I'll hold off a bit longer on that to get more 
feedback.

Replying to your comments:
> I don't understand the “and as for strings usually pointers … are used”
> part.

Changed to "A wide-string variable is usually declared to be a pointer 
of type @code{wchar_t *}, by analogy with string variables and 
@code{char *}." Hope this makes it clear.

> Standard uses “wide string” by the way, not “wide character string”.

C11 says "wide string", POSIX says "wide-character string". True, the 
more-concise form is better here, so I've changed it to that. Similarly, 
I changed "multibyte character string" (POSIX wording) to "multibyte 
string" (C Standard wording). I wish the standards could standardize the 
wording...

> Please also use “integer constant @code{0}”.  Only the integer constant
> zero represents a null pointer, an integer variable which stores the
> value zero does not.

Done.

> +literal.  Strings can also be formed by @dfn{string concatenation}:
> +@code{"a" "b"} is the
>
> The original had “string literal”.  As this only works for string
> literals, it's best to keep it.

The original was incorrect; it said "string literals can also be formed 
by @dfn{string concatenation}" but string literals are the input to 
string concatenation, not the output from it. I changed the wording to 
"String literals can also contribute to @dfn{string concatenation}:...".

>   blocks of memory, and functions that are specific to null-terminated
> -arrays of characters and wide characters.
> +strings and wide strings.
>
> Should be “specific to strings and wide character strings” (the
> “null-terminated” is redundant,
OK, though this will be "wide strings" as per the above-described changed.
> and “wide strings” has not been defined.
>
> Most of the remaining “null-terminated” occurrences in string.texi
> should be removed, for consistency and clarity.

Sure, done.

> @@ -309,7 +318,8 @@ returns @var{maxlen}.  Therefore this function is
> equivalent to
>   @code{(strlen (@var{s}) < @var{maxlen} ? strlen (@var{s}) : @var{maxlen})}
>   but it
>   is more efficient and works even if the string @var{s} is not
> -null-terminated.
> +null-terminated so long as @var{maxlen} does not exceed the
> +size of @var{s}'s array.
>
> This doesn't make much sense anymore because strings are defined to be
> always null-terminated.

Reworded to "If the array @var{s} of size @var{maxlen} contains a null 
byte, the @code{strnlen} function returns the length of the string 
@var{s} in bytes.  Otherwise it returns @var{maxlen}. Therefore this 
function is equivalent to @code{(strlen (@var{s}) < @var{maxlen} ? 
strlen (@var{s}) : @var{maxlen})} but it is more efficient and works 
even if @var{s} is not null-terminated so long as @var{maxlen} does not 
exceed the size of @var{s}'s array."


>
>> PS.  This time around I noticed that in some cases strlcat is required
>> to not null-terminate its output, even when there's room for a null byte
>> in the destination buffer.  Eeeuuuw.  Do we really want to document that
>> particular misfeature?  Does user code really require it?
> It's part of the specification, as far as I can tell.

Yes and no. The FreeBSD man page says that "for strlcat() both src and 
dst must be NUL-terminated"; see 
<https://www.freebsd.org/cgi/man.cgi?query=strlcpy&sektion=3>. 
Admittedly the FreeBSD spec is confused here, as the second part of its 
RETURN VALUES section describes what happens when the destination is not 
null-terminated! But that section also says "this should not happen", 
and obviously user code should not depend on behavior that "should not 
happen" and in practice I expect user code follows this, so let's go 
with the slightly-tighter spec.

Come to think of it, this is related to the confusion between one of the 
main goals of strlcpy (namely, the result is always null-terminated), 
and the weird special case where the destination size is zero (where 
strlcpy cannot null-terminate the destination). In practice user code 
does not and should not depend on this weird special case. We can fix 
this confusion by making strlcpy have undefined behavior if the 
destination size is zero. This simplifies the spec, and gives us an 
opportunity to add one more runtime sanity check that the destination 
size is nonzero in our debugging implementation, if we want to do that. 
I've added this idea to the third patch in the attached patchset (which 
changes only the documentation).
Zack Weinberg - Dec. 3, 2015, 5:36 p.m.
On Thu, Dec 3, 2015 at 12:33 PM, Paul Eggert <eggert@cs.ucla.edu> wrote:
> Come to think of it, this is related to the confusion between one of the
> main goals of strlcpy (namely, the result is always null-terminated), and
> the weird special case where the destination size is zero (where strlcpy
> cannot null-terminate the destination). In practice user code does not and
> should not depend on this weird special case.

I would not at all be surprised to find code deliberately calling
strlcpy with destination size zero in order to learn how big the
destination buffer needed to be.

zw
Paul Eggert - Dec. 3, 2015, 5:56 p.m.
On 12/03/2015 09:36 AM, Zack Weinberg wrote:
> I would not at all be surprised to find code deliberately calling
> strlcpy with destination size zero in order to learn how big the
> destination buffer needed to be.

Calling strlcpy (DST, SRC, 0) instead of the more-obvious strlen (SRC)? 
Really? That would be bizarre. I'd like to see an example.

Also, would you be surprised by a call strlcpy (NULL, SRC, 0) instead of 
the more-obvious strlen (SRC)? Florian's proposed wording (like my 
proposal) would prohibit such a call. Even though strlcpy (NULL, SRC, 
0)  works just fine on BSD and the OpenBSD and FreeBSD man pages don't 
prohibit it, surely glibc doesn't need to support this usage, and surely 
the same is true for strlcpy (DST, SRC, 0) when DST is non-NULL.
Zack Weinberg - Dec. 3, 2015, 6:08 p.m.
On Thu, Dec 3, 2015 at 12:56 PM, Paul Eggert <eggert@cs.ucla.edu> wrote:
> On 12/03/2015 09:36 AM, Zack Weinberg wrote:
>>
>> I would not at all be surprised to find code deliberately calling
>> strlcpy with destination size zero in order to learn how big the
>> destination buffer needed to be.
>
> Calling strlcpy (DST, SRC, 0) instead of the more-obvious strlen (SRC)?
> Really? That would be bizarre. I'd like to see an example.

char *my_strdup(char *s)
{
    size_t bufsiz = strlcpy(0, s, 0) + 1;
    char *buf = malloc(bufsiz);
    if (!buf) return 0;
    strlcpy(buf, s, bufsiz);
    return s;
}

If you're inclined to write this function using strlcpy in the first
place, using it both to compute the buffer size and to do the copy is
natural.

> Also, would you be surprised by a call strlcpy (NULL, SRC, 0) instead of the
> more-obvious strlen (SRC)?

I would actively expect strlcpy (0, SRC, 0) to be used; indeed, I
would expect it to be *more* common than passing a non-null buffer
with a zero size.

> Florian's proposed wording (like my proposal)
> would prohibit such a call. Even though strlcpy (NULL, SRC, 0)  works just
> fine on BSD and the OpenBSD and FreeBSD man pages don't prohibit it,
> surely glibc doesn't need to support this usage

If we're going to have these functions at all, they need to work
_exactly_ as they do on *BSD.

Also, accepting a null destination buffer along with a zero
destination size is consistent with snprintf.

zw
Paul Eggert - Dec. 3, 2015, 10:42 p.m.
On 12/03/2015 10:08 AM, Zack Weinberg wrote:
> char *my_strdup(char *s)

Sorry, I should have been clearer. I'd like to see real-world 
applications that care about these weird corner cases. I'm skeptical 
that such applications exist. I'm sure that one can find contrived 
examples and feature-test programs and suchlike, but production code?

> I would actively expect strlcpy (0, SRC, 0) to be used

Any such usage would not work under NetBSD's current strlcpy 
implementation, which does not allow the destination to be a null 
pointer. So strlcpy (0, SRC, 0) is already not portable, and we don't 
need to support it.

> If we're going to have these functions at all, they need to work 
> _exactly_ as they do on *BSD

There is no "_exactly_". The BSDs differ.

As you know, I'd rather we didn't support these poorly-designed APIs; 
but if we do support them at least we can help out a bit by catching any 
apps that fall into these API's weirder cracks.
Florian Weimer - Dec. 4, 2015, 11:12 a.m.
On 12/03/2015 11:42 PM, Paul Eggert wrote:
> On 12/03/2015 10:08 AM, Zack Weinberg wrote:
>> If we're going to have these functions at all, they need to work
>> _exactly_ as they do on *BSD
> 
> There is no "_exactly_". The BSDs differ.

I think the behavior is the same.  The FreeBSD manpage deviation you
mentioned could be a documentation bug.

> As you know, I'd rather we didn't support these poorly-designed APIs;
> but if we do support them at least we can help out a bit by catching any
> apps that fall into these API's weirder cracks.

We'd have to coordinate this with the OpenBSD folks at least.  OpenBSD
made overlapped strings explicitly undefined in 2014, so there is some
precedent for clarification.  Not sure if a direct conversation is
possible.  I've reached out.

Florian
Florian Weimer - Dec. 4, 2015, 1:41 p.m.
On 12/03/2015 06:33 PM, Paul Eggert wrote:
> Thanks for the review.  Revised patchset attached.  At this point I'm
> inclined to install the first two patches (which don't change semantics)
> to make it easier to review and maintain the third one (which adds
> strlcpy+strlcat), but I'll hold off a bit longer on that to get more
> feedback.

> Replying to your comments:
>> I don't understand the “and as for strings usually pointers … are used”
>> part.
> 
> Changed to "A wide-string variable is usually declared to be a pointer
> of type @code{wchar_t *}, by analogy with string variables and
> @code{char *}." Hope this makes it clear.

Oh, I see now.  A few commas in the original version would have helped,
but this much better.

>> Standard uses “wide string” by the way, not “wide character string”.
> 
> C11 says "wide string", POSIX says "wide-character string". True, the
> more-concise form is better here, so I've changed it to that. Similarly,
> I changed "multibyte character string" (POSIX wording) to "multibyte
> string" (C Standard wording). I wish the standards could standardize the
> wording...

Oh, how unfortunate.

>> +literal.  Strings can also be formed by @dfn{string concatenation}:
>> +@code{"a" "b"} is the
>>
>> The original had “string literal”.  As this only works for string
>> literals, it's best to keep it.
> 
> The original was incorrect; it said "string literals can also be formed
> by @dfn{string concatenation}" but string literals are the input to
> string concatenation, not the output from it. I changed the wording to
> "String literals can also contribute to @dfn{string concatenation}:...".

Okay, fine.

>> @@ -309,7 +318,8 @@ returns @var{maxlen}.  Therefore this function is
>> equivalent to
>>   @code{(strlen (@var{s}) < @var{maxlen} ? strlen (@var{s}) :
>> @var{maxlen})}
>>   but it
>>   is more efficient and works even if the string @var{s} is not
>> -null-terminated.
>> +null-terminated so long as @var{maxlen} does not exceed the
>> +size of @var{s}'s array.
>>
>> This doesn't make much sense anymore because strings are defined to be
>> always null-terminated.
> 
> Reworded to "If the array @var{s} of size @var{maxlen} contains a null
> byte, the @code{strnlen} function returns the length of the string
> @var{s} in bytes.  Otherwise it returns @var{maxlen}. Therefore this
> function is equivalent to @code{(strlen (@var{s}) < @var{maxlen} ?
> strlen (@var{s}) : @var{maxlen})} but it is more efficient and works
> even if @var{s} is not null-terminated so long as @var{maxlen} does not
> exceed the size of @var{s}'s array."

Looks good.

I'm reading the second patch now.

This claim about _FORTIFY_SOURCE is misleading:

+Although these string-truncation functions have been used to fend off
+some buffer overruns, nowadays more-systematic techniques are often
+available, such as defining the @code{_FORTIFY_SOURCE} macro or using
+GCC's @option{-fsanitize=address} option.  Ironically, use of these
+string-truncation functions can mask application bugs that would
+otherwise be caught by the more-systematic techniques.

Unfortunately, source fortification catches few buffer overflows in this
area.  For example, this code:

char *
concat (const char *a, const char *b)
{
  size_t alen = strlen (a);
  size_t blen = strlen (b);
  size_t len = alen + blen;
  char *result = malloc (len + 1);
  if (result != NULL)
    {
      strcpy (result, a);
      strcat (result, b);
    }
  return result;
}

Compiles to (using GCC 5.1.1):

concat:
	pushl	%ebp
	pushl	%edi
	pushl	%esi
	pushl	%ebx
	subl	$24, %esp
	movl	44(%esp), %ebp
	pushl	%ebp
	call	strlen
	popl	%edx
	pushl	48(%esp)
	movl	%eax, %ebx
	call	strlen
	movl	%eax, %esi
	leal	1(%ebx,%eax), %eax
	movl	%eax, (%esp)
	call	malloc
	addl	$16, %esp
	testl	%eax, %eax
	movl	%eax, %edi
	je	.L2
	subl	$4, %esp
	addl	$1, %esi
	pushl	%ebx
	pushl	%ebp
	addl	%edi, %ebx
	pushl	%eax
	call	memcpy
	addl	$12, %esp
	pushl	%esi
	pushl	44(%esp)
	pushl	%ebx
	call	memcpy
	addl	$16, %esp
.L2:
	addl	$12, %esp
	movl	%edi, %eax
	popl	%ebx
	popl	%esi
	popl	%edi
	popl	%ebp
	ret

strcpy and strcat have been replaced by memcpy, but this only eliminates
only one source of buffer overflows here.

In my experience, _FORTIFY_SOURCE mostly covers the
write-to-statically-sized-buffer case.  The manual should not make
promises the current GCC/glibc combinations cannot deliver.
_FORTIFY_SOURCE will always be brittle for the more complex cases
because of the dependency on GCC optimization behavior.

It's also highly application-specific whether a crash (induced by
_FORTIFY_SOURCE) or truncation (from strncpy or strlcpy) is better.

Florian
Zack Weinberg - Dec. 4, 2015, 2:20 p.m.
On 12/03/2015 05:42 PM, Paul Eggert wrote:
> On 12/03/2015 10:08 AM, Zack Weinberg wrote:
>> char *my_strdup(char *s)
> 
> Sorry, I should have been clearer. I'd like to see real-world
> applications that care about these weird corner cases.

I really do not see this as a weird corner case.  I see it as the
expected, natural behavior - if dstsize=0, nothing is written to the
destination, and the return value is unaffected; because nothing is
written to the destination when dstsize=0, it is safe to pass dest=NULL
when dstsize=0.

Again, this is consistent with snprintf, and I think maintaining that
consistency would be a compelling argument even if it wasn't the *BSD
behavior.

> I'm skeptical that such applications exist.

I do not have time to do more than a cursory search, which turns up no
such applications; however, I do not consider "we don't know of any such
applications" an acceptable excuse for deviating from the *BSD behavior.
 I reiterate that

>> If we're going to have these functions at all, they need to work
>> _exactly_ as they do on *BSD.

If they don't work _exactly_ as they do on *BSD, people will not stop
using substitute definitions with glibc and there will have been no
point in adding them!

> There is no "_exactly_". The BSDs differ.

I would consider the example you gave (NetBSD not allowing dst=NULL when
dstsize=0) a bug in NetBSD, not to be emulated.

zw
Paul Eggert - Dec. 4, 2015, 4:06 p.m.
On 12/04/2015 03:12 AM, Florian Weimer wrote:
>> There is no "_exactly_". The BSDs differ.
> I think the behavior is the same.

No, as I mentioned elsewhere, the NetBSD iimplementation does not allow 
strlcpy(0, SRC, 0) whereas the FreeBSD implementation does (and this 
disagrees with the FreeBSD documentation where it says the destination 
must be a string).  I haven't investigated strlcpy but I expect that it 
differs too.

It is a bit of a mess.

It's certainly OK to reach out to the OpenBSD and FreeBSD and NetBSD 
folks here, in particular as to whether applications rely on these weird 
corner cases. I am pretty sure they do not, and that it'd be better to 
make them errors.
Paul Eggert - Dec. 4, 2015, 4:19 p.m.
On 12/04/2015 06:20 AM, Zack Weinberg wrote:
> I really do not see this as a weird corner case.

It's a weird corner case because it violates the natural programmer 
expectation that strlcpy and strlcat produce null-terminated strings. 
This is hardwired into what programmers expect, and is this expectation 
is documented by the FreeBSD man pages. And it is why the BSD man pages 
say "this should not happen" when talking about these weird corner cases.

There's no way that a typical C programmer will call strlcat and think, 
"OK, now I have to worry about whether the result is null-terminated". 
It's a misuse of programmer time to even raise the possibility. The 
whole *point* of these poorly-designed functions is to generate 
null-terminated strings come hell or high water, regardless of how long 
the inputs are.

I accept that in this sense strlcpy+strlcat differ from snprintf but 
there's a good reason for that. With snprintf, one can't easily compute 
the needed length without calling snprintf, so there's a natural pattern 
of calling snprintf with a zero size, then allocating N+1 bytes, then 
calling snprintf again.  With strlcpy this pattern does not arise. 
First, anybody who's trying to allocating a buffer of proper size will 
not be calling strlcpy in the first place; they'll be calling plain 
strcpy or memcpy or whatever because the output buffer will be big 
enough once it's allocated. Second, the natural way to compute a string 
length is to call strlen, not to call strlcpy (NULL, SRC, 0) which is a 
weird corner case and does not work in NetBSD anyway.
Florian Weimer - Dec. 4, 2015, 6:01 p.m.
On 12/03/2015 07:08 PM, Zack Weinberg wrote:

>> Calling strlcpy (DST, SRC, 0) instead of the more-obvious strlen (SRC)?
>> Really? That would be bizarre. I'd like to see an example.
> 
> char *my_strdup(char *s)
> {
>     size_t bufsiz = strlcpy(0, s, 0) + 1;
>     char *buf = malloc(bufsiz);
>     if (!buf) return 0;
>     strlcpy(buf, s, bufsiz);
>     return s;
> }

The implementation I posted does not actually support that because (like
virtually all C standard functions), a null pointer is not allowed even
if the buffer size is zero.

This is incompatible with the C++ standard library, but I did not have
any luck so far with convincing people that this needs to be fixed in
the C standard.  The strlcpy/strlcat patch is not the place for this
general discussion, I think.

Florian
Zack Weinberg - Dec. 7, 2015, 1:49 p.m.
On 12/04/2015 01:01 PM, Florian Weimer wrote:
> On 12/03/2015 07:08 PM, Zack Weinberg wrote:
> 
>>> Calling strlcpy (DST, SRC, 0) instead of the more-obvious strlen (SRC)?
>>> Really? That would be bizarre. I'd like to see an example.
>>
>> char *my_strdup(char *s)
>> {
>>     size_t bufsiz = strlcpy(0, s, 0) + 1;
>>     char *buf = malloc(bufsiz);
>>     if (!buf) return 0;
>>     strlcpy(buf, s, bufsiz);
>>     return s;
>> }
> 
> The implementation I posted does not actually support that because (like
> virtually all C standard functions), a null pointer is not allowed even
> if the buffer size is zero.

I think that's a showstopper bug in your implementation; as I've been
saying to Paul, the semantics should _exactly_ match *BSD (OpenBSD,
specifically) or we shouldn't have these functions at all.  If I'm
reading the OpenBSD manpages correctly, this applies to both strlcpy and
strlcat.

> This is incompatible with the C++ standard library, but I did not have
> any luck so far with convincing people that this needs to be fixed in
> the C standard.  The strlcpy/strlcat patch is not the place for this
> general discussion, I think.

FWIW I would support a general policy of "if a buffer size is specified
as zero, the associated pointer(s) are not dereferenced".  But agree
that this is a separate discussion.

zw
Zack Weinberg - Dec. 7, 2015, 1:57 p.m.
On 12/04/2015 11:19 AM, Paul Eggert wrote:
> On 12/04/2015 06:20 AM, Zack Weinberg wrote:
>> I really do not see this as a weird corner case.
> 
> It's a weird corner case because it violates the natural programmer
> expectation that strlcpy and strlcat produce null-terminated strings.

When there is no space to write a nul-terminated string into, there is
no such expectation.  And "no space" is what bufsiz=0 *means*.

Moreover, there is another - and, I'd argue, stronger - natural
programmer expectation, which is that when a buffer size is zero, the
associated pointer(s) will not be dereferenced.  I know that the C
standard does not require this for e.g. memcpy, but I very strongly
suspect that if you polled a random sample of experienced C programmers
they would react to that basically the same way they react to being told
that signed integer overflow is undefined, i.e. "that's stupid and the
standard should be changed."

> There's no way that a typical C programmer will call strlcat and think,
> "OK, now I have to worry about whether the result is null-terminated".
> It's a misuse of programmer time to even raise the possibility. The
> whole *point* of these poorly-designed functions is to generate
> null-terminated strings come hell or high water, regardless of how long
> the inputs are.

I could be convinced that the OpenBSD semantics are wrong, but only if
you (being the person demanding a change from those semantics) find a
piece of existing code that can *accidentally* call either strlcpy or
strlcat with a zero buffer-size argument.  That means code that was
intended to always call one of these functions with a nonzero
buffer-size argument, but, depending on input and other external
factors, might in practice call it with a zero buffer-size argument.

Failing such an exhibition, I will continue to insist on _exactly_
matching the OpenBSD semantics or else not having these functions at all.

zw
Paul Eggert - Dec. 7, 2015, 9:05 p.m.
On 12/07/2015 05:57 AM, Zack Weinberg wrote:
> I will continue to insist on_exactly_
> matching the OpenBSD semantics or else not having these functions at all.

(I agree, as I think glibc shouldn't have these functions at all. That 
being said...)

If I understand the above comment correctly, you have at least three 
reasons to object to the proposed strlcpy+strlcat implementation.

First, it rejects null pointer arguments.

Second, when arguments overlap it doesn't necessarily have the same 
behavior as the OpenBSD implementation. For example, the OpenBSD strlcpy 
implementation always has well-defined behavior when source and 
destination overlap, but the proposed implementation does not. The 
situation with strlcat is more complicated; still, the OpenBSD 
implementation has well-defined behavior in cases where the proposed 
glibc implementation does not.

Third, the OpenBSD implementation declares strlcpy and strlcat to have 
__attribute__ ((__bounded__ ...)), an OpenBSD extension that generates 
warnings when compiling with gcc -Wbounded (an OpenBSD GCC option that 
is on by default). The proposed implementation doesn't do that so it by 
default does not diagnose bugs that the OpenBSD implementation does 
diagnose.

> When there is no space to write a nul-terminated string into, there is
> no such expectation.

Sometimes there *is* space; that is, the destination size is nonzero, 
but the proposed strlcat *still* doesn't store a terminating null.  
Doesn't this give you qualms?  It does me.

> if you polled a random sample of experienced C programmers
> they would react to that basically the same way they react to being told
> that signed integer overflow is undefined, i.e. "that's stupid and the
> standard should be changed."

If you polled a random sample of experienced C programmers and told them 
that neither strlcpy nor strlcat were guaranteed to null-terminate the 
result even when given valid arguments, they would also say "that's 
stupid and the spec should be changed". And they'd have a point.

> I could be convinced that the OpenBSD semantics are wrong, but only if 
> you (being the person demanding a change from those semantics)

I'm not demanding any change to the OpenBSD semantics. The OpenBSD 
implementation would conform to the spec that I'm proposing. I'm merely 
asking the documentation to permit not only the OpenBSD implementation, 
but also the NetBSD implementation that catches the stupid mistake of a 
null-pointer argument, and an implementation that also catches the 
stupid mistake of output that is not null-terminated. It's reasonable to 
have debugging implementations that catch stupid mistakes like this.

> find a
> piece of existing code that can*accidentally*  call either strlcpy or
> strlcat with a zero buffer-size argument.

OpenSSH 7.1p1 progressmeter.c on a platform where struct winsize's 
ws_col member is an int and is -1, so setscreensize sets win_size to 
zero, so refresh_progress_meter's call to strlcat(buf, " ", win_size) 
has a zero buffer-size argument. Admittedly implausible on typical 
environments, but there it is.
Florian Weimer - Dec. 10, 2015, 4:44 p.m.
On 12/04/2015 05:06 PM, Paul Eggert wrote:
> On 12/04/2015 03:12 AM, Florian Weimer wrote:
>>> There is no "_exactly_". The BSDs differ.
>> I think the behavior is the same.
> 
> No, as I mentioned elsewhere, the NetBSD iimplementation does not allow
> strlcpy(0, SRC, 0) whereas the FreeBSD implementation does (and this
> disagrees with the FreeBSD documentation where it says the destination
> must be a string).

Hmm.  I looked at this, and everything points towards the NetBSD
implementation being buggy in this regard.  Destination-NULL-size-0 is
supported by snprintf, so it should be supported by strlcpy as well (and
strlcat, by analogy).  I will update my patch accordingly.

The question what to do if the terminator is not found in the
destination buffer (for strlcat) is still open.

Florian
Paul Eggert - Dec. 10, 2015, 6:14 p.m.
On 12/10/2015 08:44 AM, Florian Weimer wrote:
> everything points towards the NetBSD
> implementation being buggy in this regard

"everything"?  Nothing indicates that the NetBSD behavior is a bug; it 
doesn't cause real applications to break. And the NetBSD behavior can 
help catch buggy usage, such as "char *buf = malloc (n); strlcpy (buf, 
src, n - m);" on platforms where NULL points to accessible storage, 
malloc (0) returns NULL, and n happens to be 0.

The main point of strlcpy+strlcat is to guarantee that the output is 
always a null-terminated string that fits.This is what programmers 
understandably expect. This main point is *far* more important than 
obscure details about weird corner cases that should never happen 
anyway. The spec should allow an implementation that guarantees the main 
point, and that terminates the program if the program tries to exploit 
the corner cases by passing bad pointers or null pointers or 
unterminated strings or size-zero buffers that prevent the main point 
from being safely satisfied.
Florian Weimer - Dec. 10, 2015, 7:38 p.m.
On 12/07/2015 10:05 PM, Paul Eggert wrote:
> On 12/07/2015 05:57 AM, Zack Weinberg wrote:
>> I will continue to insist on_exactly_
>> matching the OpenBSD semantics or else not having these functions at all.
> 
> (I agree, as I think glibc shouldn't have these functions at all. That
> being said...)
> 
> If I understand the above comment correctly, you have at least three
> reasons to object to the proposed strlcpy+strlcat implementation.
> 
> First, it rejects null pointer arguments.

Yes, I'm going to fix that.

> Second, when arguments overlap it doesn't necessarily have the same
> behavior as the OpenBSD implementation. For example, the OpenBSD strlcpy
> implementation always has well-defined behavior when source and
> destination overlap, but the proposed implementation does not.

The OpenBSD implementation is defined to be undefined with overlapping
inputs, too.

> Third, the OpenBSD implementation declares strlcpy and strlcat to have
> __attribute__ ((__bounded__ ...)), an OpenBSD extension that generates
> warnings when compiling with gcc -Wbounded (an OpenBSD GCC option that
> is on by default). The proposed implementation doesn't do that so it by
> default does not diagnose bugs that the OpenBSD implementation does
> diagnose.

Doesn't the _FORTIFY_SOURCE wrapper do something similar?

>> When there is no space to write a nul-terminated string into, there is
>> no such expectation.
> 
> Sometimes there *is* space; that is, the destination size is nonzero,
> but the proposed strlcat *still* doesn't store a terminating null. 
> Doesn't this give you qualms?  It does me.

Yes, it's annoying, particularly since it is inconsistent with strlcpy
and snprintf.  I'm not sure if we can change that, I'll ask.

Florian
Paul Eggert - Dec. 10, 2015, 10:27 p.m.
On 12/10/2015 11:38 AM, Florian Weimer wrote:

 >> the OpenBSD strlcpy
 >> implementation always has well-defined behavior when source and
 >> destination overlap, but the proposed implementation does not.
 >
 > The OpenBSD implementation is defined to be undefined with overlapping
 > inputs, too.

No, the OpenBSD implementation has well-defined behavior then. strlcpy 
is declared like this:

   size_t strlcpy (char *, const char *, size_t) __attribute__ 
((__bounded__ (__string__, 1, 3)));

http://cvsweb.openbsd.org/cgi-bin/cvsweb/~checkout~/src/include/string.h

and is implemented like this:

   size_t
   strlcpy (char *dst, const char *src, size_t size)
   {
     const char *s = src;
     size_t n = size;

     if (n)
       while (--n && (*dst++ = *s++))
         continue;

     if (!n)
       {
         if (size)
           *dst = '\0';
         while (*s++)
           continue;
       }

     return s - src - 1;
   }

http://cvsweb.openbsd.org/cgi-bin/cvsweb/~checkout~/src/lib/libc/string/strlcpy.c

Nothing in this implementation prohibits overlapping the source and 
destination. There is no use of 'restrict' or of 'memcpy' or anything 
like that. Behavior is perfectly well-defined when source and 
destination overlap.


 >> Third, the OpenBSD implementation declares strlcpy and strlcat to have
 >> __attribute__ ((__bounded__ ...)), an OpenBSD extension that generates
 >> warnings when compiling with gcc -Wbounded (an OpenBSD GCC option that
 >> is on by default). The proposed implementation doesn't do that so it by
 >> default does not diagnose bugs that the OpenBSD implementation does
 >> diagnose.
 >
 > Doesn't the _FORTIFY_SOURCE wrapper do something similar?

Yes, but the proposed glibc implementation is not "exactly matching the 
OpenBSD semantics" as Zack insisted upon. For example, whether 
diagnostics are issued differs by default. There are probably other 
differences (sorry, I don't know what the __bounded__ attribute does, 
exactly).
Florian Weimer - Dec. 11, 2015, 12:09 p.m.
On 12/10/2015 11:27 PM, Paul Eggert wrote:

> http://cvsweb.openbsd.org/cgi-bin/cvsweb/~checkout~/src/lib/libc/string/strlcpy.c

> Nothing in this implementation prohibits overlapping the source and
> destination.

This implementation comes with documentation and it says, among other
things:

“If the src and dst strings overlap, the behavior is undefined.”

<http://cvsweb.openbsd.org/cgi-bin/cvsweb/~checkout~/src/lib/libc/string/strlcpy.3>

Florian
Paul Eggert - Dec. 11, 2015, 6:40 p.m.
On 12/11/2015 04:09 AM, Florian Weimer wrote:
> This implementation comes with documentation and it says

My comments were about the OpenBSD implementation, not the 
documentation.  If implementation compatibility is required (which I 
think was Zack's point), we need to fix incompatibilities with the 
OpenBSD implementation, even if these incompatibilities aren't documented.

Conversely, if implementation compatibility is not required, then we 
needn't reproduce OpenBSD's behavior exactly on weird corner cases that 
come up only with buggy applications. NetBSD does this to some extent, 
and if we're going to implement strlcpy+strlcat at all, we could do so 
as well.  So, for example, it would be fine if _FORTIFY_SOURCE caused 
strlcpy to report an error when given overlapping arguments, even though 
OpenBSD's strlcpy implementation has well-defined behavior in that 
situation.

Either approach would be better than a randomish glibc implementation, 
partially compatible with OpenBSD's weird quirks and partially not, with 
no principle for when we're compatible and when not.

> “If the src and dst strings overlap, the behavior is undefined.”
>
> <http://cvsweb.openbsd.org/cgi-bin/cvsweb/~checkout~/src/lib/libc/string/strlcpy.3>

If the goal is to implement this spec (and implementation compatibility 
is not required), then this quote says the destination is a string, 
which means the destination buffer is null-terminated for strlcat, and 
also means SIZE is nonzero for both strlcpy and strlcat. And that is 
what my proposed strlcpy+strlcat doc patch says, albeit more clearly. Of 
course other parts of the OpenBSD spec talk about what to do when SIZE 
is zero or the strlcat destination is not a string, but what can I say? 
The OpenBSD spec is confused and contradicts itself, and this gives us 
wiggle room to interpret it reasonably for these weird corner cases.

Thanks, by the way, for being so patient with all this. This API is such 
a pain.

Patch

From 69684f3044af8b1554696054e2f8d6f20bab79c2 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Thu, 3 Dec 2015 09:14:57 -0800
Subject: [PATCH 3/3] Add strlcpy, strlcat

[BZ #178]
This patch was derived from text by Florian Weimer in:
https://sourceware.org/ml/libc-alpha/2015-11/msg00558.html
* manual/string.texi (Truncating Strings): New functions from BSD.
---
 ChangeLog          |  6 ++++
 manual/string.texi | 83 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 88 insertions(+), 1 deletion(-)

diff --git a/ChangeLog b/ChangeLog
index d02f7ad..fc2f093 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,11 @@ 
 2015-12-03  Paul Eggert  <eggert@cs.ucla.edu>
 
+	Add strlcpy, strlcat
+	[BZ #178]
+	This patch was derived from text by Florian Weimer in:
+	https://sourceware.org/ml/libc-alpha/2015-11/msg00558.html
+	* manual/string.texi (Truncating Strings): New functions from BSD.
+
 	Split large string section; add truncation advice
 	* manual/string.texi (Copying Strings and Arrays): Split into
 	three sections Copying Strings and Arrays, Concatenating Strings,
diff --git a/manual/string.texi b/manual/string.texi
index eec17f7..d32e956 100644
--- a/manual/string.texi
+++ b/manual/string.texi
@@ -745,7 +745,7 @@  As noted below, this function has significant performance issues.
 @end deftypefun
 
 Programmers using the @code{strcat} or @code{wcscat} function (or the
-@code{strncat} or @code{wcsncat} functions defined in
+@code{strlcat}, @code{strncat}, or @code{wcsncat} functions defined in
 a later section, for that matter)
 can easily be recognized as lazy and reckless.  In almost all situations
 the lengths of the participating strings are known (it better should be
@@ -906,6 +906,48 @@  greater than the length of @var{from}.  As noted below, this function
 is generally a poor choice for processing text.
 @end deftypefun
 
+@comment string.h
+@comment BSD
+@deftypefun size_t strlcpy (char *restrict @var{to}, const char *restrict @var{from}, size_t @var{size})
+@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
+This function is similar to @code{strcpy}, but copies at most
+@var{size} bytes from the string @var{from} into the destination
+buffer @var{to}, including a terminating null byte if @var{size} is
+nonzero.
+
+If @var{size} is greater than the length of @var{from}, @code{strlcpy}
+copies all of the string @var{from} to the destination buffer
+@var{to}, including the terminating null byte.  Like other string
+functions such as @code{strcpy}, but unlike @code{strncpy}, any
+remaining bytes in the destination buffer remain unchanged.
+
+If @var{size} is nonzero and is not greater than the length of the
+string @var{from}, @code{strlcpy} copies only the first
+@samp{@var{size} - 1} bytes to the destination buffer @var{to}, and
+writes a terminating null byte to the last byte in the buffer.
+
+This function returns the length of @var{from}.  This means that
+truncation occurs whenever the returned value is not less than
+@var{size}.
+
+The behavior of @code{strlcpy} is undefined if @var{size} is zero, if
+the strings overlap, or if the source or destination are null
+pointers.
+
+Unlike @code{strncpy}, @code{strcpy} always null-terminates the
+destination buffer, does not zero-fill the destination buffer,
+requires @var{size} to be nonzero, requires @var{from} to be a
+null-terminated string, and always computes @var{from}'s length even
+when this length is greater than @var{size}.
+
+This function was designed as a stopgap for quickly retrofitting
+possibly-unsafe uses of @code{strcpy} on platforms lacking
+buffer-overrun protection.  As noted below, this function is generally
+a poor choice for processing text.
+
+@code{strlcpy} is derived from BSD.
+@end deftypefun
+
 @comment wchar.h
 @comment ISO
 @deftypefun {wchar_t *} wcsncpy (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom}, size_t @var{size})
@@ -1064,6 +1106,45 @@  choice for processing text.  Also, this function has significant
 performance issues.  @xref{Concatenating Strings}.
 @end deftypefun
 
+@comment string.h
+@comment BSD
+@deftypefun size_t strlcat (char *restrict @var{to}, const char *restrict @var{from}, size_t @var{size})
+@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
+This function is similar to @code{strcat}, except that it truncates
+the result to at most @var{size} bytes (including the terminating null
+byte) when appending the string @var{from} to the string @var{to}.
+
+The @code{strlcat} function appends a prefix of the string @var{from}
+to the string @var{to}.  The prefix contains all of @var{from} if that
+can be appended to @var{to} without requiring more than @var{size}
+bytes total.  Otherwise, it contains only as many leading bytes of
+@var{from} as will fit.  The resulting string in @var{to} is always
+null-terminated, and any excess trailing bytes of @var{from} are not
+copied.
+
+@code{strlcat} returns the sum of the original length of @var{to} and
+the length of @var{from}.  This means that truncation occurs whenever
+the returned value is not less than @var{size}.
+
+The behavior of @code{strlcat} is undefined if @var{from} does not
+contain a null byte in its first @var{size} bytes, if the strings
+overlap, or if the source or destination are null pointers.
+
+Unlike @code{strncat}, @code{strlcat} requires @var{to} and @var{from}
+to be null-terminated, ensures that @var{to} stays null-terminated,
+and always computes @var{from}'s length even when this length is
+greater than that of the appended string.
+
+As a companion to @code{strlcpy}, @code{strlcat} was designed as a
+stopgap for quickly retrofitting possibly-unsafe uses of @code{strcat}
+on platforms lacking buffer-overrun protection.  As noted below, this
+function is generally a poor choice for processing text.  Also, this
+function has significant performance issues.  @xref{Concatenating
+Strings}.
+
+@code{strlcat} is derived from BSD.
+@end deftypefun
+
 @comment wchar.h
 @comment ISO
 @deftypefun {wchar_t *} wcsncat (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom}, size_t @var{size})
-- 
2.1.0