Implement strlcat [BZ#178]

Message ID	5656E018.5020608@cs.ucla.edu
State	New, archived
Headers	Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm Precedence: bulk Sender: libc-alpha-owner@sourceware.org Subject: Re: [PATCH] Implement strlcat [BZ#178] To: Florian Weimer <fweimer@redhat.com> References: <56547472.3010302@redhat.com> <5654B1FE.5020100@cs.ucla.edu> <5654B796.7070302@redhat.com> Cc: GNU C Library <libc-alpha@sourceware.org> From: Paul Eggert <eggert@cs.ucla.edu> Message-ID: <5656E018.5020608@cs.ucla.edu> Date: Thu, 26 Nov 2015 02:34:00 -0800 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.3.0 MIME-Version: 1.0 In-Reply-To: <5654B796.7070302@redhat.com> Content-Type: multipart/mixed; boundary="------------060804000608090202020707"

Message ID

5656E018.5020608@cs.ucla.edu

State

New, archived

Headers

Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm
Precedence: bulk
Sender: libc-alpha-owner@sourceware.org
Subject: Re: [PATCH] Implement strlcat [BZ#178]
To: Florian Weimer <fweimer@redhat.com>
References: <56547472.3010302@redhat.com> <5654B1FE.5020100@cs.ucla.edu>
	<5654B796.7070302@redhat.com>
Cc: GNU C Library <libc-alpha@sourceware.org>
From: Paul Eggert <eggert@cs.ucla.edu>
Message-ID: <5656E018.5020608@cs.ucla.edu>
Date: Thu, 26 Nov 2015 02:34:00 -0800
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101
	Thunderbird/38.3.0
MIME-Version: 1.0
In-Reply-To: <5654B796.7070302@redhat.com>
Content-Type: multipart/mixed;
	boundary="------------060804000608090202020707"

Commit Message

Paul Eggert Nov. 26, 2015, 10:34 a.m. UTC

  Florian Weimer wrote:
> I'm attaching the consolidated documentation patch.

Thanks.  Attached are three patches fixing the problems I mentioned earlier, 
relative to the current glibc master.  The first two patches are independent of 
strlcpy+strlcat: they switch to less-confusing terminology (e.g., "byte" instead 
of "character" when talking about strcpy), and split the enormous "Copying 
Strings and Arrays" section into more-manageable subsections while adding advice 
about string truncation.  The third patch adds strlcpy/strlcat and relies on the 
earlier two patches.

I still would rather not add strlcpy+strlcat, but if we're going to add them 
they will need decent documentation, and I hope this is good enough.

PS.  This time around I noticed that in some cases strlcat is required to not 
null-terminate its output, even when there's room for a null byte in the 
destination buffer.  Eeeuuuw.  Do we really want to document that particular 
misfeature?  Does user code really require it?  Perhaps it'd be better to leave 
the behavior undefined in that particular case, i.e., to require that strlcat's 
first argument be a string if SIZE is nonzero.

Comments

Florian Weimer Dec. 2, 2015, 4:49 p.m. UTC | #1

On 11/26/2015 11:34 AM, Paul Eggert wrote:
> Florian Weimer wrote:
>> I'm attaching the consolidated documentation patch.
> 
> Thanks.  Attached are three patches fixing the problems I mentioned
> earlier, relative to the current glibc master.  The first two patches
> are independent of strlcpy+strlcat: they switch to less-confusing
> terminology (e.g., "byte" instead of "character" when talking about
> strcpy), and split the enormous "Copying Strings and Arrays" section
> into more-manageable subsections while adding advice about string
> truncation.  The third patch adds strlcpy/strlcat and relies on the
> earlier two patches.

I looked at the first patch, and have the followign comments.

+null byte always represents the null character.  A @dfn{multibyte
+character string} is a string that consists entirely of multibyte
+characters.  In contrast, a @dfn{wide character string} is a
+null-terminated sequence of @code{wchar_t} objects and as for strings
+usually pointers of type @code{wchar_t *} are used.  @xref{Extended
+Char Intro}.

I don't understand the “and as for strings usually pointers … are used”
part.

Standard uses “wide string” by the way, not “wide character string”.

-A null character is quite different conceptually from a null pointer,
+A null byte is quite different conceptually from a null pointer,
 although both are represented by the integer @code{0}.

Please also use “integer constant @code{0}”.  Only the integer constant
zero represents a null pointer, an integer variable which stores the
value zero does not.

+literal.  Strings can also be formed by @dfn{string concatenation}:
+@code{"a" "b"} is the

The original had “string literal”.  As this only works for string
literals, it's best to keep it.

 blocks of memory, and functions that are specific to null-terminated
-arrays of characters and wide characters.
+strings and wide strings.

Should be “specific to strings and wide character strings” (the
“null-terminated” is redundant, and “wide strings” has not been defined.

Most of the remaining “null-terminated” occurrences in string.texi
should be removed, for consistency and clarity.

@@ -309,7 +318,8 @@ returns @var{maxlen}.  Therefore this function is
equivalent to
 @code{(strlen (@var{s}) < @var{maxlen} ? strlen (@var{s}) : @var{maxlen})}
 but it
 is more efficient and works even if the string @var{s} is not
-null-terminated.
+null-terminated so long as @var{maxlen} does not exceed the
+size of @var{s}'s array.

This doesn't make much sense anymore because strings are defined to be
always null-terminated.

The documentation of strncpy now implies that the source must be
null-terminated, which is not correct.  Same for wcsncpy and wcsnlen.

> I still would rather not add strlcpy+strlcat, but if we're going to add
> them they will need decent documentation, and I hope this is good enough.

Understood.

> PS.  This time around I noticed that in some cases strlcat is required
> to not null-terminate its output, even when there's room for a null byte
> in the destination buffer.  Eeeuuuw.  Do we really want to document that
> particular misfeature?  Does user code really require it?

It's part of the specification, as far as I can tell.

Florian

diff mbox

Patch

From 91d9c61dfa915d888c1e0e2010c2ddddbb1b4d34 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Thu, 26 Nov 2015 01:54:20 -0800
Subject: [PATCH 3/3] Add strlcpy, strlcat

[BZ #178]
This patch was derived from text by Florian Weimer in:
https://sourceware.org/ml/libc-alpha/2015-11/msg00558.html
* manual/string.texi (Truncating Strings): New functions from BSD.
---
 ChangeLog          |  6 ++++
 manual/string.texi | 84 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 89 insertions(+), 1 deletion(-)

diff --git a/ChangeLog b/ChangeLog
index e4f86e9..b76e781 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,11 @@ 
 2015-11-26  Paul Eggert  <eggert@cs.ucla.edu>
 
+	[BZ #178]
+	Add strlcpy, strlcat
+	This patch was derived from text by Florian Weimer in:
+	https://sourceware.org/ml/libc-alpha/2015-11/msg00558.html
+	* manual/string.texi (Truncating Strings): New functions from BSD.
+
 	Split large string section; add truncation advice
 	* manual/string.texi (Copying Strings and Arrays): Split into
 	three sections Copying Strings and Arrays, Concatenating Strings,
diff --git a/manual/string.texi b/manual/string.texi
index fcb04c9..80c2eda 100644
--- a/manual/string.texi
+++ b/manual/string.texi
@@ -748,7 +748,7 @@  As noted below, this function has significant performance issues.
 @end deftypefun
 
 Programmers using the @code{strcat} or @code{wcscat} function (or the
-@code{strncat} or @code{wcsncat} functions defined in
+@code{strlcat}, @code{strncat}, or @code{wcsncat} functions defined in
 a later section, for that matter)
 can easily be recognized as lazy and reckless.  In almost all situations
 the lengths of the participating strings are known (it better should be
@@ -911,6 +911,49 @@  greater than the length of @var{from}.  As noted below, this function
 is generally a poor choice for processing text.
 @end deftypefun
 
+@comment string.h
+@comment BSD
+@deftypefun size_t strlcpy (char *restrict @var{to}, const char *restrict @var{from}, size_t @var{size})
+@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
+This function is similar to @code{strcpy}, but copies at most
+@var{size} bytes from the string @var{from} into the destination
+buffer @var{to}, including a terminating null byte if @var{size} is
+nonzero.
+
+If @var{size} is greater than the length of @var{from}, @code{strlcpy}
+copies all of the string @var{from} to the destination buffer @var{to},
+including the terminating null byte.  Like other string functions such
+as @code{strcpy}, but unlike @code{strncpy}, any remaining bytes in
+the destination buffer remain unchanged.
+
+If @var{size} is nonzero and is not greater than the length of the
+string @var{from}, @code{strlcpy} copies only the first
+@samp{@var{size} - 1} bytes to the destination buffer @var{to}, and
+writes a terminating null byte to the last byte in the buffer.
+
+If @var{size} is zero, nothing is written to @var{to}.
+
+This function returns the length of @var{from}.  This means that
+truncation occurs whenever the returned value is not less than
+@var{size}.
+
+The behavior of @code{strlcpy} is undefined if the strings overlap or
+if the source or destination are null pointers.
+
+Unlike @code{strncpy}, @code{strcpy} always null-terminates a nonempty
+destination buffer, does not zero-fill the destination buffer,
+requires @var{from} to be a null-terminated string, and always
+computes @var{from}'s length even when this length is greater than
+@var{size}.
+
+This function was designed as a stopgap for quickly retrofitting
+possibly-unsafe uses of @code{strcpy} on platforms lacking
+buffer-overrun protection.  As noted below, this function is generally
+a poor choice for processing text.
+
+@code{strlcpy} is derived from BSD.
+@end deftypefun
+
 @comment wchar.h
 @comment ISO
 @deftypefun {wchar_t *} wcsncpy (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom}, size_t @var{size})
@@ -1067,6 +1110,45 @@  choice for processing text.  Also, this function has significant
 performance issues.  @xref{Concatenating Strings}.
 @end deftypefun
 
+@comment string.h
+@comment BSD
+@deftypefun size_t strlcat (char *restrict @var{to}, const char *restrict @var{from}, size_t @var{size})
+@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
+This function attempts to concatenate the string @var{from} to the
+destination array @var{to} of size @var{size}.
+
+If the original array @var{to} contains a null byte among the first
+@var{size} bytes, @code{strlcat} copies as much as possible of the
+string @var{from} into the buffer at @var{to} of @var{size} bytes,
+starting at the terminating null byte of the original string @var{to}.
+In effect, this appends a prefix of the string @var{from} to the
+string @var{to}.  In this case, the resulting string in @var{to} will
+always be null-terminated, and it is truncated if necessary (not all
+bytes in @var{from} are copied).  @code{strlcat} returns the sum of
+the original length of @var{to} and the length of @var{from}.
+
+If the original array @var{to} is not null-terminated (within the
+first @var{size} bytes of the array), @code{strlcat} returns
+@var{size} plus the length of the string @var{from}.  In this case,
+the array @var{to} is unchanged, and is not null-terminated.  This
+also covers the case when @var{size} is zero.
+
+Unlike @code{strncat}, @code{strlcat} keeps the destination buffer
+null-terminated if it was already null-terminated, requires @var{from}
+to be a null-terminated string, and always computes @var{from}'s
+length even when this length is greater than that of the appended
+string.
+
+As a companion to @code{strlcpy}, @code{strlcat} was designed as a
+stopgap for quickly retrofitting possibly-unsafe uses of @code{strcat}
+on platforms lacking buffer-overrun protection.  As noted below, this
+function is generally a poor choice for processing text.  Also, this
+function has significant performance issues.  @xref{Concatenating
+Strings}.
+
+@code{strlcat} is derived from BSD.
+@end deftypefun
+
 @comment wchar.h
 @comment ISO
 @deftypefun {wchar_t *} wcsncat (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom}, size_t @var{size})
-- 
2.1.0