Message ID | gerrit.1572431114000.Ia3e68bc2d4d7e967df141702fb2f600cbd4a6432@gnutoolchain-gerrit.osci.io |
---|---|
State | Dropped |
Headers |
Received: (qmail 4646 invoked by alias); 30 Oct 2019 10:25:21 -0000 Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: <libc-alpha.sourceware.org> List-Unsubscribe: <mailto:libc-alpha-unsubscribe-##L=##H@sourceware.org> List-Subscribe: <mailto:libc-alpha-subscribe@sourceware.org> List-Archive: <http://sourceware.org/ml/libc-alpha/> List-Post: <mailto:libc-alpha@sourceware.org> List-Help: <mailto:libc-alpha-help@sourceware.org>, <http://sourceware.org/ml/#faqs> Sender: libc-alpha-owner@sourceware.org Delivered-To: mailing list libc-alpha@sourceware.org Received: (qmail 4632 invoked by uid 89); 30 Oct 2019 10:25:20 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-26.9 required=5.0 tests=BAYES_00, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3 autolearn=ham version=3.3.1 spammy=termination, smallexample, @var X-HELO: mx1.osci.io X-Gerrit-PatchSet: 1 Date: Wed, 30 Oct 2019 06:25:15 -0400 From: "Florian Weimer (Code Review)" <gerrit@gnutoolchain-gerrit.osci.io> To: libc-alpha@sourceware.org Message-ID: <gerrit.1572431114000.Ia3e68bc2d4d7e967df141702fb2f600cbd4a6432@gnutoolchain-gerrit.osci.io> Auto-Submitted: auto-generated X-Gerrit-MessageType: newchange Subject: [review] manual: Clarify strnlen, wcsnlen, strndup null termination behavior X-Gerrit-Change-Id: Ia3e68bc2d4d7e967df141702fb2f600cbd4a6432 X-Gerrit-Change-Number: 444 X-Gerrit-ChangeURL: <https://gnutoolchain-gerrit.osci.io/r/c/glibc/+/444> X-Gerrit-Commit: 7eb8ae69a93874313a95c27985ed2441d74de3af References: <gerrit.1572431114000.Ia3e68bc2d4d7e967df141702fb2f600cbd4a6432@gnutoolchain-gerrit.osci.io> Reply-To: fweimer@redhat.com, libc-alpha@sourceware.org MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Content-Disposition: inline User-Agent: Gerrit/3.0.3-74-g460fb0f7e9 Content-Type: text/plain; charset=UTF-8 |
Commit Message
Simon Marchi (Code Review)
Oct. 30, 2019, 10:25 a.m. UTC
Change URL: https://gnutoolchain-gerrit.osci.io/r/c/glibc/+/444
......................................................................

manual: Clarify strnlen, wcsnlen, strndup null termination behavior

It is required that the inputs are arrays, as reading is not
guaranteed to stop on the first null byte.

Change-Id: Ia3e68bc2d4d7e967df141702fb2f600cbd4a6432
---
M manual/string.texi
1 file changed, 10 insertions(+), 0 deletions(-)
Comments
On Oct 30 2019, Florian Weimer (Code Review) wrote:

> +Note that @var{s} must be an array of at least @var{maxlen} bytes. It
> +is undefined to call @code{strnlen} on a shorter array, even if it is
> +known that the shorter array contains a null terminator.

This is not true. strnlen _always_ stops before the null byte.

Andreas.
* Andreas Schwab:

> On Oct 30 2019, Florian Weimer (Code Review) wrote:
>
>> +Note that @var{s} must be an array of at least @var{maxlen} bytes. It
>> +is undefined to call @code{strnlen} on a shorter array, even if it is
>> +known that the shorter array contains a null terminator.
>
> This is not true. strnlen _always_ stops before the null byte.

This is not how it is specified in POSIX.

Our generic implementation of strnlen performs out-of-bounds pointer
arithmetic in that case, and it looks really iffy:

  const char *char_ptr, *end_ptr = str + maxlen;
  …
  if (__glibc_unlikely (end_ptr < str))
    end_ptr = (const char *) ~0UL;

GCC does the right thing on x86-64, I think, but that's far from
guaranteed.

And what about wcsnlen?

Thanks,
Florian
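For comparison with the end-pointer arithmetic quoted in this message, the following is a minimal byte-at-a-time sketch that counts with an index and so never forms `str + maxlen`. It is an illustration only, not glibc's implementation, and the name `index_strnlen` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not glibc code: bounded length computed with an
   index, so no pointer beyond the bytes actually read is ever formed,
   even when maxlen is SIZE_MAX.  */
size_t
index_strnlen (const char *s, size_t maxlen)
{
  assert (s != NULL || maxlen == 0);

  size_t i = 0;
  /* Test the bound first so that s[i] is read only while i < maxlen,
     and stop at the first null byte.  */
  while (i < maxlen && s[i] != '\0')
    i++;
  return i;
}
```

A loop of this shape reads nothing after the first null byte, which is the behavior Andreas describes; the disagreement in the thread is whether the specification requires every implementation to behave this way.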
On Oct 30 2019, Florian Weimer wrote:

> * Andreas Schwab:
>
>> On Oct 30 2019, Florian Weimer (Code Review) wrote:
>>
>>> +Note that @var{s} must be an array of at least @var{maxlen} bytes. It
>>> +is undefined to call @code{strnlen} on a shorter array, even if it is
>>> +known that the shorter array contains a null terminator.
>>
>> This is not true. strnlen _always_ stops before the null byte.
>
> This is not how it is specified in POSIX.

Yes, it is.

    The strnlen() function shall return the number of bytes preceding
    the first null byte in the array to which s points, if s contains a
    null byte within the first maxlen bytes; otherwise, it shall return
    maxlen.

There is nothing undefined here. Your interpretation would be
completely useless anyway.

Andreas.
* Andreas Schwab:

> On Oct 30 2019, Florian Weimer wrote:
>
>> * Andreas Schwab:
>>
>>> On Oct 30 2019, Florian Weimer (Code Review) wrote:
>>>
>>>> +Note that @var{s} must be an array of at least @var{maxlen} bytes. It
>>>> +is undefined to call @code{strnlen} on a shorter array, even if it is
>>>> +known that the shorter array contains a null terminator.
>>>
>>> This is not true. strnlen _always_ stops before the null byte.
>>
>> This is not how it is specified in POSIX.
>
> Yes, it is.
>
> The strnlen() function shall return the number of bytes preceding
> the first null byte in the array to which s points, if s contains a
> null byte within the first maxlen bytes; otherwise, it shall return
> maxlen.
>
> There is nothing undefined here. Your interpretation would be
> completely useless anyway.

It says “array”, which implies a length. Admittedly, it does not say
that maxlen corresponds to the array length.

POSIX also says this:

| The strnlen() function shall never examine more than maxlen bytes of
| the array pointed to by s.

But it does NOT say that reading stops after the first null terminator.

Thanks,
Florian
On Oct 30 2019, Florian Weimer wrote:

> * Andreas Schwab:
>
>> On Oct 30 2019, Florian Weimer wrote:
>>
>>> * Andreas Schwab:
>>>
>>>> On Oct 30 2019, Florian Weimer (Code Review) wrote:
>>>>
>>>>> +Note that @var{s} must be an array of at least @var{maxlen} bytes. It
>>>>> +is undefined to call @code{strnlen} on a shorter array, even if it is
>>>>> +known that the shorter array contains a null terminator.
>>>>
>>>> This is not true. strnlen _always_ stops before the null byte.
>>>
>>> This is not how it is specified in POSIX.
>>
>> Yes, it is.
>>
>> The strnlen() function shall return the number of bytes preceding
>> the first null byte in the array to which s points, if s contains a
>> null byte within the first maxlen bytes; otherwise, it shall return
>> maxlen.
>>
>> There is nothing undefined here. Your interpretation would be
>> completely useless anyway.
>
> It says “array”

Yes, because a null terminator is not required.

> But it does NOT say that reading stops after the first null terminator.

Yes, it does, see above. Otherwise it doesn't make sense.

Andreas.
On Wed, Oct 30, 2019 at 7:10 AM Andreas Schwab <schwab@suse.de> wrote:
> On Oct 30 2019, Florian Weimer wrote:
> > * Andreas Schwab:
> >> On Oct 30 2019, Florian Weimer wrote:
> >>> * Andreas Schwab:
> >>>>
> >>>> strnlen _always_ stops before the null byte.
> >>> This is not how it is specified in POSIX.
> >> Yes, it is.
> >>
> >> # The strnlen() function shall return the number of bytes preceding
> >> # the first null byte in the array to which s points, if s contains a
> >> # null byte within the first maxlen bytes; otherwise, it shall return
> >> # maxlen.
> >> There is nothing undefined here. Your interpretation would be
> >> completely useless anyway.
> >
> > It says “array”
>
> Yes, because a null terminator is not required.
>
> > But it does NOT say that reading stops after the first null terminator.
>
> Yes, it does, see above. Otherwise it doesn't make sense.

I agree with Florian's interpretation. The text Andreas quoted only says
that strnlen shall find the first null byte within the array and
return the number of bytes preceding. It does not say anything about
whether read accesses beyond the first NUL are allowed.

Looking at
<https://pubs.opengroup.org/onlinepubs/9699919799/functions/strnlen.html>,
Andreas quoted only the RETURN VALUE section of the specification;
there's another paragraph in the DESCRIPTION section which clarifies:

# The strnlen() function shall compute the smaller of the number of
# bytes in the array to which s points, not including any terminating
# NUL character, or the value of the maxlen argument. The strnlen()
# function shall never examine more than maxlen bytes of the array
# pointed to by s.

It says that accesses beyond maxlen are forbidden, but it *doesn't*
say that accesses beyond the first NUL are forbidden; therefore they
are allowed.

As a matter of QoI I think our implementation should take care not to
access beyond the end of the *page* containing the first NUL (which
happens naturally if we don't do speculative or misaligned loads) but
it is appropriate for the manual to warn people that portable code
needs to make sure the entire array is readable.

(I have not looked at the rest of the proposed changes.)

zw
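The advice that portable code should make sure the entire array is readable can be followed by never passing a bound larger than the array itself. The sketch below assumes the caller knows the array size; the wrapper name `clamped_strnlen` and its parameters are hypothetical, not part of any library.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper: clamp the requested limit to the size of the
   readable array, so that strnlen is never asked to examine more
   bytes than the caller actually owns.  */
size_t
clamped_strnlen (const char *buf, size_t bufsize, size_t limit)
{
  assert (buf != NULL);

  size_t bound = limit < bufsize ? limit : bufsize;
  return strnlen (buf, bound);
}
```

With `char string[32] = "hello, world";`, a call such as `clamped_strnlen (string, sizeof string, 100)` passes 32 rather than 100 to strnlen, so the call is valid under both readings discussed in the thread.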
On Oct 30 2019, Zack Weinberg wrote:

> It says that accesses beyond maxlen are forbidden, but it *doesn't*
> say that accesses beyond the first NUL are forbidden; therefore they
> are allowed.

Neither does it say that about strncpy or strncat.

Andreas.
On Wed, Oct 30, 2019 at 12:20 PM Andreas Schwab <schwab@suse.de> wrote:
> On Oct 30 2019, Zack Weinberg wrote:
>
> > It says that accesses beyond maxlen are forbidden, but it *doesn't*
> > say that accesses beyond the first NUL are forbidden; therefore they
> > are allowed.
>
> Neither does it say that about strncpy or strncat.

I don't see why that would change anything.

zw
On Oct 30 2019, Zack Weinberg wrote:

> On Wed, Oct 30, 2019 at 12:20 PM Andreas Schwab <schwab@suse.de> wrote:
>> On Oct 30 2019, Zack Weinberg wrote:
>>
>> > It says that accesses beyond maxlen are forbidden, but it *doesn't*
>> > say that accesses beyond the first NUL are forbidden; therefore they
>> > are allowed.
>>
>> Neither does it say that about strncpy or strncat.
>
> I don't see why that would change anything.

That means that strncpy (x, "a", 10) is undefined.

Andreas.
On Wed, Oct 30, 2019 at 12:47 PM Andreas Schwab <schwab@suse.de> wrote:
> On Oct 30 2019, Zack Weinberg wrote:
> > On Wed, Oct 30, 2019 at 12:20 PM Andreas Schwab <schwab@suse.de> wrote:
> >> On Oct 30 2019, Zack Weinberg wrote:
> >>
> >> > It says that accesses beyond maxlen are forbidden, but it *doesn't*
> >> > say that accesses beyond the first NUL are forbidden; therefore they
> >> > are allowed.
> >>
> >> Neither does it say that about strncpy or strncat.
> >
> > I don't see why that would change anything.
>
> That means that strncpy (x, "a", 10) is undefined.

Yes, that could be a defect in the specification of strncpy (I can
argue either way about what the parenthetical "(bytes that follow a
NUL character are not copied)" means). How does text's presence or
absence in the specification of strncpy change anything about the
requirements on strnlen?

zw
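The strncpy call under discussion can be written out as below. The source "a" is a two-byte array while the count is 10, which is the everyday usage Andreas points to: ISO C specifies that a source string shorter than the count is copied and the rest of the destination is filled with null bytes. The function name `copy_short_source` is hypothetical and exists only to make the example self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical demonstration helper: copy the two-byte source "a"
   into a destination of n bytes.  strncpy copies 'a' and then pads
   the remaining n - 1 bytes of x with null bytes.  */
void
copy_short_source (char *x, size_t n)
{
  assert (x != NULL && n > 0);
  strncpy (x, "a", n);
}
```

Called with a `char x[10]` and `n == 10`, this leaves `x[0] == 'a'` and `x[1]` through `x[9]` set to null bytes. Whether the source array is formally required to be 10 bytes long is the point the two sides of the thread read differently.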
On Wed, 30 Oct 2019, Zack Weinberg wrote:

> I agree with Florian's interpretation. The text Andreas quoted only says
> that strnlen shall find the first null byte within the array and
> return the number of bytes preceding. It does not say anything about
> whether read accesses beyond the first NUL are allowed.

Also, ISO C has special wording for memchr for this issue "The
implementation shall behave as if it reads the characters sequentially
and stops as soon as a matching character is found.", which it doesn't
for other string functions. (As noted in the prior discussion of
strnlen in bug 19391, currently marked RESOLVED/INVALID.)
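Given the memchr wording quoted in this message, one way for application code to get stop-at-the-match behavior is to build the bounded length on memchr. This is a sketch under that reading of ISO C, not a statement about how any C library implements strnlen; the name `memchr_strnlen` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: bounded string length built on memchr, which
   ISO C specifies to behave as if it reads bytes sequentially and
   stops as soon as a match is found.  */
size_t
memchr_strnlen (const char *s, size_t maxlen)
{
  assert (s != NULL);

  const char *nul = memchr (s, '\0', maxlen);
  /* No null byte within the first maxlen bytes: the result is maxlen.  */
  return nul != NULL ? (size_t) (nul - s) : maxlen;
}
```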
On Oct 30 2019, Zack Weinberg wrote:

> Yes, that could be a defect in the specification of strncpy (I can
> argue either way about what the parenthetical "(bytes that follow a
> NUL character are not copied)" means). How does text's presence or
> absence in the specification of strncpy change anything about the
> requirements on strnlen?

Because it shows how flawed your argument is.

Andreas.
On Wed, Oct 30, 2019 at 1:26 PM Andreas Schwab <schwab@suse.de> wrote:
> On Oct 30 2019, Zack Weinberg wrote:
>
> > Yes, that could be a defect in the specification of strncpy (I can
> > argue either way about what the parenthetical "(bytes that follow a
> > NUL character are not copied)" means). How does text's presence or
> > absence in the specification of strncpy change anything about the
> > requirements on strnlen?
>
> Because it shows how flawed your argument is.

Are you seriously saying that I have to read the specification of
strncpy to understand the specification of strnlen? That's not how I
was taught to read standards.

zw
* Zack Weinberg:

> On Wed, Oct 30, 2019 at 1:26 PM Andreas Schwab <schwab@suse.de> wrote:
>> On Oct 30 2019, Zack Weinberg wrote:
>>
>> > Yes, that could be a defect in the specification of strncpy (I can
>> > argue either way about what the parenthetical "(bytes that follow a
>> > NUL character are not copied)" means). How does text's presence or
>> > absence in the specification of strncpy change anything about the
>> > requirements on strnlen?
>>
>> Because it shows how flawed your argument is.
>
> Are you seriously saying that I have to read the specification of
> strncpy to understand the specification of strnlen? That's not how I
> was taught to read standards.

I actually find the strncpy-based argument quite convincing. And it's
really the way you have to read standards if you want to derive meaning
from them. You need to look at how certain terms are used in other
contexts and what they imply there.

For strncpy, clearly the intent is that it is safe to specify a source
string shorter than the target array. If comparable wording is used to
describe the strnlen behavior, then it makes sense to assume that the
POSIX authors probably have not thought about this corner case. At the
very least, it tells us that the standard does not say what the behavior
should be in this case.

Does anyone know if we have test cases that exercise page crossing after
the null terminator in strnlen?
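A test of the kind asked about here could look roughly like the sketch below. It assumes a POSIX system with mmap, mprotect and MAP_ANONYMOUS, and it is not taken from the glibc test suite; the function name `strnlen_before_guard_page` is hypothetical. It places a string so that its null terminator is the last byte of a readable page, makes the following page inaccessible, and calls strnlen with a bound that may reach into that page.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical test helper.  Returns strnlen's result for a string of
   text_len bytes whose terminator sits immediately before a PROT_NONE
   page, or (size_t) -1 if the mapping could not be set up.  */
size_t
strnlen_before_guard_page (size_t text_len, size_t maxlen)
{
  long page_l = sysconf (_SC_PAGESIZE);
  if (page_l <= 0)
    return (size_t) -1;
  size_t page = (size_t) page_l;
  assert (text_len < page);

  /* Two adjacent pages: the first readable, the second a guard page.  */
  char *base = mmap (NULL, 2 * page, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (base == MAP_FAILED)
    return (size_t) -1;
  if (mprotect (base + page, page, PROT_NONE) != 0)
    {
      munmap (base, 2 * page);
      return (size_t) -1;
    }

  /* The null terminator becomes the last byte of the readable page.  */
  char *s = base + page - (text_len + 1);
  memset (s, 'x', text_len);
  s[text_len] = '\0';

  /* If strnlen reads into the guard page, the process faults here.  */
  size_t n = strnlen (s, maxlen);
  munmap (base, 2 * page);
  return n;
}
```

A call such as `strnlen_before_guard_page (5, 64)` asks strnlen to examine up to 64 bytes when only 6 are readable. An implementation that reads past the terminator into the next page would show up as a crash rather than a wrong return value, and whether a conforming strnlen must survive this call is exactly the question the thread leaves open.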
Carlos O'Donell has posted comments on this change.

Change URL: https://gnutoolchain-gerrit.osci.io/r/c/glibc/+/444
......................................................................

Patch Set 1: Code-Review+2

(2 comments)

Looks good to me.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>

| --- /dev/null
| +++ /COMMIT_MSG
| @@ -1,0 +1,12 @@
| +Parent: 177a3d48 (y2038: linux: Provide __clock_getres64 implementation)
| +Author: Florian Weimer <fweimer@redhat.com>
| +AuthorDate: 2019-10-30 11:21:18 +0100
| +Commit: Florian Weimer <fweimer@redhat.com>
| +CommitDate: 2019-10-30 11:21:18 +0100
| +
| +manual: Clarify strnlen, wcsnlen, strndup null termination behavior
| +
| +It is required that the inputs are arrays, as reading is not
| +guaranteed to stop on the first null byte.

PS1, Line 10:

OK. Agreed.

| +
| +Change-Id: Ia3e68bc2d4d7e967df141702fb2f600cbd4a6432

| --- manual/string.texi
| +++ manual/string.texi
| @@ -321,18 +321,22 @@ is more efficient and works even if @var{s} is not null-terminated so
| long as @var{maxlen} does not exceed the size of @var{s}'s array.
|
| @smallexample
| char string[32] = "hello, world";
| strnlen (string, 32)
| @result{} 12
| strnlen (string, 5)
| @result{} 5
| @end smallexample
|
| +Note that @var{s} must be an array of at least @var{maxlen} bytes. It
| +is undefined to call @code{strnlen} on a shorter array, even if it is
| +known that the shorter array contains a null terminator.

PS1, Line 333:

OK. Agreed, you must have at least maxlen bytes, otherwise it's
undefined. We might even create something that scans backwards for
NULL bytes knowing we have maxlen bytes.

| +
| This function is a GNU extension and is declared in @file{string.h}.
| @end deftypefun
|
| @deftypefun size_t wcsnlen (const wchar_t *@var{ws}, size_t @var{maxlen})
| @standards{GNU, wchar.h}
| @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
| @code{wcsnlen} is the wide character equivalent to @code{strnlen}. The
| @var{maxlen} parameter specifies the maximum number of wide characters.
Florian Weimer has posted comments on this change.

Change URL: https://gnutoolchain-gerrit.osci.io/r/c/glibc/+/444
......................................................................

Patch Set 1: Code-Review-2

There has been objection to this on libc-alpha:

  https://sourceware.org/ml/libc-alpha/2019-10/msg00939.html

I should have probably closed out this review.
Carlos O'Donell has abandoned this change. ( https://gnutoolchain-gerrit.osci.io/r/c/glibc/+/444 )

Change subject: manual: Clarify strnlen, wcsnlen, strndup null termination behavior
......................................................................

Abandoned

Dropping this change due to:

  https://sourceware.org/ml/libc-alpha/2019-10/msg00939.html
diff --git a/manual/string.texi b/manual/string.texi
index a1c58e5..ba8a588 100644
--- a/manual/string.texi
+++ b/manual/string.texi
@@ -328,6 +328,10 @@
 @result{} 5
 @end smallexample
 
+Note that @var{s} must be an array of at least @var{maxlen} bytes. It
+is undefined to call @code{strnlen} on a shorter array, even if it is
+known that the shorter array contains a null terminator.
+
 This function is a GNU extension and is declared in @file{string.h}.
 @end deftypefun
 
@@ -336,6 +340,8 @@
 @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
 @code{wcsnlen} is the wide character equivalent to @code{strnlen}. The
 @var{maxlen} parameter specifies the maximum number of wide characters.
+Similar to @code{strnlen}, @var{ws} must point to an array of at least
+@var{maxlen} wide characters.
 
 This function is a GNU extension and is declared in @file{wchar.h}.
 @end deftypefun
@@ -919,6 +925,10 @@
 copies just the first @var{size} bytes and adds a closing null byte.
 Otherwise all bytes are copied and the string is terminated.
 
+Note that @var{s} must be an array of at least @var{size} bytes. It
+is undefined to call @code{strndup} on a shorter array, even if it is
+known that the shorter array contains a null terminator.
+
 This function differs from @code{strncpy} in that it always terminates
 the destination string.