[review] manual: Clarify strnlen, wcsnlen, strndup null termination behavior

Message ID 87wobk8hew.fsf@mid.deneb.enyo.de
State New, archived

Commit Message

Florian Weimer Nov. 28, 2019, 9:43 a.m. UTC
  * Florian Weimer:

> * Andreas Schwab:
>
>> On Oct 30 2019, Florian Weimer wrote:
>>
>>> * Andreas Schwab:
>>>
>>>> On Oct 30 2019, Florian Weimer (Code Review) wrote:
>>>>
>>>>> +Note that @var{s} must be an array of at least @var{maxlen} bytes.  It
>>>>> +is undefined to call @code{strnlen} on a shorter array, even if it is
>>>>> +known that the shorter array contains a null terminator.
>>>>
>>>> This is not true.  strnlen _always_ stops before the null byte.
>>>
>>> This is not how it is specified in POSIX.
>>
>> Yes, it is.
>>
>>     The strnlen() function shall return the number of bytes preceding
>>     the first null byte in the array to which s points, if s contains a
>>     null byte within the first maxlen bytes; otherwise, it shall return
>>     maxlen.
>>
>> There is nothing undefined here.  Your interpretation would be
>> completely useless anyway.
>
> It says “array”, which implies a length.  Admittedly, it does not say
> that maxlen corresponds to the array length.  POSIX also says this:
>
> | The strnlen() function shall never examine more than maxlen bytes of
> | the array pointed to by s.
>
> But it does NOT say that reading stops after the first null terminator.

I have built glibc with --disable-multi-arch and this patch on x86-64:


The resulting crashes demonstrate that the test suite verifies that we
do not treat the input as an array (to some degree; there might be
scopes in coverage).

I think we should document this as a GNU extension.  Thoughts?
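
The crashes described above come from inputs whose null terminator is followed by memory the process cannot read. A minimal C sketch of that kind of check, assuming a POSIX system with `mmap` and `mprotect` (the helper `strnlen_at_page_end` is hypothetical and is not glibc's actual test code): it places a string so that its terminator is the last readable byte before an inaccessible page, then calls `strnlen` with a `maxlen` larger than the readable region.

```c
#define _DEFAULT_SOURCE /* For MAP_ANONYMOUS and strnlen in strict -std modes.  */
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical test helper: copy STR so that its null terminator is the
   last readable byte before an inaccessible page, then return
   strnlen (copy, MAXLEN).  Returns (size_t) -1 if setup fails.  */
size_t
strnlen_at_page_end (const char *str, size_t maxlen)
{
  long ps = sysconf (_SC_PAGESIZE);
  if (ps <= 0)
    return (size_t) -1;
  size_t page = (size_t) ps;
  size_t len = strlen (str) + 1;        /* Bytes including the terminator.  */
  if (len > page)
    return (size_t) -1;

  char *base = mmap (NULL, 2 * page, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (base == MAP_FAILED)
    return (size_t) -1;

  /* Make the second page inaccessible: any read from it faults.  */
  if (mprotect (base + page, page, PROT_NONE) != 0)
    {
      munmap (base, 2 * page);
      return (size_t) -1;
    }

  char *copy = base + page - len;
  memcpy (copy, str, len);
  assert (copy[len - 1] == '\0');       /* Terminator ends the readable page.  */

  /* If MAXLEN is larger than LEN, an implementation that examined all
     MAXLEN bytes instead of stopping at the null byte would fault here.  */
  size_t result = strnlen (copy, maxlen);

  munmap (base, 2 * page);
  return result;
}
```

On an implementation that stops at the first null byte, `strnlen_at_page_end ("abc", 1000)` returns 3; one that treated the input as a full `maxlen`-byte array would crash instead.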
  

Comments

Carlos O'Donell Nov. 28, 2019, 3:56 p.m. UTC | #1
On 11/28/19 4:43 AM, Florian Weimer wrote:
> I have built glibc with --disable-multi-arch and this patch on x86-64:
> 
> The resulting crashes demonstrate that the test suite verifies that we
> do not treat the input as an array (to some degree; there might be
> scopes in coverage).
> 
> I think we should document this as a GNU extension.  Thoughts?

We should absolutely document this. It's an implementation-dependent detail
that we choose to interpret the standard in this way.
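
To make the interpretation concrete, here is a byte-at-a-time sketch of the contract being discussed: at most `maxlen` bytes are examined, and no byte after the first null byte is examined. `ref_strnlen` is a hypothetical name, and this is not how glibc implements the function (the generic C code in the patch reads a word at a time); it only illustrates which bytes a caller must make readable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference strnlen: examines at most MAXLEN bytes and stops
   at the first null byte, so S needs to be readable only up to and
   including that null byte (or for MAXLEN bytes if there is none).  */
size_t
ref_strnlen (const char *s, size_t maxlen)
{
  size_t i = 0;
  /* Check the bound first so s[maxlen] is never read.  */
  while (i < maxlen && s[i] != '\0')
    ++i;
  return i;
}
```

With `char buf[3] = "ab";`, `ref_strnlen (buf, 100)` returns 2 even though `buf` is far shorter than 100 bytes.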
  
Carlos O'Donell Nov. 28, 2019, 3:58 p.m. UTC | #2
On 11/28/19 10:56 AM, Carlos O'Donell wrote:
> We should absolutely document this. It's an implementation-dependent detail
> that we choose to interpret the standard in this way.
> 

I also think we should get changes into the linux man page project to call
this out so that nobody thinks about changing this again and so the
implementation is clear.

Have we asked Rich what musl does and what he thinks on the topic?
  
Rich Felker Nov. 28, 2019, 6:22 p.m. UTC | #3
On Thu, Nov 28, 2019 at 10:58:13AM -0500, Carlos O'Donell wrote:
> I also think we should get changes into the linux man page project to call
> this out so that nobody thinks about changing this again and so the
> implementation is clear.
> 
> Have we asked Rich what musl does and what he thinks on the topic?

I missed this whole thread, and haven't had time to look back through
it yet. Is the claim that strnlen, etc. require a pointer to at least
n bytes? I do not think that matches the intent of these interfaces at
all. The language in POSIX is sloppy ("the number of bytes in the
array to which s points"?! I think they were just trying to avoid
saying "string" here because it's not necessarily a string, but they
botched it) but a function like this that requires a large array is
utterly useless. The whole point of strnlen is to be a bounded-time
strlen when lengths >n will be treated as errors (or otherwise
specially) after it returns.

Rich
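
The use case described here, a bounded-time length check that treats anything of `maxlen` bytes or more as an error, might look like the following sketch. `NAME_MAX_LEN` and `name_length_ok` are hypothetical names chosen for illustration.

```c
#define _POSIX_C_SOURCE 200809L /* strnlen */
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical limit: names must be shorter than this many bytes.  */
#define NAME_MAX_LEN 16

/* Return true if NAME is shorter than NAME_MAX_LEN bytes.  strnlen looks
   at no more than NAME_MAX_LEN bytes, so the check runs in bounded time
   even if NAME is very long.  */
bool
name_length_ok (const char *name)
{
  return strnlen (name, NAME_MAX_LEN) < NAME_MAX_LEN;
}
```

A call such as `name_length_ok ("alice")` passes a 6-byte array with a `maxlen` of 16, which is only valid under the reading where `strnlen` stops at the first null byte.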
  
Szabolcs Nagy Nov. 28, 2019, 6:38 p.m. UTC | #4
On 28/11/2019 18:22, Rich Felker wrote:
> I missed this whole thread, and haven't had time to look back through
> it yet. Is the claim that strnlen, etc. require a pointer to at least
> n bytes? I do not think that matches the intent of these interfaces at
> all. The language in POSIX is sloppy ("the number of bytes in the
> array to which s points"?! I think they were just trying to avoid
> saying "string" here because it's not necessarily a string, but they
> botched it) but a function like this that requires a large array is
> utterly useless. The whole point of strnlen is to be a bounded-time
> strlen when lengths >n will be treated as errors (or otherwise
> specially) after it returns.

if there is something wrong with the posix wording then
maybe the c2x proposal of strnlen should be updated too?
(cc Martin)

http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2351.htm
  
Martin Sebor Nov. 29, 2019, 6:20 p.m. UTC | #5
On 11/28/19 11:38 AM, Szabolcs Nagy wrote:
> if there is something wrong with the posix wording then
> maybe the c2x proposal of strnlen should be updated too?
> (cc Martin)
> 
> http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2351.htm

I'm not sure I'd call it wrong but I'm not opposed to updating
the text to make it clear the function (and others like it) must
not examine bytes of the source array past the first NUL.

The strncpy and strncat functions are specified similarly (there's
no explicit requirement that they not read characters past the first
NUL).  All three functions should behave analogously WRT reading
the source array.

Martin
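
The point about `strncpy` and `strncat` can be illustrated the same way. In this sketch (the helper `copy_and_append` is hypothetical), the source strings may be much shorter than the counts passed, so the calls only make sense if both functions stop reading their source at its first null byte.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: store SRC followed by SUFFIX in DST (DSTSIZE bytes),
   truncating as needed.  DSTSIZE must be at least 1.  */
void
copy_and_append (char *dst, size_t dstsize, const char *src,
                 const char *suffix)
{
  assert (dstsize > 0);
  /* Copies SRC up to its null byte, then pads with nulls up to
     DSTSIZE - 1 bytes; SRC itself may be much shorter than that.  */
  strncpy (dst, src, dstsize - 1);
  dst[dstsize - 1] = '\0';              /* strncpy may not terminate.  */
  /* Appends at most the remaining room, then a null byte.  SUFFIX may
     also be shorter than the count.  */
  strncat (dst, suffix, dstsize - 1 - strlen (dst));
}
```

For example, with `char buf[8]`, `copy_and_append (buf, sizeof buf, "ab", "cd")` leaves `"abcd"` in `buf`, although `"ab"` occupies only 3 bytes and `strncpy` is passed a count of 7.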
  

Patch

diff --git a/string/strnlen.c b/string/strnlen.c
index 0b3a12e8b1..d5781dbb6f 100644
--- a/string/strnlen.c
+++ b/string/strnlen.c
@@ -33,6 +33,10 @@ 
 size_t
 __strnlen (const char *str, size_t maxlen)
 {
+  /* Assert that the entire input is readable.  */
+  for (size_t i = 0; i < maxlen; ++i)
+    asm volatile ("" :: "r" (str[i]));
+
   const char *char_ptr, *end_ptr = str + maxlen;
   const unsigned long int *longword_ptr;
   unsigned long int longword, himagic, lomagic;
diff --git a/sysdeps/x86_64/strnlen.S b/sysdeps/x86_64/strnlen.S
deleted file mode 100644
index d3c43ac482..0000000000
--- a/sysdeps/x86_64/strnlen.S
+++ /dev/null
@@ -1,6 +0,0 @@ 
-#define AS_STRNLEN
-#define strlen __strnlen
-#include "strlen.S"
-
-weak_alias (__strnlen, strnlen);
-libc_hidden_builtin_def (strnlen)
diff --git a/wcsmbs/wcsnlen.c b/wcsmbs/wcsnlen.c
index 17e004dcc0..0d3709ac91 100644
--- a/wcsmbs/wcsnlen.c
+++ b/wcsmbs/wcsnlen.c
@@ -26,6 +26,10 @@ 
 size_t
 __wcsnlen (const wchar_t *s, size_t maxlen)
 {
+  /* Assert that the entire input is readable.  */
+  for (size_t i = 0; i < maxlen; ++i)
+    asm volatile ("" :: "r" (s[i]));
+
   const wchar_t *ret = __wmemchr (s, L'\0', maxlen);
   if (ret)
     maxlen = ret - s;
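
The thread title also covers `wcsnlen` and `strndup`. A short sketch of calls that rely on the same behavior, assuming POSIX.1-2008 declarations (`strndup_length` and `bounded_wide_length` are hypothetical helpers): `strndup` copies at most `n` bytes and always null-terminates the copy, and `wcsnlen` applies the `strnlen` rule to wide characters, as the patched `__wcsnlen` above does via `__wmemchr`.

```c
#define _POSIX_C_SOURCE 200809L /* strndup, wcsnlen */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: duplicate at most LIMIT bytes of SRC and return the
   length of the copy, or (size_t) -1 if allocation fails.  */
size_t
strndup_length (const char *src, size_t limit)
{
  char *copy = strndup (src, limit);
  if (copy == NULL)
    return (size_t) -1;
  size_t len = strlen (copy);           /* strndup always null-terminates.  */
  free (copy);
  return len;
}

/* Hypothetical wrapper: the strnlen rule applied to wide characters.  WS
   may be a short array if a null wide character comes first.  */
size_t
bounded_wide_length (const wchar_t *ws, size_t maxlen)
{
  return wcsnlen (ws, maxlen);
}
```

Both `strndup_length ("hi", 100)` and `bounded_wide_length (L"hi", 100)` return 2, with source arrays of only 3 elements.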