[v1] x86: memcmp-avx2-movbe.S and memcmp-evex-movbe.S fix overflow bug.

Message ID 20210607083011.855616-1-goldstein.w.n@gmail.com
State Accepted, archived
Headers
Series [v1] x86: memcmp-avx2-movbe.S and memcmp-evex-movbe.S fix overflow bug. |

Checks

Context Check Description
dj/TryBot-apply_patch success Patch applied to master at the time it was sent

Commit Message

Noah Goldstein June 7, 2021, 8:30 a.m. UTC
  Fix bugs introducted in commits:

author	Noah Goldstein <goldstein.w.n@gmail.com>
Mon, 17 May 2021 17:57:24 +0000 (13:57 -0400)
commit	4ad473e97acdc5f6d811755b67c09f2128a644ce

And

author	Noah Goldstein <goldstein.w.n@gmail.com>
Mon, 17 May 2021 17:56:52 +0000 (13:56 -0400)
commit	16d12015c57701b08d7bbed6ec536641bcafb428

Which added a bug which would cause pointer + length overflow to lead
to an early return as opposed to a Segmentation Fault.

Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
---
 sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S | 13 ++++++++-----
 sysdeps/x86_64/multiarch/memcmp-evex-movbe.S | 15 ++++++++-------
 2 files changed, 16 insertions(+), 12 deletions(-)
  

Comments

H.J. Lu June 7, 2021, 1:45 p.m. UTC | #1
On Mon, Jun 7, 2021 at 1:30 AM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
>
> Fix bugs introducted in commits:
>
> author  Noah Goldstein <goldstein.w.n@gmail.com>
> Mon, 17 May 2021 17:57:24 +0000 (13:57 -0400)
> commit  4ad473e97acdc5f6d811755b67c09f2128a644ce
>
> And
>
> author  Noah Goldstein <goldstein.w.n@gmail.com>
> Mon, 17 May 2021 17:56:52 +0000 (13:56 -0400)
> commit  16d12015c57701b08d7bbed6ec536641bcafb428
>
> Which added a bug which would cause pointer + length overflow to lead
> to an early return as opposed to a Segmentation Fault.
>
> Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
> ---
>  sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S | 13 ++++++++-----
>  sysdeps/x86_64/multiarch/memcmp-evex-movbe.S | 15 ++++++++-------
>  2 files changed, 16 insertions(+), 12 deletions(-)
>

Please open glibc bugs for both memcmp and memset overflow issue.

Thanks.
  
Siddhesh Poyarekar June 7, 2021, 2:21 p.m. UTC | #2
On 6/7/21 2:00 PM, Noah Goldstein via Libc-alpha wrote:
> Fix bugs introducted in commits:
> 
> author	Noah Goldstein <goldstein.w.n@gmail.com>
> Mon, 17 May 2021 17:57:24 +0000 (13:57 -0400)
> commit	4ad473e97acdc5f6d811755b67c09f2128a644ce
> 
> And
> 
> author	Noah Goldstein <goldstein.w.n@gmail.com>
> Mon, 17 May 2021 17:56:52 +0000 (13:56 -0400)
> commit	16d12015c57701b08d7bbed6ec536641bcafb428
> 
> Which added a bug which would cause pointer + length overflow to lead
> to an early return as opposed to a Segmentation Fault.

If we end up making this change, IMO it should come with an explicit 
note that this behaviour is not guaranteed for invalid inputs in other 
implementations of memcmp or for that matter, in future versions of this 
memcmp.

An input that causes pointer + length overflow is undefined behaviour 
and IMO we shouldn't try to define it for glibc.  This change may not 
have a noticeable performance impact but future requests to guarantee 
this behaviour may not necessarily be this straightforward and IMO we 
should not bind ourselves to it.

Siddhesh
  
Noah Goldstein June 7, 2021, 5:24 p.m. UTC | #3
On Mon, Jun 7, 2021 at 9:45 AM H.J. Lu <hjl.tools@gmail.com> wrote:

> On Mon, Jun 7, 2021 at 1:30 AM Noah Goldstein <goldstein.w.n@gmail.com>
> wrote:
> >
> > Fix bugs introducted in commits:
> >
> > author  Noah Goldstein <goldstein.w.n@gmail.com>
> > Mon, 17 May 2021 17:57:24 +0000 (13:57 -0400)
> > commit  4ad473e97acdc5f6d811755b67c09f2128a644ce
> >
> > And
> >
> > author  Noah Goldstein <goldstein.w.n@gmail.com>
> > Mon, 17 May 2021 17:56:52 +0000 (13:56 -0400)
> > commit  16d12015c57701b08d7bbed6ec536641bcafb428
> >
> > Which added a bug which would cause pointer + length overflow to lead
> > to an early return as opposed to a Segmentation Fault.
> >
> > Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
> > ---
> >  sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S | 13 ++++++++-----
> >  sysdeps/x86_64/multiarch/memcmp-evex-movbe.S | 15 ++++++++-------
> >  2 files changed, 16 insertions(+), 12 deletions(-)
> >
>
> Please open glibc bugs for both memcmp and memset overflow issue.
>

Done:

Memcmp: https://sourceware.org/bugzilla/show_bug.cgi?id=27961
Memset: https://sourceware.org/bugzilla/show_bug.cgi?id=27960

>
> Thanks.
>
> --
> H.J.
>
  
Noah Goldstein June 7, 2021, 5:28 p.m. UTC | #4
On Mon, Jun 7, 2021 at 10:21 AM Siddhesh Poyarekar <siddhesh@gotplt.org>
wrote:

> On 6/7/21 2:00 PM, Noah Goldstein via Libc-alpha wrote:
> > Fix bugs introducted in commits:
> >
> > author        Noah Goldstein <goldstein.w.n@gmail.com>
> > Mon, 17 May 2021 17:57:24 +0000 (13:57 -0400)
> > commit        4ad473e97acdc5f6d811755b67c09f2128a644ce
> >
> > And
> >
> > author        Noah Goldstein <goldstein.w.n@gmail.com>
> > Mon, 17 May 2021 17:56:52 +0000 (13:56 -0400)
> > commit        16d12015c57701b08d7bbed6ec536641bcafb428
> >
> > Which added a bug which would cause pointer + length overflow to lead
> > to an early return as opposed to a Segmentation Fault.
>
> If we end up making this change, IMO it should come with an explicit
> note that this behaviour is not guaranteed for invalid inputs in other
> implementations of memcmp or for that matter, in future versions of this
> memcmp.
>
> An input that causes pointer + length overflow is undefined behaviour
> and IMO we shouldn't try to define it for glibc.


Is it actually UB? The caller is not causing overflow. The implementation
method is. It is possible to implement without overflow.


> This change may not
> have a noticeable performance impact


Minor impact actually for memcmp.


> but future requests to guarantee
> this behaviour may not necessarily be this straightforward and IMO we
> should not bind ourselves to it.


Agreed.


>
> Siddhesh
>
  
Paul Eggert June 7, 2021, 5:45 p.m. UTC | #5
On 6/7/21 10:28 AM, Noah Goldstein via Libc-alpha wrote:
> 
> Is it actually UB? The caller is not causing overflow. The implementation
> method is. It is possible to implement without overflow.

The C Standard says that unless otherwise specified, when you pass an 
array (pointer + size) to a standard function, all the addresses in the 
array must be valid. It's valid if (say) memcmp is multithreaded and 
compares the first halves of the two arrays in parallel with comparing 
the second halves.

If I understand things correctly this patch isn't fixing a conformance 
bug; it's merely a QoI issue, where by "quality" one means "I want this 
particular undefined behavior to cause a core dump".
  
Noah Goldstein June 7, 2021, 6:07 p.m. UTC | #6
On Mon, Jun 7, 2021 at 1:45 PM Paul Eggert <eggert@cs.ucla.edu> wrote:

> On 6/7/21 10:28 AM, Noah Goldstein via Libc-alpha wrote:
> >
> > Is it actually UB? The caller is not causing overflow. The implementation
> > method is. It is possible to implement without overflow.
>
> The C Standard says that unless otherwise specified, when you pass an
> array (pointer + size) to a standard function, all the addresses in the
> array must be valid. It's valid if (say) memcmp is multithreaded and
> compares the first halves of the two arrays in parallel with comparing
> the second halves.


> If I understand things correctly this patch isn't fixing a conformance
> bug; it's merely a QoI issue, where by "quality" one means "I want this
> particular undefined behavior to cause a core dump".
>

I see. Good to hear there wasn't technically a bug :)

 What do we want the behavior to be? Generally I would think
failing silently can be quite painful for users.
  
Paul Eggert June 7, 2021, 7:51 p.m. UTC | #7
On 6/7/21 11:07 AM, Noah Goldstein wrote:
>   What do we want the behavior to be? Generally I would think
> failing silently can be quite painful for users.

Traditionally we have typically said to go for the best typical-case 
performance, and not to worry about how that affects undefined behavior.

So, for example, glibc's memcmp ((void *) 1, NULL, 0) does not dump core 
even though its behavior is undefined, because it would slow glibc down 
to force memcmp to dump core for that case.

A debugging library like valgrind's might choose to crash more reliably.
  
Noah Goldstein June 7, 2021, 8:12 p.m. UTC | #8
On Mon, Jun 7, 2021 at 3:51 PM Paul Eggert <eggert@cs.ucla.edu> wrote:

> On 6/7/21 11:07 AM, Noah Goldstein wrote:
> >   What do we want the behavior to be? Generally I would think
> > failing silently can be quite painful for users.
>
> Traditionally we have typically said to go for the best typical-case
> performance, and not to worry about how that affects undefined behavior.
>

That works for me. The old versions are slightly faster / use less code.
Panicked a bit last night when I thought I might have pushed a bug :/

>
> So, for example, glibc's memcmp ((void *) 1, NULL, 0) does not dump core
> even though its behavior is undefined, because it would slow glibc down
> to force memcmp to dump core for that case.
>
> A debugging library like valgrind's might choose to crash more reliably.
>
  
Noah Goldstein June 9, 2021, 5:15 a.m. UTC | #9
On Mon, Jun 7, 2021 at 1:45 PM Paul Eggert <eggert@cs.ucla.edu> wrote:

> On 6/7/21 10:28 AM, Noah Goldstein via Libc-alpha wrote:
> >
> > Is it actually UB? The caller is not causing overflow. The implementation
> > method is. It is possible to implement without overflow.
>
> The C Standard says that unless otherwise specified, when you pass an
> array (pointer + size) to a standard function, all the addresses in the
> array must be valid. It's valid if (say) memcmp is multithreaded and
> compares the first halves of the two arrays in parallel with comparing
> the second halves.
>

Does this mean there is an issue with functions like wcsnlen?

They do say that they only need to be able to access memory up
to first null terminator but currently the x86_64 wcsnlen-avx2.S
implementation will multiple length by 4 which could cause overflow.

For example with a string length 1000 and maxlen passed as
2^62 + 1 the return will be 1.

Is that an issue? It's a pretty simple fix although I think this idiom
of multiply length by 4 to get byte count is in a lot of similarly specified
files.


>
> If I understand things correctly this patch isn't fixing a conformance
> bug; it's merely a QoI issue, where by "quality" one means "I want this
> particular undefined behavior to cause a core dump".
>
  
Siddhesh Poyarekar June 9, 2021, 5:25 a.m. UTC | #10
On 6/9/21 10:45 AM, Noah Goldstein wrote:
> 
> 
> On Mon, Jun 7, 2021 at 1:45 PM Paul Eggert <eggert@cs.ucla.edu 
> <mailto:eggert@cs.ucla.edu>> wrote:
> 
>     On 6/7/21 10:28 AM, Noah Goldstein via Libc-alpha wrote:
>      >
>      > Is it actually UB? The caller is not causing overflow. The
>     implementation
>      > method is. It is possible to implement without overflow.
> 
>     The C Standard says that unless otherwise specified, when you pass an
>     array (pointer + size) to a standard function, all the addresses in the
>     array must be valid. It's valid if (say) memcmp is multithreaded and
>     compares the first halves of the two arrays in parallel with comparing
>     the second halves.
> 
> 
> Does this mean there is an issue with functions like wcsnlen?
> 
> They do say that they only need to be able to access memory up
> to first null terminator but currently the x86_64 wcsnlen-avx2.S
> implementation will multiple length by 4 which could cause overflow.
> 
> For example with a string length 1000 and maxlen passed as
> 2^62 + 1 the return will be 1.

The maxlen causes undefined behaviour; it is the responsibility of the 
caller to ensure that it doesn't and the way to do that for wcsnlen is 
to ensure that s+maxlen*sizeof(wchar_t) is within bounds of the object 
and hence does not cause overflow.
Siddhesh
  
Noah Goldstein June 9, 2021, 5:43 a.m. UTC | #11
On Wed, Jun 9, 2021 at 1:25 AM Siddhesh Poyarekar <siddhesh@gotplt.org>
wrote:

> On 6/9/21 10:45 AM, Noah Goldstein wrote:
> >
> >
> > On Mon, Jun 7, 2021 at 1:45 PM Paul Eggert <eggert@cs.ucla.edu
> > <mailto:eggert@cs.ucla.edu>> wrote:
> >
> >     On 6/7/21 10:28 AM, Noah Goldstein via Libc-alpha wrote:
> >      >
> >      > Is it actually UB? The caller is not causing overflow. The
> >     implementation
> >      > method is. It is possible to implement without overflow.
> >
> >     The C Standard says that unless otherwise specified, when you pass an
> >     array (pointer + size) to a standard function, all the addresses in
> the
> >     array must be valid. It's valid if (say) memcmp is multithreaded and
> >     compares the first halves of the two arrays in parallel with
> comparing
> >     the second halves.
> >
> >
> > Does this mean there is an issue with functions like wcsnlen?
> >
> > They do say that they only need to be able to access memory up
> > to first null terminator but currently the x86_64 wcsnlen-avx2.S
> > implementation will multiple length by 4 which could cause overflow.
> >
> > For example with a string length 1000 and maxlen passed as
> > 2^62 + 1 the return will be 1.
>
> The maxlen causes undefined behaviour; it is the responsibility of the
> caller to ensure that it doesn't and the way to do that for wcsnlen is
> to ensure that s+maxlen*sizeof(wchar_t) is within bounds of the object
> and hence does not cause overflow.
> Siddhesh
>

Got it and thanks! Sorry for all the questions.

Was a bit thrown off because we have overflow tests for strncat / strnlen
so it appears we are testing UB.
  
Siddhesh Poyarekar June 9, 2021, 6:01 a.m. UTC | #12
On 6/9/21 11:13 AM, Noah Goldstein wrote:
> Got it and thanks! Sorry for all the questions.
> 
> Was a bit thrown off because we have overflow tests for strncat / strnlen
> so it appears we are testing UB.

Hmm, I couldn't spot any overflow tests from a quick skim; there are 
tests that set maxlen as SIZE_MAX (but should terminate before reaching 
SIZE_MAX and hence not cause an overflow) and page boundary tests in 
strnlen, but none of them should result in undefined behaviour.  Could 
you point out the ones that invoke undefined behaviour?

Thanks,
Siddhesh
  
Noah Goldstein June 9, 2021, 6:32 a.m. UTC | #13
On Wed, Jun 9, 2021 at 2:02 AM Siddhesh Poyarekar <siddhesh@gotplt.org>
wrote:

> On 6/9/21 11:13 AM, Noah Goldstein wrote:
> > Got it and thanks! Sorry for all the questions.
> >
> > Was a bit thrown off because we have overflow tests for strncat / strnlen
> > so it appears we are testing UB.
>
> Hmm, I couldn't spot any overflow tests from a quick skim; there are
> tests that set maxlen as SIZE_MAX (but should terminate before reaching
> SIZE_MAX and hence not cause an overflow) and page boundary tests in
> strnlen, but none of them should result in undefined behaviour.  Could
> you point out the ones that invoke undefined behaviour?
>

Wait are you saying that the return value overflowing causes UB or
that the act of passing a maxlen where s + maxlen * sizeof(wchar_t) is
outside range object is UB?

If the former then why is the follow okay:

Previous example with a string whose length is 1000
but because wcslen is passed maxlen where maxlen * sizeof(wchar_t)
overflows and leads to a result less than 1000 the implementation of
wcslen in wcsnlen-avx2.S will return a length less than 1000.


If the latter then:

For test-wcsnlen which redirects to test-strnlen
If the UB is when is s+maxlen*sizeof(wchar_t) is outside object bound
then s + SIZE_MAX + sizeof(wchar_t) surely is.

Although even then test-strnlen s + SIZE_MAX will also overflow if s non
null.



>
> Thanks,
> Siddhesh
>
  
Siddhesh Poyarekar June 9, 2021, 6:47 a.m. UTC | #14
On 6/9/21 12:02 PM, Noah Goldstein wrote:
> Wait are you saying that the return value overflowing causes UB or
> that the act of passing a maxlen where s + maxlen * sizeof(wchar_t) is
> outside range object is UB?

Not specifically the return value, but more broadly, causing wcsnlen to 
invoke undefined behaviour, which could include the former.

> If the former then why is the follow okay:
> 
> Previous example with a string whose length is 1000
> but because wcslen is passed maxlen where maxlen * sizeof(wchar_t)
> overflows and leads to a result less than 1000 the implementation of
> wcslen in wcsnlen-avx2.S will return a length less than 1000.
> 
> 
> If the latter then:
> 
> For test-wcsnlen which redirects to test-strnlen
> If the UB is when is s+maxlen*sizeof(wchar_t) is outside object bound
> then s + SIZE_MAX + sizeof(wchar_t) surely is.
> 
> Although even then test-strnlen s + SIZE_MAX will also overflow if s non 
> null.

That is a good catch; I did not notice that.  That should be SIZE_MAX / 
sizeof (CHAR).

Thanks,
Siddhesh
  
Noah Goldstein June 9, 2021, 6:54 a.m. UTC | #15
On Wed, Jun 9, 2021 at 2:48 AM Siddhesh Poyarekar <siddhesh@gotplt.org>
wrote:

> On 6/9/21 12:02 PM, Noah Goldstein wrote:
> > Wait are you saying that the return value overflowing causes UB or
> > that the act of passing a maxlen where s + maxlen * sizeof(wchar_t) is
> > outside range object is UB?
>
> Not specifically the return value, but more broadly, causing wcsnlen to
> invoke undefined behaviour, which could include the former.
>
> > If the former then why is the follow okay:
> >
> > Previous example with a string whose length is 1000
> > but because wcslen is passed maxlen where maxlen * sizeof(wchar_t)
> > overflows and leads to a result less than 1000 the implementation of
> > wcslen in wcsnlen-avx2.S will return a length less than 1000.
> >
> >
> > If the latter then:
> >
> > For test-wcsnlen which redirects to test-strnlen
> > If the UB is when is s+maxlen*sizeof(wchar_t) is outside object bound
> > then s + SIZE_MAX + sizeof(wchar_t) surely is.
> >
> > Although even then test-strnlen s + SIZE_MAX will also overflow if s non
> > null.
>
> That is a good catch; I did not notice that.  That should be SIZE_MAX /
> sizeof (CHAR).
>

Do we want to support s + maxlen + sizeof(CHAR) overflowing? If not we can
speed up the AVX2/EVEX implementation of strnlen/wcsnlen/memchr/wmemchr.

(Assuming memchr is of same standard where s + n + sizeof(CHAR)
can't overflow. If not it has this bug).


> Thanks,
> Siddhesh
>
  
Siddhesh Poyarekar June 9, 2021, 7:01 a.m. UTC | #16
On 6/9/21 12:24 PM, Noah Goldstein wrote:
> Do we want to support s + maxlen + sizeof(CHAR) overflowing? If not we can
> speed up the AVX2/EVEX implementation of strnlen/wcsnlen/memchr/wmemchr.

We don't want to specify any behaviour that is undefined by the standard 
so it's perfectly OK for the algorithm to assume that s + (maxlen - 1) * 
sizeof(CHAR) (off by one in my previous comment, sorry :)) points to a 
valid address if that makes the function go faster.

Siddhesh
  
Siddhesh Poyarekar June 9, 2021, 7:07 a.m. UTC | #17
On 6/9/21 12:31 PM, Siddhesh Poyarekar wrote:
> On 6/9/21 12:24 PM, Noah Goldstein wrote:
>> Do we want to support s + maxlen + sizeof(CHAR) overflowing? If not we 
>> can
>> speed up the AVX2/EVEX implementation of strnlen/wcsnlen/memchr/wmemchr.
> 
> We don't want to specify any behaviour that is undefined by the standard 
> so it's perfectly OK for the algorithm to assume that s + (maxlen - 1) * 
> sizeof(CHAR) (off by one in my previous comment, sorry :)) points to a 
> valid address if that makes the function go faster.

s/valid address/valid reference in the object referred to by s/

In fact, like implementations for strlen, it is OK to assume that reads 
beyond the object bounds are also OK as long as they're within the last 
page that s is in.  Likewise for reads before object bounds as long as 
they're within the first page that s is in.

Siddhesh
  
Andreas Schwab June 9, 2021, 7:39 a.m. UTC | #18
On Jun 09 2021, Siddhesh Poyarekar wrote:

> On 6/9/21 12:02 PM, Noah Goldstein wrote:
>> Wait are you saying that the return value overflowing causes UB or
>> that the act of passing a maxlen where s + maxlen * sizeof(wchar_t) is
>> outside range object is UB?
>
> Not specifically the return value, but more broadly, causing wcsnlen to
> invoke undefined behaviour, which could include the former.

I don't think wcsnlen can assume that such an overflow doesn't happen.

Andreas.
  
Siddhesh Poyarekar June 9, 2021, 7:58 a.m. UTC | #19
On 6/9/21 1:09 PM, Andreas Schwab wrote:
> On Jun 09 2021, Siddhesh Poyarekar wrote:
> 
>> On 6/9/21 12:02 PM, Noah Goldstein wrote:
>>> Wait are you saying that the return value overflowing causes UB or
>>> that the act of passing a maxlen where s + maxlen * sizeof(wchar_t) is
>>> outside range object is UB?
>>
>> Not specifically the return value, but more broadly, causing wcsnlen to
>> invoke undefined behaviour, which could include the former.
> 
> I don't think wcsnlen can assume that such an overflow doesn't happen.

Hmm, I just noticed that wcsnlen is not in the ISO C draft (at least the 
one I have from 2011) and is defined in POSIX.  The description doesn't 
seem to specify any access semantics for wcsnlen + maxlen.  Are you 
referring to the fact that it's unspecified or are you aware of anywhere 
else in the spec that requires the implementation to ensure valid access?

Siddhesh
  
Andreas Schwab June 9, 2021, 9:14 a.m. UTC | #20
On Jun 09 2021, Siddhesh Poyarekar wrote:

> Hmm, I just noticed that wcsnlen is not in the ISO C draft (at least the
> one I have from 2011) and is defined in POSIX.  The description doesn't 
> seem to specify any access semantics for wcsnlen + maxlen.  Are you
> referring to the fact that it's unspecified or are you aware of anywhere 
> else in the spec that requires the implementation to ensure valid access?

Does the sentence "The wcsnlen() function shall never examine more than
the first maxlen characters of the wide-character array pointed to by
ws." constitute a limit on maxlen, or that ws+maxlen must be valid?

Andreas.
  
Florian Weimer June 9, 2021, 9:20 a.m. UTC | #21
* Andreas Schwab:

> On Jun 09 2021, Siddhesh Poyarekar wrote:
>
>> Hmm, I just noticed that wcsnlen is not in the ISO C draft (at least the
>> one I have from 2011) and is defined in POSIX.  The description doesn't 
>> seem to specify any access semantics for wcsnlen + maxlen.  Are you
>> referring to the fact that it's unspecified or are you aware of anywhere 
>> else in the spec that requires the implementation to ensure valid access?
>
> Does the sentence "The wcsnlen() function shall never examine more than
> the first maxlen characters of the wide-character array pointed to by
> ws." constitute a limit on maxlen, or that ws+maxlen must be valid?

Is this bug related?

  wcsrtombs calls wcsnlen on input data which is not an array
  <https://sourceware.org/bugzilla/show_bug.cgi?id=23711>

Thanks,
Florian
  
Siddhesh Poyarekar June 9, 2021, 9:26 a.m. UTC | #22
On 6/9/21 2:44 PM, Andreas Schwab wrote:
> On Jun 09 2021, Siddhesh Poyarekar wrote:
> 
>> Hmm, I just noticed that wcsnlen is not in the ISO C draft (at least the
>> one I have from 2011) and is defined in POSIX.  The description doesn't
>> seem to specify any access semantics for wcsnlen + maxlen.  Are you
>> referring to the fact that it's unspecified or are you aware of anywhere
>> else in the spec that requires the implementation to ensure valid access?
> 
> Does the sentence "The wcsnlen() function shall never examine more than
> the first maxlen characters of the wide-character array pointed to by
> ws." constitute a limit on maxlen, or that ws+maxlen must be valid?

It does not specify any of that; that's what I said.  My question was: 
do you think we should interpret that as a constraint the implementation 
has to take care of or are you aware of another text in the standard 
that makes it clearer?

Siddhesh
  
Siddhesh Poyarekar June 9, 2021, 9:34 a.m. UTC | #23
On 6/9/21 2:50 PM, Florian Weimer wrote:
> Is this bug related?
> 
>    wcsrtombs calls wcsnlen on input data which is not an array
>    <https://sourceware.org/bugzilla/show_bug.cgi?id=23711>

Your interpretation that the input has to be an array of size maxlen 
appears to put constraints on maxlen.

Siddhesh
  
Andreas Schwab June 9, 2021, 9:35 a.m. UTC | #24
On Jun 09 2021, Siddhesh Poyarekar wrote:

> On 6/9/21 2:44 PM, Andreas Schwab wrote:
>> On Jun 09 2021, Siddhesh Poyarekar wrote:
>> 
>>> Hmm, I just noticed that wcsnlen is not in the ISO C draft (at least the
>>> one I have from 2011) and is defined in POSIX.  The description doesn't
>>> seem to specify any access semantics for wcsnlen + maxlen.  Are you
>>> referring to the fact that it's unspecified or are you aware of anywhere
>>> else in the spec that requires the implementation to ensure valid access?
>> Does the sentence "The wcsnlen() function shall never examine more than
>> the first maxlen characters of the wide-character array pointed to by
>> ws." constitute a limit on maxlen, or that ws+maxlen must be valid?
>
> It does not specify any of that; that's what I said.

Then you cannot assume that maxlen*sizeof(wchar_t) does not wrap around,
and using that as a limit would be a bug.

Andreas.
  
Florian Weimer June 9, 2021, 9:37 a.m. UTC | #25
* Siddhesh Poyarekar:

> On 6/9/21 2:50 PM, Florian Weimer wrote:
>> Is this bug related?
>>    wcsrtombs calls wcsnlen on input data which is not an array
>>    <https://sourceware.org/bugzilla/show_bug.cgi?id=23711>
>
> Your interpretation that the input has to be an array of size maxlen
> appears to put constraints on maxlen.

But I think I later realized that GNU supports non-array use as an
extension, and that would mean that there is no such constraint.

Thanks,
Florian
  
Siddhesh Poyarekar June 9, 2021, 9:41 a.m. UTC | #26
On 6/9/21 3:07 PM, Florian Weimer wrote:
> * Siddhesh Poyarekar:
> 
>> On 6/9/21 2:50 PM, Florian Weimer wrote:
>>> Is this bug related?
>>>     wcsrtombs calls wcsnlen on input data which is not an array
>>>     <https://sourceware.org/bugzilla/show_bug.cgi?id=23711>
>>
>> Your interpretation that the input has to be an array of size maxlen
>> appears to put constraints on maxlen.
> 
> But I think I later realized that GNU supports non-array use as an
> extension, and that would mean that there is no such constraint.

OK got it.

Thanks,
Siddhesh
  
Siddhesh Poyarekar June 9, 2021, 9:43 a.m. UTC | #27
On 6/9/21 3:05 PM, Andreas Schwab wrote:
> Then you cannot assume that maxlen*sizeof(wchar_t) does not wrap around,
> and using that as a limit would be a bug.

Thanks for clarifying.

Siddhesh
  
Noah Goldstein June 9, 2021, 8:33 p.m. UTC | #28
So do we need a patch to fix this?

I see the issue of assuming maxlen * sizeof(wchar_t) does not wrap
in the following files:

wmemchr-sse4_1
wmemchr-avx2

wcsnlen-sse2
wcsnlen-avx2

I can get one ready pretty easily if we do need one.

On Wed, Jun 9, 2021 at 5:35 AM Andreas Schwab <schwab@linux-m68k.org> wrote:

> On Jun 09 2021, Siddhesh Poyarekar wrote:
>
> > On 6/9/21 2:44 PM, Andreas Schwab wrote:
> >> On Jun 09 2021, Siddhesh Poyarekar wrote:
> >>
> >>> Hmm, I just noticed that wcsnlen is not in the ISO C draft (at least
> the
> >>> one I have from 2011) and is defined in POSIX.  The description doesn't
> >>> seem to specify any access semantics for wcsnlen + maxlen.  Are you
> >>> referring to the fact that it's unspecified or are you aware of
> anywhere
> >>> else in the spec that requires the implementation to ensure valid
> access?
> >> Does the sentence "The wcsnlen() function shall never examine more than
> >> the first maxlen characters of the wide-character array pointed to by
> >> ws." constitute a limit on maxlen, or that ws+maxlen must be valid?
> >
> > It does not specify any of that; that's what I said.
>
> Then you cannot assume that maxlen*sizeof(wchar_t) does not wrap around,
> and using that as a limit would be a bug.
>
> Andreas.
>
> --
> Andreas Schwab, schwab@linux-m68k.org
> GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
> "And now for something completely different."
>
  
H.J. Lu June 9, 2021, 8:37 p.m. UTC | #29
On Wed, Jun 9, 2021 at 1:34 PM Noah Goldstein via Libc-alpha
<libc-alpha@sourceware.org> wrote:
>
> So do we need a patch to fix this?
>
> I see the issue of assuming maxlen * sizeof(wchar_t) does not wrap
> in the following files:
>
> wmemchr-sse4_1
> wmemchr-avx2
>
> wcsnlen-sse2
> wcsnlen-avx2
>
> I can get one ready pretty easily if we do need one.

Can we create tests without undefined behavior to show the current
implementation is wrong? If yes, please add such testcases and fix
the implementations.

Thanks.

>
> On Wed, Jun 9, 2021 at 5:35 AM Andreas Schwab <schwab@linux-m68k.org> wrote:
>
> > On Jun 09 2021, Siddhesh Poyarekar wrote:
> >
> > > On 6/9/21 2:44 PM, Andreas Schwab wrote:
> > >> On Jun 09 2021, Siddhesh Poyarekar wrote:
> > >>
> > >>> Hmm, I just noticed that wcsnlen is not in the ISO C draft (at least
> > the
> > >>> one I have from 2011) and is defined in POSIX.  The description doesn't
> > >>> seem to specify any access semantics for wcsnlen + maxlen.  Are you
> > >>> referring to the fact that it's unspecified or are you aware of
> > anywhere
> > >>> else in the spec that requires the implementation to ensure valid
> > access?
> > >> Does the sentence "The wcsnlen() function shall never examine more than
> > >> the first maxlen characters of the wide-character array pointed to by
> > >> ws." constitute a limit on maxlen, or that ws+maxlen must be valid?
> > >
> > > It does not specify any of that; that's what I said.
> >
> > Then you cannot assume that maxlen*sizeof(wchar_t) does not wrap around,
> > and using that as a limit would be a bug.
> >
> > Andreas.
> >
> > --
> > Andreas Schwab, schwab@linux-m68k.org
> > GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
> > "And now for something completely different."
> >
  
Siddhesh Poyarekar June 10, 2021, 3:35 a.m. UTC | #30
On 6/10/21 2:03 AM, Noah Goldstein wrote:
> So do we need a patch to fix this?
> 

We don't.  As Andreas clarified, wcsnlen behaviour avoids overflow.

Siddhesh
  
Siddhesh Poyarekar June 10, 2021, 3:46 a.m. UTC | #31
On 6/10/21 2:03 AM, Noah Goldstein wrote:
> So do we need a patch to fix this?
> 
> I see the issue of assuming maxlen * sizeof(wchar_t) does not wrap
> in the following files:
> 
> wmemchr-sse4_1
> wmemchr-avx2
> 
> wcsnlen-sse2
> wcsnlen-avx2
> 
> I can get one ready pretty easily if we do need one.

Sorry I misunderstood your question.  H.J. has the right advice for you.

Thanks,
Siddhesh
  

Patch

diff --git a/sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S b/sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S
index 2621ec907a..4a9414ff61 100644
--- a/sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S
+++ b/sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S
@@ -60,6 +60,7 @@ 
 # endif
 
 # define VEC_SIZE 32
+# define LOG_VEC_SIZE	7
 # define PAGE_SIZE	4096
 
 /* Warning!
@@ -200,8 +201,7 @@  L(return_vec_2):
 # endif
 	VZEROUPPER_RETURN
 
-	/* NB: p2align 5 here to ensure 4x loop is 32 byte aligned.  */
-	.p2align 5
+	.p2align 4
 L(8x_return_vec_0_1_2_3):
 	/* Returning from L(more_8x_vec) requires restoring rsi.  */
 	addq	%rdi, %rsi
@@ -232,7 +232,7 @@  L(return_vec_3):
 # endif
 	VZEROUPPER_RETURN
 
-	.p2align 4
+	.p2align 5
 L(more_8x_vec):
 	/* Set end of s1 in rdx.  */
 	leaq	-(VEC_SIZE * 4)(%rdi, %rdx), %rdx
@@ -241,8 +241,11 @@  L(more_8x_vec):
 	subq	%rdi, %rsi
 	/* Align s1 pointer.  */
 	andq	$-VEC_SIZE, %rdi
+	leaq	-1(%rdx), %rax
+	subq	%rdi, %rax
 	/* Adjust because first 4x vec where check already.  */
 	subq	$-(VEC_SIZE * 4), %rdi
+	sarq	$LOG_VEC_SIZE, %rax
 	.p2align 4
 L(loop_4x_vec):
 	/* rsi has s2 - s1 so get correct address by adding s1 (in rdi).
@@ -267,8 +270,8 @@  L(loop_4x_vec):
 	jnz	L(8x_return_vec_0_1_2_3)
 	subq	$-(VEC_SIZE * 4), %rdi
 	/* Check if s1 pointer at end.  */
-	cmpq	%rdx, %rdi
-	jb	L(loop_4x_vec)
+	decq	%rax
+	jne	L(loop_4x_vec)
 
 	subq	%rdx, %rdi
 	/* rdi has 4 * VEC_SIZE - remaining length.  */
diff --git a/sysdeps/x86_64/multiarch/memcmp-evex-movbe.S b/sysdeps/x86_64/multiarch/memcmp-evex-movbe.S
index 654dc7ac8c..60be3f43e7 100644
--- a/sysdeps/x86_64/multiarch/memcmp-evex-movbe.S
+++ b/sysdeps/x86_64/multiarch/memcmp-evex-movbe.S
@@ -53,6 +53,7 @@ 
 # endif
 
 # define VEC_SIZE	32
+# define LOG_VEC_SIZE	7
 # define PAGE_SIZE	4096
 # define CHAR_PER_VEC	(VEC_SIZE / CHAR_SIZE)
 
@@ -163,10 +164,7 @@  ENTRY (MEMCMP)
 	/* NB: eax must be zero to reach here.  */
 	ret
 
-	/* NB: aligning 32 here allows for the rest of the jump targets
-	   to be tuned for 32 byte alignment. Most important this ensures
-	   the L(more_8x_vec) loop is 32 byte aligned.  */
-	.p2align 5
+	.p2align 4
 L(less_vec):
 	/* Check if one or less CHAR. This is necessary for size = 0 but
 	   is also faster for size = CHAR_SIZE.  */
@@ -277,7 +275,7 @@  L(return_vec_3):
 # endif
 	ret
 
-	.p2align 4
+	.p2align 5
 L(more_8x_vec):
 	/* Set end of s1 in rdx.  */
 	leaq	-(VEC_SIZE * 4)(%rdi, %rdx, CHAR_SIZE), %rdx
@@ -286,8 +284,11 @@  L(more_8x_vec):
 	subq	%rdi, %rsi
 	/* Align s1 pointer.  */
 	andq	$-VEC_SIZE, %rdi
+	leaq	-1(%rdx), %rax
+	subq	%rdi, %rax
 	/* Adjust because first 4x vec where check already.  */
 	subq	$-(VEC_SIZE * 4), %rdi
+	sarq	$LOG_VEC_SIZE, %rax
 	.p2align 4
 L(loop_4x_vec):
 	VMOVU	(%rsi, %rdi), %YMM1
@@ -307,8 +308,8 @@  L(loop_4x_vec):
 	testl	%ecx, %ecx
 	jnz	L(8x_return_vec_0_1_2_3)
 	subq	$-(VEC_SIZE * 4), %rdi
-	cmpq	%rdx, %rdi
-	jb	L(loop_4x_vec)
+	decq	%rax
+	jnz	L(loop_4x_vec)
 
 	subq	%rdx, %rdi
 	/* rdi has 4 * VEC_SIZE - remaining length.  */