Add cast to unsigned char to strverscmp

Message ID CAD22NBkDvgVjq2D8B4pKJQzwtp5f+1WV1nv6WyG_zajkwP7-cQ@mail.gmail.com
State New
Headers
Series Add cast to unsigned char to strverscmp |

Commit Message

Jeremy Bettis Nov. 7, 2024, 4:53 p.m. UTC
  GCC 14.2 with -Werror=sign-compare fails on this code.

Signed-off-by: Jeremy Bettis <jbettis@google.com>
---
 newlib/libc/string/strverscmp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
  

Comments

Brian Inglis Nov. 7, 2024, 7:05 p.m. UTC | #1
On 2024-11-07 09:53, Jeremy Bettis wrote:
> GCC 14.2 with -Werror=sign-compare fails on this code.
> 
> Signed-off-by: Jeremy Bettis <jbettis@google.com <mailto:jbettis@google.com>>
> ---
>   newlib/libc/string/strverscmp.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/newlib/libc/string/strverscmp.c b/newlib/libc/string/strverscmp.c
> index 55966335f..e86718faa 100644
> --- a/newlib/libc/string/strverscmp.c
> +++ b/newlib/libc/string/strverscmp.c
> @@ -76,7 +76,7 @@ int strverscmp(const char *l0, const char *r0)
>    else if (c!='0') z=0;
>    }
> 
> - if (l[dp]<'1' + 9U && r[dp]<'1' + 9U) {

I'd call that a false positive?

> + if ((unsigned char)(l[dp]-'1')<9U && (unsigned char)(r[dp]-'1')<9U) {

What happens when chars are '\0' or '\x80'?

>    /* If we're looking at non-degenerate digit sequences starting
>    * with nonzero digits, longest digit string is greater. */
>    for (j=i; isdigit(l[j]); j++)
  
Joel Sherrill Nov. 7, 2024, 7:58 p.m. UTC | #2
On Thu, Nov 7, 2024 at 1:06 PM <brian.inglis@systematicsw.ab.ca> wrote:

> On 2024-11-07 09:53, Jeremy Bettis wrote:
> > GCC 14.2 with -Werror=sign-compare fails on this code.
> >
> > Signed-off-by: Jeremy Bettis <jbettis@google.com <mailto:
> jbettis@google.com>>
> > ---
> >   newlib/libc/string/strverscmp.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/newlib/libc/string/strverscmp.c
> b/newlib/libc/string/strverscmp.c
> > index 55966335f..e86718faa 100644
> > --- a/newlib/libc/string/strverscmp.c
> > +++ b/newlib/libc/string/strverscmp.c
> > @@ -76,7 +76,7 @@ int strverscmp(const char *l0, const char *r0)
> >    else if (c!='0') z=0;
> >    }
> >
> > - if (l[dp]<'1' + 9U && r[dp]<'1' + 9U) {
>
> I'd call that a false positive?
>

Is this a side-effect of the signedness of char being unspecified and
varying by architecture?

I'd lean to trying to taking the U off the 9.


> > + if ((unsigned char)(l[dp]-'1')<9U && (unsigned char)(r[dp]-'1')<9U) {
>
> What happens when chars are '\0' or '\x80'?
>
> >    /* If we're looking at non-degenerate digit sequences starting
> >    * with nonzero digits, longest digit string is greater. */
> >    for (j=i; isdigit(l[j]); j++)
>

--joel

>
> --
> Take care. Thanks, Brian Inglis              Calgary, Alberta, Canada
>
> La perfection est atteinte                   Perfection is achieved
> non pas lorsqu'il n'y a plus rien à ajouter  not when there is no more to
> add
> mais lorsqu'il n'y a plus rien à retirer     but when there is no more to
> cut
>                                  -- Antoine de Saint-Exupéry
>
  
Corinna Vinschen Nov. 8, 2024, 10:49 a.m. UTC | #3
On Nov  7 13:58, Joel Sherrill wrote:
> On Thu, Nov 7, 2024 at 1:06 PM <brian.inglis@systematicsw.ab.ca> wrote:
> 
> > On 2024-11-07 09:53, Jeremy Bettis wrote:
> > > GCC 14.2 with -Werror=sign-compare fails on this code.
> > >
> > > Signed-off-by: Jeremy Bettis <jbettis@google.com <mailto:
> > jbettis@google.com>>
> > > ---
> > >   newlib/libc/string/strverscmp.c | 2 +-
> > >   1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/newlib/libc/string/strverscmp.c
> > b/newlib/libc/string/strverscmp.c
> > > index 55966335f..e86718faa 100644
> > > --- a/newlib/libc/string/strverscmp.c
> > > +++ b/newlib/libc/string/strverscmp.c
> > > @@ -76,7 +76,7 @@ int strverscmp(const char *l0, const char *r0)
> > >    else if (c!='0') z=0;
> > >    }
> > >
> > > - if (l[dp]<'1' + 9U && r[dp]<'1' + 9U) {

I'm puzzled.  This doesn't look like newlib code.  Newlib's code looks
like this:

        if (l[dp]-'1'<9U && r[dp]-'1'<9U) {

See https://sourceware.org/git/?p=newlib-cygwin.git;a=blob;f=newlib/libc/string/strverscmp.c;hb=HEAD#l79

Given that l and r are unsigned char anyway, the entire expression
is unsigned, so there shouldn't be a sign-compare error in newlib's
version.


Corinna
  
Steven J Abner Nov. 8, 2024, 11:38 a.m. UTC | #4
On Fri, Nov 8 2024 at 11:28:31 AM +0000, Steven J Abner 
<pheonix.sja@att.net> wrote:
> On Fri, Nov 8 2024 at 10:49:31 AM +0000, Corinna Vinschen 
> <vinschen@redhat.com> wrote:
>> l[dp]-'1'
> 
> I'm not an expert on gcc, but most of the time this trips me up, gcc 
> math. Gcc might be converting to int32_t to do sub/add of variables.
> Steve
>
  
Christian Franke Nov. 8, 2024, 11:41 a.m. UTC | #5
Corinna Vinschen wrote:
> On Nov  7 13:58, Joel Sherrill wrote:
>> On Thu, Nov 7, 2024 at 1:06 PM <brian.inglis@systematicsw.ab.ca> wrote:
>>
>>> On 2024-11-07 09:53, Jeremy Bettis wrote:
>>>> GCC 14.2 with -Werror=sign-compare fails on this code.
>>>>
>>>> Signed-off-by: Jeremy Bettis <jbettis@google.com <mailto:
>>> jbettis@google.com>>
>>>> ---
>>>>    newlib/libc/string/strverscmp.c | 2 +-
>>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/newlib/libc/string/strverscmp.c
>>> b/newlib/libc/string/strverscmp.c
>>>> index 55966335f..e86718faa 100644
>>>> --- a/newlib/libc/string/strverscmp.c
>>>> +++ b/newlib/libc/string/strverscmp.c
>>>> @@ -76,7 +76,7 @@ int strverscmp(const char *l0, const char *r0)
>>>>     else if (c!='0') z=0;
>>>>     }
>>>>
>>>> - if (l[dp]<'1' + 9U && r[dp]<'1' + 9U) {
> I'm puzzled.  This doesn't look like newlib code.  Newlib's code looks
> like this:
>
>          if (l[dp]-'1'<9U && r[dp]-'1'<9U) {
>
> See https://sourceware.org/git/?p=newlib-cygwin.git;a=blob;f=newlib/libc/string/strverscmp.c;hb=HEAD#l79
>
> Given that l and r are unsigned char anyway, the entire expression
> is unsigned, so there shouldn't be a sign-compare error in newlib's
> version.
>

It's actually signed, IIRC due to changing 'unsigned preserving' (K&R C) 
to 'value preserving' (C89) implicit conversions.

"Proof" using C++17 compile time checks:

$ cat uchar2int.cc
#include <cstddef>
#include <type_traits>

const unsigned char *l;
std::size_t dp;

static_assert(std::is_same_v<decltype(l[dp]), const unsigned char &>);
static_assert(std::is_same_v<decltype(l[dp]-'1'), int>);
static_assert(std::is_same_v<decltype(l[dp]-0x31U), unsigned int>);

$ g++ -S -Wall -W uchar2int.cc
[no failed assert]
  
Corinna Vinschen Nov. 8, 2024, 2:37 p.m. UTC | #6
On Nov  8 12:41, Christian Franke wrote:
> Corinna Vinschen wrote:
> > On Nov  7 13:58, Joel Sherrill wrote:
> > > On Thu, Nov 7, 2024 at 1:06 PM <brian.inglis@systematicsw.ab.ca> wrote:
> > > 
> > > > On 2024-11-07 09:53, Jeremy Bettis wrote:
> > > > > GCC 14.2 with -Werror=sign-compare fails on this code.
> > > > > 
> > > > > Signed-off-by: Jeremy Bettis <jbettis@google.com <mailto:
> > > > jbettis@google.com>>
> > > > > ---
> > > > >    newlib/libc/string/strverscmp.c | 2 +-
> > > > >    1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/newlib/libc/string/strverscmp.c
> > > > b/newlib/libc/string/strverscmp.c
> > > > > index 55966335f..e86718faa 100644
> > > > > --- a/newlib/libc/string/strverscmp.c
> > > > > +++ b/newlib/libc/string/strverscmp.c
> > > > > @@ -76,7 +76,7 @@ int strverscmp(const char *l0, const char *r0)
> > > > >     else if (c!='0') z=0;
> > > > >     }
> > > > > 
> > > > > - if (l[dp]<'1' + 9U && r[dp]<'1' + 9U) {
> > I'm puzzled.  This doesn't look like newlib code.  Newlib's code looks
> > like this:
> > 
> >          if (l[dp]-'1'<9U && r[dp]-'1'<9U) {
> > 
> > See https://sourceware.org/git/?p=newlib-cygwin.git;a=blob;f=newlib/libc/string/strverscmp.c;hb=HEAD#l79
> > 
> > Given that l and r are unsigned char anyway, the entire expression
> > is unsigned, so there shouldn't be a sign-compare error in newlib's
> > version.
> > 
> 
> It's actually signed, IIRC due to changing 'unsigned preserving' (K&R C) to
> 'value preserving' (C89) implicit conversions.
> 
> "Proof" using C++17 compile time checks:
> 
> $ cat uchar2int.cc
> #include <cstddef>
> #include <type_traits>
> 
> const unsigned char *l;
> std::size_t dp;
> 
> static_assert(std::is_same_v<decltype(l[dp]), const unsigned char &>);
> static_assert(std::is_same_v<decltype(l[dp]-'1'), int>);
> static_assert(std::is_same_v<decltype(l[dp]-0x31U), unsigned int>);
> 
> $ g++ -S -Wall -W uchar2int.cc
> [no failed assert]

Drat.  Looks like I'm still living in K&R country...


Corinna
  
Corinna Vinschen Nov. 8, 2024, 2:45 p.m. UTC | #7
On Nov  8 15:37, Corinna Vinschen wrote:
> On Nov  8 12:41, Christian Franke wrote:
> > Corinna Vinschen wrote:
> > > On Nov  7 13:58, Joel Sherrill wrote:
> > > > On Thu, Nov 7, 2024 at 1:06 PM <brian.inglis@systematicsw.ab.ca> wrote:
> > > > 
> > > > > On 2024-11-07 09:53, Jeremy Bettis wrote:
> > > > > > GCC 14.2 with -Werror=sign-compare fails on this code.
> > > > > > 
> > > > > > Signed-off-by: Jeremy Bettis <jbettis@google.com <mailto:
> > > > > jbettis@google.com>>
> > > > > > ---
> > > > > >    newlib/libc/string/strverscmp.c | 2 +-
> > > > > >    1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > > 
> > > > > > diff --git a/newlib/libc/string/strverscmp.c
> > > > > b/newlib/libc/string/strverscmp.c
> > > > > > index 55966335f..e86718faa 100644
> > > > > > --- a/newlib/libc/string/strverscmp.c
> > > > > > +++ b/newlib/libc/string/strverscmp.c
> > > > > > @@ -76,7 +76,7 @@ int strverscmp(const char *l0, const char *r0)
> > > > > >     else if (c!='0') z=0;
> > > > > >     }
> > > > > > 
> > > > > > - if (l[dp]<'1' + 9U && r[dp]<'1' + 9U) {
> > > I'm puzzled.  This doesn't look like newlib code.  Newlib's code looks
> > > like this:
> > > 
> > >          if (l[dp]-'1'<9U && r[dp]-'1'<9U) {
> > > 
> > > See https://sourceware.org/git/?p=newlib-cygwin.git;a=blob;f=newlib/libc/string/strverscmp.c;hb=HEAD#l79
> > > 
> > > Given that l and r are unsigned char anyway, the entire expression
> > > is unsigned, so there shouldn't be a sign-compare error in newlib's
> > > version.
> > > 
> > 
> > It's actually signed, IIRC due to changing 'unsigned preserving' (K&R C) to
> > 'value preserving' (C89) implicit conversions.
> > 
> > "Proof" using C++17 compile time checks:
> > 
> > $ cat uchar2int.cc
> > #include <cstddef>
> > #include <type_traits>
> > 
> > const unsigned char *l;
> > std::size_t dp;
> > 
> > static_assert(std::is_same_v<decltype(l[dp]), const unsigned char &>);
> > static_assert(std::is_same_v<decltype(l[dp]-'1'), int>);
> > static_assert(std::is_same_v<decltype(l[dp]-0x31U), unsigned int>);
> > 
> > $ g++ -S -Wall -W uchar2int.cc
> > [no failed assert]
> 
> Drat.  Looks like I'm still living in K&R country...

So then, what about

  if (l[dp]-U'1'<9U && r[dp]-U'1'<9U) {

At least

  static_assert(std::is_same_v<decltype(l[dp]-U'1'), unsigned int>);

is happy.


Corinna
  
Hans-Bernhard Bröker Nov. 8, 2024, 5:04 p.m. UTC | #8
Am 08.11.2024 um 12:41 schrieb Christian Franke:
> Corinna Vinschen wrote:
>> On Nov  7 13:58, Joel Sherrill wrote:

>> Given that l and r are unsigned char anyway, the entire expression
>> is unsigned, so there shouldn't be a sign-compare error in newlib's
>> version.

> It's actually signed, IIRC due to changing 'unsigned preserving' (K&R C) 
> to 'value preserving' (C89) implicit conversions.

The reason it's signed is because '1' is a signed integer, so this 
expression has a type train of:

   ((unsigned char) - (signed int)) < (unsigned int)

The left hand side of the < operator is signed int on all but the very 
weirdest architectures (think sizeof(int) == 1), as unsigned char 
undergoes integer promotion to signed int on those.

As to using U'1' to work around this quirk: that's a C11-ism.  YMMV.
  
Christian Franke Nov. 8, 2024, 5:36 p.m. UTC | #9
Corinna Vinschen wrote:
> On Nov  8 15:37, Corinna Vinschen wrote:
>> On Nov  8 12:41, Christian Franke wrote:
>>> Corinna Vinschen wrote:
>>>> On Nov  7 13:58, Joel Sherrill wrote:
>>>>> On Thu, Nov 7, 2024 at 1:06 PM <brian.inglis@systematicsw.ab.ca> wrote:
>>>>>
>>>>>> On 2024-11-07 09:53, Jeremy Bettis wrote:
>>>>>>> GCC 14.2 with -Werror=sign-compare fails on this code.
>>>>>>>
>>>>>>> Signed-off-by: Jeremy Bettis <jbettis@google.com <mailto:
>>>>>> jbettis@google.com>>
>>>>>>> ---
>>>>>>>     newlib/libc/string/strverscmp.c | 2 +-
>>>>>>>     1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>>
>>>>>>> diff --git a/newlib/libc/string/strverscmp.c
>>>>>> b/newlib/libc/string/strverscmp.c
>>>>>>> index 55966335f..e86718faa 100644
>>>>>>> --- a/newlib/libc/string/strverscmp.c
>>>>>>> +++ b/newlib/libc/string/strverscmp.c
>>>>>>> @@ -76,7 +76,7 @@ int strverscmp(const char *l0, const char *r0)
>>>>>>>      else if (c!='0') z=0;
>>>>>>>      }
>>>>>>>
>>>>>>> - if (l[dp]<'1' + 9U && r[dp]<'1' + 9U) {
>>>> I'm puzzled.  This doesn't look like newlib code.  Newlib's code looks
>>>> like this:
>>>>
>>>>           if (l[dp]-'1'<9U && r[dp]-'1'<9U) {
>>>>
>>>> See https://sourceware.org/git/?p=newlib-cygwin.git;a=blob;f=newlib/libc/string/strverscmp.c;hb=HEAD#l79
>>>>
>>>> Given that l and r are unsigned char anyway, the entire expression
>>>> is unsigned, so there shouldn't be a sign-compare error in newlib's
>>>> version.
>>>>
>>> It's actually signed, IIRC due to changing 'unsigned preserving' (K&R C) to
>>> 'value preserving' (C89) implicit conversions.
>>>
>>> "Proof" using C++17 compile time checks:
>>>
>>> $ cat uchar2int.cc
>>> #include <cstddef>
>>> #include <type_traits>
>>>
>>> const unsigned char *l;
>>> std::size_t dp;
>>>
>>> static_assert(std::is_same_v<decltype(l[dp]), const unsigned char &>);
>>> static_assert(std::is_same_v<decltype(l[dp]-'1'), int>);
>>> static_assert(std::is_same_v<decltype(l[dp]-0x31U), unsigned int>);
>>>
>>> $ g++ -S -Wall -W uchar2int.cc
>>> [no failed assert]
>> Drat.  Looks like I'm still living in K&R country...
> So then, what about
>
>    if (l[dp]-U'1'<9U && r[dp]-U'1'<9U) {
>
> At least
>
>    static_assert(std::is_same_v<decltype(l[dp]-U'1'), unsigned int>);
>
> is happy.
>

In practice, this should work for all existing use cases of newlib. In 
theory, it is not generally portable:

U'1' has type char32_t (C11) which is unsigned. If the very rare ILP64 
data model would be used, this would be promoted to 64-bit signed int.

This variant does not depend on signedness or integer overflow:

   if ('1' <= l[dp] && l[dp] <= '9' && '1' <= r[dp] && r[dp] <= '9') {
  
Christian Franke Nov. 8, 2024, 6:12 p.m. UTC | #10
Hans-Bernhard Bröker wrote:
> Am 08.11.2024 um 12:41 schrieb Christian Franke:
>> Corinna Vinschen wrote:
>>> On Nov  7 13:58, Joel Sherrill wrote:
>
>>> Given that l and r are unsigned char anyway, the entire expression
>>> is unsigned, so there shouldn't be a sign-compare error in newlib's
>>> version.
>
>> It's actually signed, IIRC due to changing 'unsigned preserving' (K&R 
>> C) to 'value preserving' (C89) implicit conversions.
>
> The reason it's signed is because '1' is a signed integer, so this 
> expression has a type train of:
>
>   ((unsigned char) - (signed int)) < (unsigned int)

The LHS of '<' is signed because both arguments are signed after 
implicit integer promotion:

   ((signed int)(unsigned char) - (signed int)) < (unsigned int)

In traditional K&R C, it was unsigned:

   ((unsigned int)(unsigned char) - (signed int)) < (unsigned int)
  
Hans-Bernhard Bröker Nov. 8, 2024, 6:39 p.m. UTC | #11
Am 08.11.2024 um 18:36 schrieb Christian Franke:

> This variant does not depend on signedness or integer overflow:
> 
>    if ('1' <= l[dp] && l[dp] <= '9' && '1' <= r[dp] && r[dp] <= '9') {

Or one could actually use isdigit() for what it's intended to do?  Yeah 
right, that would be just too ridiculous to even consider...
  
Christian Franke Nov. 9, 2024, 11:24 a.m. UTC | #12
Hans-Bernhard Bröker wrote:
> Am 08.11.2024 um 18:36 schrieb Christian Franke:
>
>> This variant does not depend on signedness or integer overflow:
>>
>>    if ('1' <= l[dp] && l[dp] <= '9' && '1' <= r[dp] && r[dp] <= '9') {
>
> Or one could actually use isdigit() for what it's intended to do? Yeah 
> right, that would be just too ridiculous to even consider...
>
>

No, because the '0' is intentionally excluded here.

I never use isdigit() in my own code because it forbids to directly pass 
negative signed char values and typically results in unnecessary (locale 
specific) code.

"Proof" for newlib:

int isdigit_slow(int c)
{
   return isdigit(c);
}

int isdigit_fast(int c)
{
   return ('0' <= c && c <= '9');
}

Code generated with 'gcc -O2 ...':

isdigit_slow:
     pushq    %rbx
     subq    $32, %rsp
     movslq    %ecx, %rbx
     call    __locale_ctype_ptr
     movzbl    1(%rbx,%rax), %eax
     andl    $4, %eax
     addq    $32, %rsp
     popq    %rbx
     ret

isdigit_fast:
     xorl    %eax, %eax
     subl    $48, %ecx
     cmpl    $9, %ecx
     setbe    %al
     ret

Speedup factor is somewhere between 5x and 10x.

If compiled as C++ with 'g++ -O2 ...', identical (fast) code is 
generated for both functions. This is because the is*() macros in 
ctype.h are disabled for C++. Then GCC uses its builtin isdigit(). Not 
really consistent, IMO :-)
  

Patch

diff --git a/newlib/libc/string/strverscmp.c
b/newlib/libc/string/strverscmp.c
index 55966335f..e86718faa 100644
--- a/newlib/libc/string/strverscmp.c
+++ b/newlib/libc/string/strverscmp.c
@@ -76,7 +76,7 @@  int strverscmp(const char *l0, const char *r0)
  else if (c!='0') z=0;
  }

- if (l[dp]<'1' + 9U && r[dp]<'1' + 9U) {
+ if ((unsigned char)(l[dp]-'1')<9U && (unsigned char)(r[dp]-'1')<9U) {
  /* If we're looking at non-degenerate digit sequences starting
  * with nonzero digits, longest digit string is greater. */
  for (j=i; isdigit(l[j]); j++)