[v2,31/65] MIPS: use is_whitespace()

Message ID e8468189-a050-49f8-a887-ed2fa69458c4@suse.com
State New
Headers
Series gas: whitespace handling |

Checks

Context Check Description
linaro-tcwg-bot/tcwg_binutils_build--master-arm fail Patch failed to apply
linaro-tcwg-bot/tcwg_binutils_build--master-aarch64 fail Patch failed to apply

Commit Message

Jan Beulich Jan. 27, 2025, 4:28 p.m. UTC
  ... for consistency of recognition of what is deemed whitespace.

At the same time use is_end_of_stmt() instead of an open-coded nul char
check, and check for statement end in the first place in
parse_relocation().
---
v2: New.
  

Comments

Maciej W. Rozycki Feb. 3, 2025, 2:07 p.m. UTC | #1
On Mon, 27 Jan 2025, Jan Beulich wrote:

> --- a/gas/config/tc-mips.c
> +++ b/gas/config/tc-mips.c
> @@ -14388,7 +14388,7 @@ mips16_ip (char *str, struct mips_cl_ins
>    struct mips_operand_token *tokens;
>    unsigned int l;
>  
> -  for (s = str; *s != '\0' && *s != '.' && *s != ' '; ++s)
> +  for (s = str; *s != '\0' && *s != '.' && !is_whitespace (*s); ++s)
                   ^^^^^^^^^^
 Shouldn't this also be `!is_end_of_stmt (*s)'?

> @@ -14399,8 +14399,9 @@ mips16_ip (char *str, struct mips_cl_ins
>      case '\0':
>        break;
>  
> -    case ' ':
> -      s++;
> +    default:
> +      if (is_whitespace (*s))
> +	s++;
>        break;

 Why `is_whitespace (*s)' rather than `is_whitespace (c)'?

 I think this only causes obfuscation to this already messed up statement.  
Since there are only two cases here really ('\0' does nothing and is the 
only remaining possibility here, guaranteed by the loop right above) can 
you please rewrite this as:

  if (c == '.')
    {
      ...
    }
  else if (is_whitespace (c))
    s++;

or suchlike?

> @@ -14417,7 +14418,7 @@ mips16_ip (char *str, struct mips_cl_ins
>  	}
>        if (*s == '\0')

 And `is_end_of_stmt (*s)' here presumably too?

 Otherwise OK, I think.

  Maciej
  
Jan Beulich Feb. 3, 2025, 3:52 p.m. UTC | #2
On 03.02.2025 15:07, Maciej W. Rozycki wrote:
> On Mon, 27 Jan 2025, Jan Beulich wrote:
> 
>> --- a/gas/config/tc-mips.c
>> +++ b/gas/config/tc-mips.c
>> @@ -14388,7 +14388,7 @@ mips16_ip (char *str, struct mips_cl_ins
>>    struct mips_operand_token *tokens;
>>    unsigned int l;
>>  
>> -  for (s = str; *s != '\0' && *s != '.' && *s != ' '; ++s)
>> +  for (s = str; *s != '\0' && *s != '.' && !is_whitespace (*s); ++s)
>                    ^^^^^^^^^^
>  Shouldn't this also be `!is_end_of_stmt (*s)'?

This is the kind of question you as the target maintainer really
will want to answer. All I could do is try to guess where
conversion might be on order.

>> @@ -14399,8 +14399,9 @@ mips16_ip (char *str, struct mips_cl_ins
>>      case '\0':
>>        break;
>>  
>> -    case ' ':
>> -      s++;
>> +    default:
>> +      if (is_whitespace (*s))
>> +	s++;
>>        break;
> 
>  Why `is_whitespace (*s)' rather than `is_whitespace (c)'?

Right, I should have noticed to use c here.

>  I think this only causes obfuscation to this already messed up statement.  
> Since there are only two cases here really ('\0' does nothing and is the 
> only remaining possibility here, guaranteed by the loop right above) can 
> you please rewrite this as:
> 
>   if (c == '.')
>     {
>       ...
>     }
>   else if (is_whitespace (c))
>     s++;
> 
> or suchlike?

Possibly. On a similar question from Richard on aarch64 I indicated that
from other projects I'm working on I'm used to using switch() in such
cases, even if at a certain point there may be just a single case label.
This is to ease future addition of new further labels.

>> @@ -14417,7 +14418,7 @@ mips16_ip (char *str, struct mips_cl_ins
>>  	}
>>        if (*s == '\0')
> 
>  And `is_end_of_stmt (*s)' here presumably too?

See above. It has been a lot of targets all doing things (often just
slightly) differently, so I can only guess that I may have got the
impression that somewhere up the callstack something nul-terminates the
string.

Jan
  
Maciej W. Rozycki Feb. 3, 2025, 5:50 p.m. UTC | #3
On Mon, 3 Feb 2025, Jan Beulich wrote:

> >> --- a/gas/config/tc-mips.c
> >> +++ b/gas/config/tc-mips.c
> >> @@ -14388,7 +14388,7 @@ mips16_ip (char *str, struct mips_cl_ins
> >>    struct mips_operand_token *tokens;
> >>    unsigned int l;
> >>  
> >> -  for (s = str; *s != '\0' && *s != '.' && *s != ' '; ++s)
> >> +  for (s = str; *s != '\0' && *s != '.' && !is_whitespace (*s); ++s)
> >                    ^^^^^^^^^^
> >  Shouldn't this also be `!is_end_of_stmt (*s)'?
> 
> This is the kind of question you as the target maintainer really
> will want to answer. All I could do is try to guess where
> conversion might be on order.

 All I can say if `*s' is '\0', then we're at the end of an assembly 
instruction mnemonic that has no operands following, so my understanding 
is it is indeed the case for `!is_end_of_stmt (*s)'.  However since you 
made such changes elsewhere but chose not to make one on this occasion, 
I've asked whether this has been intentional or just a missed case.

> >  I think this only causes obfuscation to this already messed up statement.  
> > Since there are only two cases here really ('\0' does nothing and is the 
> > only remaining possibility here, guaranteed by the loop right above) can 
> > you please rewrite this as:
> > 
> >   if (c == '.')
> >     {
> >       ...
> >     }
> >   else if (is_whitespace (c))
> >     s++;
> > 
> > or suchlike?
> 
> Possibly. On a similar question from Richard on aarch64 I indicated that
> from other projects I'm working on I'm used to using switch() in such
> cases, even if at a certain point there may be just a single case label.
> This is to ease future addition of new further labels.

 This is generic MIPS assembly language syntax, which is unlikely to ever 
change, and then for the MIPS16 intruction set, which has been effectively 
a dead end for decades now, even the MIPS16e2 extension ~10 years ago was 
a huge surprise and a one-off effort due to a specific customer request, 
so we can safely assume nothing else will ever happen again here.  So I 
think we need to optimise for code clarity rather than minimising highly 
unlikely future changes.  Yes, you need the backend maintainer's knowledge 
to decide here.

 For the record we're handling explicit instruction size override suffixes 
on MIPS16 mnemonics here, so the three cases the current switch statement 
handles is:

- '\0': end of mnemonic, instruction ends w/o operands, no size override,

- ' ':  end of mnemonic, operands follow, no size override,

- '.':  end of mnemonic, a size override suffix follows (then either the 
        instruction ends or operands follow).

There's simply no room for expansion here, and as I say the instruction 
set is a dead end and therefore in the maintenance mode.

 NB this stuff is extensively covered and should therefore be safe to 
apply cleanups to with little concern as to possible breakage, cf.:

$ grep '\.[et]\b' gas/testsuite/gas/mips/*.s

and I put significant effort to get this stuff right, as it used to be 
broken.  See commit 7fd539200562 ("MIPS16: Switch to 32-bit opcode table 
interpretation"), commit 3fb49709438e ("MIPS16/GAS: Fix forced size 
suffixes with argumentless instructions"), and commit 25499ac7ee92 
("MIPS16e2: Add MIPS16e2 ASE support") for the most relevant changes.

> >> @@ -14417,7 +14418,7 @@ mips16_ip (char *str, struct mips_cl_ins
> >>  	}
> >>        if (*s == '\0')
> > 
> >  And `is_end_of_stmt (*s)' here presumably too?
> 
> See above. It has been a lot of targets all doing things (often just
> slightly) differently, so I can only guess that I may have got the
> impression that somewhere up the callstack something nul-terminates the
> string.

 It is an analogous situation once the override suffix has been swallowed.

  Maciej
  
Maciej W. Rozycki Feb. 10, 2025, 10:17 p.m. UTC | #4
On Mon, 3 Feb 2025, Maciej W. Rozycki wrote:

> > >  I think this only causes obfuscation to this already messed up statement.  
> > > Since there are only two cases here really ('\0' does nothing and is the 
> > > only remaining possibility here, guaranteed by the loop right above) can 
> > > you please rewrite this as:
> > > 
> > >   if (c == '.')
> > >     {
> > >       ...
> > >     }
> > >   else if (is_whitespace (c))
> > >     s++;
> > > 
> > > or suchlike?
> > 
> > Possibly. On a similar question from Richard on aarch64 I indicated that
> > from other projects I'm working on I'm used to using switch() in such
> > cases, even if at a certain point there may be just a single case label.
> > This is to ease future addition of new further labels.
> 
>  This is generic MIPS assembly language syntax, which is unlikely to ever 
> change, and then for the MIPS16 intruction set, which has been effectively 
> a dead end for decades now, even the MIPS16e2 extension ~10 years ago was 
> a huge surprise and a one-off effort due to a specific customer request, 
> so we can safely assume nothing else will ever happen again here.  So I 
> think we need to optimise for code clarity rather than minimising highly 
> unlikely future changes.  Yes, you need the backend maintainer's knowledge 
> to decide here.

 I have made and committed this cleanup myself now.

 As an upside this has let me discover and deal with a regression from my 
fix in this area made years ago that caused invalid instruction mnemonics 
ending with a dot to be assembled successfully as if the dot wasn't there.

 I've left any conversion to `is_end_of_stmt' to a future update.  At this 
point I've concluded that since it wasn't made with your original commit 
it makes no sense to me to do it piecemeal, so it'll be best made through 
tc-mips.c in one go.

  Maciej
  

Patch

--- a/gas/config/tc-mips.c
+++ b/gas/config/tc-mips.c
@@ -45,7 +45,7 @@  typedef char static_assert2[sizeof (valu
 #define streq(a, b)           (strcmp (a, b) == 0)
 
 #define SKIP_SPACE_TABS(S) \
-  do { while (*(S) == ' ' || *(S) == '\t') ++(S); } while (0)
+  do { while (is_whitespace (*(S))) ++(S); } while (0)
 
 /* Clean up namespace so we can include obj-elf.h too.  */
 static int mips_output_flavor (void);
@@ -14349,7 +14349,7 @@  mips_ip (char *str, struct mips_cl_insn
   opcode_extra = 0;
 
   /* We first try to match an instruction up to a space or to the end.  */
-  for (end = 0; str[end] != '\0' && !ISSPACE (str[end]); end++)
+  for (end = 0; !is_end_of_stmt (str[end]) && !is_whitespace (str[end]); end++)
     continue;
 
   first = mips_lookup_insn (hash, str, end, &opcode_extra);
@@ -14388,7 +14388,7 @@  mips16_ip (char *str, struct mips_cl_ins
   struct mips_operand_token *tokens;
   unsigned int l;
 
-  for (s = str; *s != '\0' && *s != '.' && *s != ' '; ++s)
+  for (s = str; *s != '\0' && *s != '.' && !is_whitespace (*s); ++s)
     ;
   end = s;
   c = *end;
@@ -14399,8 +14399,9 @@  mips16_ip (char *str, struct mips_cl_ins
     case '\0':
       break;
 
-    case ' ':
-      s++;
+    default:
+      if (is_whitespace (*s))
+	s++;
       break;
 
     case '.':
@@ -14417,7 +14418,7 @@  mips16_ip (char *str, struct mips_cl_ins
 	}
       if (*s == '\0')
 	break;
-      else if (*s++ == ' ')
+      else if (is_whitespace (*s++))
 	break;
       set_insn_error (0, _("unrecognized opcode"));
       return;
@@ -14641,7 +14642,9 @@  parse_relocation (char **str, bfd_reloc_
       {
 	int len = strlen (percent_op[i].str);
 
-	if (!ISSPACE ((*str)[len]) && (*str)[len] != '(')
+	if (!is_end_of_stmt ((*str)[len])
+	    && !is_whitespace ((*str)[len])
+	    && (*str)[len] != '(')
 	  continue;
 
 	*str += strlen (percent_op[i].str);
@@ -14691,7 +14694,7 @@  my_getSmallExpression (expressionS *ep,
 
       /* Skip over whitespace and brackets, keeping count of the number
 	 of brackets.  */
-      while (*str == ' ' || *str == '\t' || *str == '(')
+      while (is_whitespace (*str) || *str == '(')
 	if (*str++ == '(')
 	  str_depth++;
     }
@@ -14703,7 +14706,7 @@  my_getSmallExpression (expressionS *ep,
   str = expr_parse_end;
 
   /* Match every open bracket.  */
-  while (crux_depth > 0 && (*str == ')' || *str == ' ' || *str == '\t'))
+  while (crux_depth > 0 && (*str == ')' || is_whitespace (*str)))
     if (*str++ == ')')
       crux_depth--;