[v2,31/65] MIPS: use is_whitespace()
Checks
Commit Message
... for consistency of recognition of what is deemed whitespace.
At the same time use is_end_of_stmt() instead of an open-coded nul char
check, and check for statement end in the first place in
parse_relocation().
---
v2: New.
Comments
On Mon, 27 Jan 2025, Jan Beulich wrote:
> --- a/gas/config/tc-mips.c
> +++ b/gas/config/tc-mips.c
> @@ -14388,7 +14388,7 @@ mips16_ip (char *str, struct mips_cl_ins
> struct mips_operand_token *tokens;
> unsigned int l;
>
> - for (s = str; *s != '\0' && *s != '.' && *s != ' '; ++s)
> + for (s = str; *s != '\0' && *s != '.' && !is_whitespace (*s); ++s)
^^^^^^^^^^
Shouldn't this also be `!is_end_of_stmt (*s)'?
> @@ -14399,8 +14399,9 @@ mips16_ip (char *str, struct mips_cl_ins
> case '\0':
> break;
>
> - case ' ':
> - s++;
> + default:
> + if (is_whitespace (*s))
> + s++;
> break;
Why `is_whitespace (*s)' rather than `is_whitespace (c)'?
I think this only causes obfuscation to this already messed up statement.
Since there are only two cases here really ('\0' does nothing and is the
only remaining possibility here, guaranteed by the loop right above) can
you please rewrite this as:
if (c == '.')
{
...
}
else if (is_whitespace (c))
s++;
or suchlike?
> @@ -14417,7 +14418,7 @@ mips16_ip (char *str, struct mips_cl_ins
> }
> if (*s == '\0')
And `is_end_of_stmt (*s)' here presumably too?
Otherwise OK, I think.
Maciej
On 03.02.2025 15:07, Maciej W. Rozycki wrote:
> On Mon, 27 Jan 2025, Jan Beulich wrote:
>
>> --- a/gas/config/tc-mips.c
>> +++ b/gas/config/tc-mips.c
>> @@ -14388,7 +14388,7 @@ mips16_ip (char *str, struct mips_cl_ins
>> struct mips_operand_token *tokens;
>> unsigned int l;
>>
>> - for (s = str; *s != '\0' && *s != '.' && *s != ' '; ++s)
>> + for (s = str; *s != '\0' && *s != '.' && !is_whitespace (*s); ++s)
> ^^^^^^^^^^
> Shouldn't this also be `!is_end_of_stmt (*s)'?
This is the kind of question you as the target maintainer really
will want to answer. All I could do is try to guess where
conversion might be on order.
>> @@ -14399,8 +14399,9 @@ mips16_ip (char *str, struct mips_cl_ins
>> case '\0':
>> break;
>>
>> - case ' ':
>> - s++;
>> + default:
>> + if (is_whitespace (*s))
>> + s++;
>> break;
>
> Why `is_whitespace (*s)' rather than `is_whitespace (c)'?
Right, I should have noticed to use c here.
> I think this only causes obfuscation to this already messed up statement.
> Since there are only two cases here really ('\0' does nothing and is the
> only remaining possibility here, guaranteed by the loop right above) can
> you please rewrite this as:
>
> if (c == '.')
> {
> ...
> }
> else if (is_whitespace (c))
> s++;
>
> or suchlike?
Possibly. On a similar question from Richard on aarch64 I indicated that
from other projects I'm working on I'm used to using switch() in such
cases, even if at a certain point there may be just a single case label.
This is to ease future addition of new further labels.
>> @@ -14417,7 +14418,7 @@ mips16_ip (char *str, struct mips_cl_ins
>> }
>> if (*s == '\0')
>
> And `is_end_of_stmt (*s)' here presumably too?
See above. It has been a lot of targets all doing things (often just
slightly) differently, so I can only guess that I may have got the
impression that somewhere up the callstack something nul-terminates the
string.
Jan
On Mon, 3 Feb 2025, Jan Beulich wrote:
> >> --- a/gas/config/tc-mips.c
> >> +++ b/gas/config/tc-mips.c
> >> @@ -14388,7 +14388,7 @@ mips16_ip (char *str, struct mips_cl_ins
> >> struct mips_operand_token *tokens;
> >> unsigned int l;
> >>
> >> - for (s = str; *s != '\0' && *s != '.' && *s != ' '; ++s)
> >> + for (s = str; *s != '\0' && *s != '.' && !is_whitespace (*s); ++s)
> > ^^^^^^^^^^
> > Shouldn't this also be `!is_end_of_stmt (*s)'?
>
> This is the kind of question you as the target maintainer really
> will want to answer. All I could do is try to guess where
> conversion might be on order.
All I can say if `*s' is '\0', then we're at the end of an assembly
instruction mnemonic that has no operands following, so my understanding
is it is indeed the case for `!is_end_of_stmt (*s)'. However since you
made such changes elsewhere but chose not to make one on this occasion,
I've asked whether this has been intentional or just a missed case.
> > I think this only causes obfuscation to this already messed up statement.
> > Since there are only two cases here really ('\0' does nothing and is the
> > only remaining possibility here, guaranteed by the loop right above) can
> > you please rewrite this as:
> >
> > if (c == '.')
> > {
> > ...
> > }
> > else if (is_whitespace (c))
> > s++;
> >
> > or suchlike?
>
> Possibly. On a similar question from Richard on aarch64 I indicated that
> from other projects I'm working on I'm used to using switch() in such
> cases, even if at a certain point there may be just a single case label.
> This is to ease future addition of new further labels.
This is generic MIPS assembly language syntax, which is unlikely to ever
change, and then for the MIPS16 intruction set, which has been effectively
a dead end for decades now, even the MIPS16e2 extension ~10 years ago was
a huge surprise and a one-off effort due to a specific customer request,
so we can safely assume nothing else will ever happen again here. So I
think we need to optimise for code clarity rather than minimising highly
unlikely future changes. Yes, you need the backend maintainer's knowledge
to decide here.
For the record we're handling explicit instruction size override suffixes
on MIPS16 mnemonics here, so the three cases the current switch statement
handles is:
- '\0': end of mnemonic, instruction ends w/o operands, no size override,
- ' ': end of mnemonic, operands follow, no size override,
- '.': end of mnemonic, a size override suffix follows (then either the
instruction ends or operands follow).
There's simply no room for expansion here, and as I say the instruction
set is a dead end and therefore in the maintenance mode.
NB this stuff is extensively covered and should therefore be safe to
apply cleanups to with little concern as to possible breakage, cf.:
$ grep '\.[et]\b' gas/testsuite/gas/mips/*.s
and I put significant effort to get this stuff right, as it used to be
broken. See commit 7fd539200562 ("MIPS16: Switch to 32-bit opcode table
interpretation"), commit 3fb49709438e ("MIPS16/GAS: Fix forced size
suffixes with argumentless instructions"), and commit 25499ac7ee92
("MIPS16e2: Add MIPS16e2 ASE support") for the most relevant changes.
> >> @@ -14417,7 +14418,7 @@ mips16_ip (char *str, struct mips_cl_ins
> >> }
> >> if (*s == '\0')
> >
> > And `is_end_of_stmt (*s)' here presumably too?
>
> See above. It has been a lot of targets all doing things (often just
> slightly) differently, so I can only guess that I may have got the
> impression that somewhere up the callstack something nul-terminates the
> string.
It is an analogous situation once the override suffix has been swallowed.
Maciej
On Mon, 3 Feb 2025, Maciej W. Rozycki wrote:
> > > I think this only causes obfuscation to this already messed up statement.
> > > Since there are only two cases here really ('\0' does nothing and is the
> > > only remaining possibility here, guaranteed by the loop right above) can
> > > you please rewrite this as:
> > >
> > > if (c == '.')
> > > {
> > > ...
> > > }
> > > else if (is_whitespace (c))
> > > s++;
> > >
> > > or suchlike?
> >
> > Possibly. On a similar question from Richard on aarch64 I indicated that
> > from other projects I'm working on I'm used to using switch() in such
> > cases, even if at a certain point there may be just a single case label.
> > This is to ease future addition of new further labels.
>
> This is generic MIPS assembly language syntax, which is unlikely to ever
> change, and then for the MIPS16 intruction set, which has been effectively
> a dead end for decades now, even the MIPS16e2 extension ~10 years ago was
> a huge surprise and a one-off effort due to a specific customer request,
> so we can safely assume nothing else will ever happen again here. So I
> think we need to optimise for code clarity rather than minimising highly
> unlikely future changes. Yes, you need the backend maintainer's knowledge
> to decide here.
I have made and committed this cleanup myself now.
As an upside this has let me discover and deal with a regression from my
fix in this area made years ago that caused invalid instruction mnemonics
ending with a dot to be assembled successfully as if the dot wasn't there.
I've left any conversion to `is_end_of_stmt' to a future update. At this
point I've concluded that since it wasn't made with your original commit
it makes no sense to me to do it piecemeal, so it'll be best made through
tc-mips.c in one go.
Maciej
@@ -45,7 +45,7 @@ typedef char static_assert2[sizeof (valu
#define streq(a, b) (strcmp (a, b) == 0)
#define SKIP_SPACE_TABS(S) \
- do { while (*(S) == ' ' || *(S) == '\t') ++(S); } while (0)
+ do { while (is_whitespace (*(S))) ++(S); } while (0)
/* Clean up namespace so we can include obj-elf.h too. */
static int mips_output_flavor (void);
@@ -14349,7 +14349,7 @@ mips_ip (char *str, struct mips_cl_insn
opcode_extra = 0;
/* We first try to match an instruction up to a space or to the end. */
- for (end = 0; str[end] != '\0' && !ISSPACE (str[end]); end++)
+ for (end = 0; !is_end_of_stmt (str[end]) && !is_whitespace (str[end]); end++)
continue;
first = mips_lookup_insn (hash, str, end, &opcode_extra);
@@ -14388,7 +14388,7 @@ mips16_ip (char *str, struct mips_cl_ins
struct mips_operand_token *tokens;
unsigned int l;
- for (s = str; *s != '\0' && *s != '.' && *s != ' '; ++s)
+ for (s = str; *s != '\0' && *s != '.' && !is_whitespace (*s); ++s)
;
end = s;
c = *end;
@@ -14399,8 +14399,9 @@ mips16_ip (char *str, struct mips_cl_ins
case '\0':
break;
- case ' ':
- s++;
+ default:
+ if (is_whitespace (*s))
+ s++;
break;
case '.':
@@ -14417,7 +14418,7 @@ mips16_ip (char *str, struct mips_cl_ins
}
if (*s == '\0')
break;
- else if (*s++ == ' ')
+ else if (is_whitespace (*s++))
break;
set_insn_error (0, _("unrecognized opcode"));
return;
@@ -14641,7 +14642,9 @@ parse_relocation (char **str, bfd_reloc_
{
int len = strlen (percent_op[i].str);
- if (!ISSPACE ((*str)[len]) && (*str)[len] != '(')
+ if (!is_end_of_stmt ((*str)[len])
+ && !is_whitespace ((*str)[len])
+ && (*str)[len] != '(')
continue;
*str += strlen (percent_op[i].str);
@@ -14691,7 +14694,7 @@ my_getSmallExpression (expressionS *ep,
/* Skip over whitespace and brackets, keeping count of the number
of brackets. */
- while (*str == ' ' || *str == '\t' || *str == '(')
+ while (is_whitespace (*str) || *str == '(')
if (*str++ == '(')
str_depth++;
}
@@ -14703,7 +14706,7 @@ my_getSmallExpression (expressionS *ep,
str = expr_parse_end;
/* Match every open bracket. */
- while (crux_depth > 0 && (*str == ')' || *str == ' ' || *str == '\t'))
+ while (crux_depth > 0 && (*str == ')' || is_whitespace (*str)))
if (*str++ == ')')
crux_depth--;