[v2,00/65] gas: whitespace handling

Message ID 2316ac5c-7870-4b46-9c80-eaecaef93a31@suse.com
Headers
Series gas: whitespace handling |

Message

Jan Beulich Jan. 27, 2025, 3:23 p.m. UTC
  As per observations in target specific code there appears to be disagreement
across the assembler whether to check for specific characters (blank and tab
normally) or whether to use ISSPACE().

As agreed upon during the Cauldron in Prague, switch to a single base
construct for all code to use: is_whitespace().

It clearly is an alternative option to have is_whitespace() expand to
ISSPACE() or ISBLANK() (ISSPACE() also yields "true" for characters we don't
really consider whitespace), then (obviously) leaving out the last patch. See
also the CR_EOL uses in read.c and app.c. I think it is advisable though that
is_whitespace() and is_end_of_{line,stmt}() be non-overlapping; question then
is what (further) characters to tag as LEX_WHITE (see remarks in patch 01).

Along with recently (as of the v1 submission) committed work for x86 this
appears to be sufficient to actually use -f (or #NO_APP at start of file)
for gcc-generated code. I didn't properly check other architectures yet, but
I seem to recall that at least Arm32 and PPC would apparently require
compiler side adjustments, too.

01: consolidate whitespace recognition
02: gas/obj-*.c: use is_whitespace()
03: Alpha/EVAX: use is_whitespace() / is_end_of_stmt()
04: arc: use is_whitespace()
05: Arm: use is_whitespace()
06: aarch64: use is_whitespace()
07: avr: use is_whitespace()
08: bfin: use is_whitespace()
09: bpf: use is_whitespace()
10: CR16: use is_whitespace()
11: cris: use is_whitespace()
12: CRx: use is_whitespace()
13: C-Sky: use is_whitespace()
14: d10v: use is_whitespace()
15: d30v: use is_whitespace()
16: dlx: use is_whitespace()
17: Epiphany: use is_whitespace()
18: fr30: use is_whitespace()
19: ft32: use is_whitespace()
20: H8/300: use is_whitespace()
21: HP-PA: use is_whitespace()
22: kvx: use is_whitespace()
23: LoongArch: use is_whitespace()
24: m32c: use is_whitespace()
25: m32r: use is_whitespace()
26: M68HC1x: use is_whitespace()
27: M68k: use is_whitespace()
28: M*Core: use is_whitespace()
29: metag: use is_whitespace()
30: MicroBlaze: use is_whitespace()
31: MIPS: use is_whitespace()
32: MMIX: use is_whitespace()
33: mn10200: use is_whitespace()
34: mn10300: use is_whitespace()
35: Moxie: use is_whitespace()
36: msp430: use is_whitespace()
37: nds32: use is_whitespace()
38: NS32k: use is_whitespace()
39: PDP11: use is_whitespace()
40: PicoJava: use is_whitespace()
41: PPC: use is_whitespace()
42: pru: use is_whitespace()
43: RISC-V: use is_whitespace()
44: rl78: use is_whitespace()
45: rx: use is_whitespace()
46: s12z: use is_whitespace()
47: S/390: use is_whitespace()
48: Score: use is_whitespace()
49: SH: use is_whitespace()
50: Sparc: use is_whitespace()
51: spu: use is_whitespace()
52: C30: use is_whitespace()
53: C4x: use is_whitespace()
54: C54x: use is_whitespace()
55: C6x: use is_whitespace()
56: v850: use is_whitespace()
57: v850: use is_whitespace()
58: Visium: use is_whitespace()
59: wasm32: use is_whitespace()
60: x86: use is_whitespace()
61: xgate: use is_whitespace()
62: Xtensa: use is_whitespace()
63: Z80: use is_whitespace()
64: Z8k: use is_whitespace()
65: gas: suppress use of ISSPACE() / ISBLANK()

Jan
  

Comments

Hans-Peter Nilsson Jan. 28, 2025, 2:50 a.m. UTC | #1
Sorry for being a downer, but:

> Date: Mon, 27 Jan 2025 16:23:42 +0100
> From: Jan Beulich <jbeulich@suse.com>

> As per observations in target specific code there appears to be disagreement
> across the assembler whether to check for specific characters (blank and tab
> normally) or whether to use ISSPACE().
> 
> As agreed upon during the Cauldron in Prague, switch to a single base
> construct for all code to use: is_whitespace().

Such decisions should be made online, with the whole
community, not with the people attending a specific session
at a specific event.  (While I had the chance, I had no idea
executive decisions were about to take place.)

> It clearly is an alternative option to have is_whitespace() expand to
> ISSPACE() or ISBLANK() (ISSPACE() also yields "true" for characters we don't
> really consider whitespace), then (obviously) leaving out the last patch. See
> also the CR_EOL uses in read.c and app.c. I think it is advisable though that
> is_whitespace() and is_end_of_{line,stmt}() be non-overlapping; question then
> is what (further) characters to tag as LEX_WHITE (see remarks in patch 01).
> 
> Along with recently (as of the v1 submission) committed work for x86 this
> appears to be sufficient to actually use -f (or #NO_APP at start of file)
> for gcc-generated code.

Does that work also clean up gcc-generated code, like
dropping space after comma or multiple spaces or whatever is
judged the #NO_APP behavior of "x86 assembly"?  I don't see
such patches but maybe they're not posted yet.  It doesn't
just happen to be specified as what gcc generates on the
master branch of today?

> I didn't properly check other architectures yet, but
> I seem to recall that at least Arm32 and PPC would apparently require
> compiler side adjustments, too.

I can't help but thinking this is going ever so slightly in
the wrong direction with regards to #NO_APP: this change-set
is making that mode more lenient towards formatting;
allowing more types of space characters.  With the few
targets that have #NO_APP active in gcc-generated code, you
have the chance of making that mode more strict.

brgds, H-P
  
Jan Beulich Jan. 28, 2025, 7:40 a.m. UTC | #2
On 28.01.2025 03:50, Hans-Peter Nilsson wrote:
> Sorry for being a downer, but:
> 
>> Date: Mon, 27 Jan 2025 16:23:42 +0100
>> From: Jan Beulich <jbeulich@suse.com>
> 
>> As per observations in target specific code there appears to be disagreement
>> across the assembler whether to check for specific characters (blank and tab
>> normally) or whether to use ISSPACE().
>>
>> As agreed upon during the Cauldron in Prague, switch to a single base
>> construct for all code to use: is_whitespace().
> 
> Such decisions should be made online, with the whole
> community, not with the people attending a specific session
> at a specific event.  (While I had the chance, I had no idea
> executive decisions were about to take place.)

Well, here we are online. The series hasn't been committed yet, so
objections will be listened to. Albeit in objecting to certain aspects
please keep in mind what the overall goal is: To make #NO_APP and the
-f command line option work for more targets. And to have as uniform
behavior as possible in gas across targets.

As per your comment on the cris-specific patch I can't help the
impression that gcc avoiding to emit TABs for this target isn't
"happenstance" as you called it, but simply attributed to gas'es past
behavior.

>> It clearly is an alternative option to have is_whitespace() expand to
>> ISSPACE() or ISBLANK() (ISSPACE() also yields "true" for characters we don't
>> really consider whitespace), then (obviously) leaving out the last patch. See
>> also the CR_EOL uses in read.c and app.c. I think it is advisable though that
>> is_whitespace() and is_end_of_{line,stmt}() be non-overlapping; question then
>> is what (further) characters to tag as LEX_WHITE (see remarks in patch 01).
>>
>> Along with recently (as of the v1 submission) committed work for x86 this
>> appears to be sufficient to actually use -f (or #NO_APP at start of file)
>> for gcc-generated code.
> 
> Does that work also clean up gcc-generated code, like
> dropping space after comma or multiple spaces or whatever is
> judged the #NO_APP behavior of "x86 assembly"?  I don't see
> such patches but maybe they're not posted yet.  It doesn't
> just happen to be specified as what gcc generates on the
> master branch of today?

Well, what gcc presently emits needs to be accepted anyway. A goal is
specifically to get -f / #NO_APP working without needing to touch target
specific code in gcc, whenever possible. As said ...

>> I didn't properly check other architectures yet, but
>> I seem to recall that at least Arm32 and PPC would apparently require
>> compiler side adjustments, too.

... here, I'm aware that some targets will require some changes, but
x86 (according to my limited testing) is not among them. And those
changes are expected to be of limited nature, i.e. they're for example
not expected to touch *.md files al over the place. From what I was
able to see, the problems there are with how #APP / #NO_APP are emitted.

And btw - why would you apply different criteria to cris and x86? For
cris you said what gcc emits is unwritten but de-fact standard. Yet then
you question that same pre-condition to be applied to x86?

> I can't help but thinking this is going ever so slightly in
> the wrong direction with regards to #NO_APP: this change-set
> is making that mode more lenient towards formatting;
> allowing more types of space characters.  With the few
> targets that have #NO_APP active in gcc-generated code, you
> have the chance of making that mode more strict.

Have you ever wondered why it is only so few targets? Permitting TABs
in compiler generated output is, as indicated in the reply to the
cris-specific patch, a readability aid. That's certainly a personal
view, but one I happen to know is shared by many other people. That
said, I'm also aware that for various targets gcc presently avoids to
make use of TABs. Of the architectures I'm half-way familiar with it's
actually a minority, though (PPC and ia64 vs aarch64, Arm, RISC-V, and
x86).

Jan
  
Alan Modra Jan. 28, 2025, 9:59 a.m. UTC | #3
On Mon, Jan 27, 2025 at 04:23:42PM +0100, Jan Beulich wrote:
> As per observations in target specific code there appears to be disagreement
> across the assembler whether to check for specific characters (blank and tab
> normally) or whether to use ISSPACE().
> 
> As agreed upon during the Cauldron in Prague, switch to a single base
> construct for all code to use: is_whitespace().
> 
> It clearly is an alternative option to have is_whitespace() expand to
> ISSPACE() or ISBLANK() (ISSPACE() also yields "true" for characters we don't
> really consider whitespace), then (obviously) leaving out the last patch. See
> also the CR_EOL uses in read.c and app.c. I think it is advisable though that
> is_whitespace() and is_end_of_{line,stmt}() be non-overlapping; question then
> is what (further) characters to tag as LEX_WHITE (see remarks in patch 01).
> 
> Along with recently (as of the v1 submission) committed work for x86 this
> appears to be sufficient to actually use -f (or #NO_APP at start of file)
> for gcc-generated code. I didn't properly check other architectures yet, but
> I seem to recall that at least Arm32 and PPC would apparently require
> compiler side adjustments, too.

I like it, thanks!
  
Richard Earnshaw (lists) Jan. 28, 2025, 2:47 p.m. UTC | #4
On 28/01/2025 07:40, Jan Beulich wrote:
> On 28.01.2025 03:50, Hans-Peter Nilsson wrote:
>> Sorry for being a downer, but:
>>
>>> Date: Mon, 27 Jan 2025 16:23:42 +0100
>>> From: Jan Beulich <jbeulich@suse.com>
>>
>>> As per observations in target specific code there appears to be disagreement
>>> across the assembler whether to check for specific characters (blank and tab
>>> normally) or whether to use ISSPACE().
>>>
>>> As agreed upon during the Cauldron in Prague, switch to a single base
>>> construct for all code to use: is_whitespace().
>>
>> Such decisions should be made online, with the whole
>> community, not with the people attending a specific session
>> at a specific event.  (While I had the chance, I had no idea
>> executive decisions were about to take place.)
> 
> Well, here we are online. The series hasn't been committed yet, so
> objections will be listened to. Albeit in objecting to certain aspects
> please keep in mind what the overall goal is: To make #NO_APP and the
> -f command line option work for more targets. And to have as uniform
> behavior as possible in gas across targets.
> 
> As per your comment on the cris-specific patch I can't help the
> impression that gcc avoiding to emit TABs for this target isn't
> "happenstance" as you called it, but simply attributed to gas'es past
> behavior.
> 
>>> It clearly is an alternative option to have is_whitespace() expand to
>>> ISSPACE() or ISBLANK() (ISSPACE() also yields "true" for characters we don't
>>> really consider whitespace), then (obviously) leaving out the last patch. See
>>> also the CR_EOL uses in read.c and app.c. I think it is advisable though that
>>> is_whitespace() and is_end_of_{line,stmt}() be non-overlapping; question then
>>> is what (further) characters to tag as LEX_WHITE (see remarks in patch 01).
>>>
>>> Along with recently (as of the v1 submission) committed work for x86 this
>>> appears to be sufficient to actually use -f (or #NO_APP at start of file)
>>> for gcc-generated code.
>>
>> Does that work also clean up gcc-generated code, like
>> dropping space after comma or multiple spaces or whatever is
>> judged the #NO_APP behavior of "x86 assembly"?  I don't see
>> such patches but maybe they're not posted yet.  It doesn't
>> just happen to be specified as what gcc generates on the
>> master branch of today?
> 
> Well, what gcc presently emits needs to be accepted anyway. A goal is
> specifically to get -f / #NO_APP working without needing to touch target
> specific code in gcc, whenever possible. As said ...
> 
>>> I didn't properly check other architectures yet, but
>>> I seem to recall that at least Arm32 and PPC would apparently require
>>> compiler side adjustments, too.
> 
> ... here, I'm aware that some targets will require some changes, but
> x86 (according to my limited testing) is not among them. And those
> changes are expected to be of limited nature, i.e. they're for example
> not expected to touch *.md files al over the place. From what I was
> able to see, the problems there are with how #APP / #NO_APP are emitted.
> 
> And btw - why would you apply different criteria to cris and x86? For
> cris you said what gcc emits is unwritten but de-fact standard. Yet then
> you question that same pre-condition to be applied to x86?
> 
>> I can't help but thinking this is going ever so slightly in
>> the wrong direction with regards to #NO_APP: this change-set
>> is making that mode more lenient towards formatting;
>> allowing more types of space characters.  With the few
>> targets that have #NO_APP active in gcc-generated code, you
>> have the chance of making that mode more strict.
> 
> Have you ever wondered why it is only so few targets? Permitting TABs
> in compiler generated output is, as indicated in the reply to the
> cris-specific patch, a readability aid. That's certainly a personal
> view, but one I happen to know is shared by many other people. That
> said, I'm also aware that for various targets gcc presently avoids to
> make use of TABs. Of the architectures I'm half-way familiar with it's
> actually a minority, though (PPC and ia64 vs aarch64, Arm, RISC-V, and
> x86).
> 
> Jan

Out of interest, did you consider making the scrubber translate all horizontal white space characters into a single space character (as opposed to just collapsing multiple spaces into one)?  This would eliminate the need for each backend to handle multiple type of space character.

R.
  
Jan Beulich Jan. 28, 2025, 3 p.m. UTC | #5
On 28.01.2025 15:47, Richard Earnshaw (lists) wrote:
> Out of interest, did you consider making the scrubber translate all horizontal white space characters into a single space character (as opposed to just collapsing multiple spaces into one)?  This would eliminate the need for each backend to handle multiple type of space character.

I guess I'm confused: How would altering scrubber behavior matter when
the scrubber is bypassed (by -f or #NO_APP)? And then: This collapsing
already is what the scrubber does (and hence why many targets were
able to get away without checking for \t, and with -f/#NO_APP known to
not be working there), aiui.

Jan
  
Hans-Peter Nilsson Jan. 28, 2025, 3:25 p.m. UTC | #6
> Date: Tue, 28 Jan 2025 08:40:02 +0100
> From: Jan Beulich <jbeulich@suse.com>

> On 28.01.2025 03:50, Hans-Peter Nilsson wrote:
> As per your comment on the cris-specific patch I can't help the
> impression that gcc avoiding to emit TABs for this target isn't
> "happenstance" as you called it, but simply attributed to gas'es past
> behavior.

No, it's deliberate.  IMHO \\t (the literally characters)
makes for less readable code in the .md, and a literal TAB
looks indention-wise weird (not sure if it was always valid.
Compare "adcs%?\\t%0, %1, #0" to "adcs%? %0,%1,#0" (random
example from arm.md).  In the generated code, I guess it
depends on your preference; whether columns lining up is
better than the slightly extra horizontal distance.

> > Does that work also clean up gcc-generated code, like
> > dropping space after comma or multiple spaces or whatever is
> > judged the #NO_APP behavior of "x86 assembly"?  I don't see
> > such patches but maybe they're not posted yet.  It doesn't
> > just happen to be specified as what gcc generates on the
> > master branch of today?
> 
> Well, what gcc presently emits needs to be accepted anyway. A goal is
> specifically to get -f / #NO_APP working without needing to touch target
> specific code in gcc, whenever possible.

...and putting the burden on the assembler to do the
post-scrubber processing in your patches.  IOW, while it's
nice if more targets can skip the scrubbing (and joining
#NO_APP actually being in effect), it's more processing once
that state is entered.

> And btw - why would you apply different criteria to cris and x86? For
> cris you said what gcc emits is unwritten but de-fact standard. Yet then
> you question that same pre-condition to be applied to x86?

Different criterias for "tier-1" versus "tier-N", (N > 2)
targets isn't exactly a new concept.

> > I can't help but thinking this is going ever so slightly in
> > the wrong direction with regards to #NO_APP: this change-set
> > is making that mode more lenient towards formatting;
> > allowing more types of space characters.  With the few
> > targets that have #NO_APP active in gcc-generated code, you
> > have the chance of making that mode more strict.
> 
> Have you ever wondered why it is only so few targets?

Only once.  Then I looked and found out, some 30+ years ago. :)

> Permitting TABs
> in compiler generated output is, as indicated in the reply to the
> cris-specific patch, a readability aid.

To that I'll just say: "\\t"! :-)

I'll admit that's the gcc "input" - but which you seem to
overlook in your readability argumentation!  And with that,
I see the discussion derails...

> That's certainly a personal
> view, but one I happen to know is shared by many other people. That
> said, I'm also aware that for various targets gcc presently avoids to
> make use of TABs. Of the architectures I'm half-way familiar with it's
> actually a minority, though (PPC and ia64 vs aarch64, Arm, RISC-V, and
> x86).

...with this "argumentum ad populum".  Let's please drop the
TAB vs space part of the discussion and "agree to disagree".

brgds, H-P
  
Richard Earnshaw (lists) Jan. 28, 2025, 3:27 p.m. UTC | #7
On 28/01/2025 15:00, Jan Beulich wrote:
> On 28.01.2025 15:47, Richard Earnshaw (lists) wrote:
>> Out of interest, did you consider making the scrubber translate all horizontal white space characters into a single space character (as opposed to just collapsing multiple spaces into one)?  This would eliminate the need for each backend to handle multiple type of space character.
> 
> I guess I'm confused: How would altering scrubber behavior matter when
> the scrubber is bypassed (by -f or #NO_APP)? And then: This collapsing
> already is what the scrubber does (and hence why many targets were
> able to get away without checking for \t, and with -f/#NO_APP known to
> not be working there), aiui.
> 
> Jan

Sorry, it's me that's confused.  I hadn't realized that was what #NO_APP did.

R.