[RFC] gas: Do not eliminate '\()' if there is still .irp is not fully translated

Message ID 20240906065036.4145838-1-haochen.jiang@intel.com
State New
Headers
Series [RFC] gas: Do not eliminate '\()' if there is still .irp is not fully translated |

Checks

Context Check Description
linaro-tcwg-bot/tcwg_binutils_build--master-arm success Build passed
linaro-tcwg-bot/tcwg_binutils_build--master-aarch64 success Build passed
linaro-tcwg-bot/tcwg_binutils_check--master-aarch64 success Test passed
linaro-tcwg-bot/tcwg_binutils_check--master-arm success Test passed

Commit Message

Jiang, Haochen Sept. 6, 2024, 6:50 a.m. UTC
  Hi all,

Recently I found that if I am using nested .irp with \(), it would be wrongly
translated. It is caused by \() being eliminated too early. I tried with the
following patch to add an extra bool passing in sub_actual to record if there
is any .irp in mnemonic not handled before \() is handled.

The patch works for current x86 testcases. I may also try with aarch64 machines
if I could find one since macros are widely used in aarch testcases.

I suppose there might be a better option to do that so I called this an RFC
patch. The current method is quite straight-forward but seems too
straight-forward. Discussion is welcomed.

Thx,
Haochen

---
 
During AVX10.2 testcases refinement, I found that if the following .irp were
used, it would not be correctly translated:

	.irp m, madd, msub, nmadd, nmsub
	.irp n, 132, 213, 231
	vf\m\n\()nepbf16	%zmm4, %zmm5, %zmm6
	vf\m\n\()nepbf16	0x10000000(%esp, %esi, 8), %zmm5, %zmm6{%k7}
	vf\m\n\()nepbf16	(%ecx){1to32}, %zmm5, %zmm6
	vf\m\n\()nepbf16	8128(%ecx), %zmm5, %zmm6
	vf\m\n\()nepbf16	-256(%edx){1to32}, %zmm5, %zmm6{%k7}{z}
	.endr
	.endr

It is caused by macro_expand_body(), the function will directly eliminate \()
when first met. This is when \m gets translated but \n does not under this
scenario. However, it should be eliminated after \n is handled, or it will
result into vfmadd\nnepbf16. It leads to invalid character '\' eventually.

gas/ChangeLog:

	* macro.c (sub_actual): Pass the status if the exact sequence is
	found or not.
	(macro_expand_body): Change function call due to the addition of
	bool*. Do not eliminate '\()' if any .irp before '\()' is not fully
	translated.
---
 gas/macro.c | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)
  

Comments

Jan Beulich Sept. 6, 2024, 7:32 a.m. UTC | #1
On 06.09.2024 08:50, Haochen Jiang wrote:
> Recently I found that if I am using nested .irp with \(), it would be wrongly
> translated. It is caused by \() being eliminated too early. I tried with the
> following patch to add an extra bool passing in sub_actual to record if there
> is any .irp in mnemonic not handled before \() is handled.
> 
> The patch works for current x86 testcases. I may also try with aarch64 machines
> if I could find one since macros are widely used in aarch testcases.
> 
> I suppose there might be a better option to do that so I called this an RFC
> patch. The current method is quite straight-forward but seems too
> straight-forward. Discussion is welcomed.

I don't think this should be done. If such a construct is to survive the
first round of expansion, it simply needs further escaping. With what you
do, if I'm not mistaken you'd break possible existing uses. (Aiui you
don't really look for just \(), but any \(...). Which of course is correct
as far as not special casing the one form goes. But what's inside \(...)
may be intended to be processed in the first round, for the 2nd expansion
round to use the result with \( and ) already dropped.

Jan
  
Jiang, Haochen Sept. 6, 2024, 7:41 a.m. UTC | #2
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Friday, September 6, 2024 3:32 PM
> 
> On 06.09.2024 08:50, Haochen Jiang wrote:
> > Recently I found that if I am using nested .irp with \(), it would be wrongly
> > translated. It is caused by \() being eliminated too early. I tried with the
> > following patch to add an extra bool passing in sub_actual to record if there
> > is any .irp in mnemonic not handled before \() is handled.
> >
> > The patch works for current x86 testcases. I may also try with aarch64
> machines
> > if I could find one since macros are widely used in aarch testcases.
> >
> > I suppose there might be a better option to do that so I called this an RFC
> > patch. The current method is quite straight-forward but seems too
> > straight-forward. Discussion is welcomed.
> 
> I don't think this should be done. If such a construct is to survive the
> first round of expansion, it simply needs further escaping. With what you
> do, if I'm not mistaken you'd break possible existing uses. (Aiui you
> don't really look for just \(), but any \(...). Which of course is correct
> as far as not special casing the one form goes. But what's inside \(...)
> may be intended to be processed in the first round, for the 2nd expansion
> round to use the result with \( and ) already dropped.

I c. I suppose maybe I should only keep \(). The current change impacts too
much, which impacts \(...). This is not what I expected to do.

Thx,
Haochen

> 
> Jan
  
Jan Beulich Sept. 6, 2024, 7:46 a.m. UTC | #3
On 06.09.2024 09:41, Jiang, Haochen wrote:
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: Friday, September 6, 2024 3:32 PM
>>
>> On 06.09.2024 08:50, Haochen Jiang wrote:
>>> Recently I found that if I am using nested .irp with \(), it would be wrongly
>>> translated. It is caused by \() being eliminated too early. I tried with the
>>> following patch to add an extra bool passing in sub_actual to record if there
>>> is any .irp in mnemonic not handled before \() is handled.
>>>
>>> The patch works for current x86 testcases. I may also try with aarch64
>> machines
>>> if I could find one since macros are widely used in aarch testcases.
>>>
>>> I suppose there might be a better option to do that so I called this an RFC
>>> patch. The current method is quite straight-forward but seems too
>>> straight-forward. Discussion is welcomed.
>>
>> I don't think this should be done. If such a construct is to survive the
>> first round of expansion, it simply needs further escaping. With what you
>> do, if I'm not mistaken you'd break possible existing uses. (Aiui you
>> don't really look for just \(), but any \(...). Which of course is correct
>> as far as not special casing the one form goes. But what's inside \(...)
>> may be intended to be processed in the first round, for the 2nd expansion
>> round to use the result with \( and ) already dropped.
> 
> I c. I suppose maybe I should only keep \(). The current change impacts too
> much, which impacts \(...). This is not what I expected to do.

Yet as said - dealing with just \() is too narrow a special case. Plus even
that separator may be intended to be processed the 1st time round, and the
2nd round then using the result. You really want to change your use of the
construct to become \\(). No need to touch how macro expansion works.

Jan
  
Jiang, Haochen Sept. 6, 2024, 7:47 a.m. UTC | #4
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Friday, September 6, 2024 3:46 PM
> 
> On 06.09.2024 09:41, Jiang, Haochen wrote:
> >> From: Jan Beulich <jbeulich@suse.com>
> >> Sent: Friday, September 6, 2024 3:32 PM
> >>
> >> On 06.09.2024 08:50, Haochen Jiang wrote:
> >>> Recently I found that if I am using nested .irp with \(), it would be wrongly
> >>> translated. It is caused by \() being eliminated too early. I tried with the
> >>> following patch to add an extra bool passing in sub_actual to record if
> there
> >>> is any .irp in mnemonic not handled before \() is handled.
> >>>
> >>> The patch works for current x86 testcases. I may also try with aarch64
> >> machines
> >>> if I could find one since macros are widely used in aarch testcases.
> >>>
> >>> I suppose there might be a better option to do that so I called this an RFC
> >>> patch. The current method is quite straight-forward but seems too
> >>> straight-forward. Discussion is welcomed.
> >>
> >> I don't think this should be done. If such a construct is to survive the
> >> first round of expansion, it simply needs further escaping. With what you
> >> do, if I'm not mistaken you'd break possible existing uses. (Aiui you
> >> don't really look for just \(), but any \(...). Which of course is correct
> >> as far as not special casing the one form goes. But what's inside \(...)
> >> may be intended to be processed in the first round, for the 2nd expansion
> >> round to use the result with \( and ) already dropped.
> >
> > I c. I suppose maybe I should only keep \(). The current change impacts too
> > much, which impacts \(...). This is not what I expected to do.
> 
> Yet as said - dealing with just \() is too narrow a special case. Plus even
> that separator may be intended to be processed the 1st time round, and the
> 2nd round then using the result. You really want to change your use of the
> construct to become \\(). No need to touch how macro expansion works.

Aha, that is a good point. It should work and then no need to change all of
them.

Thx,
Haochen

> 
> Jan
  

Patch

diff --git a/gas/macro.c b/gas/macro.c
index 8b376f7f490..ee4caac9cc7 100644
--- a/gas/macro.c
+++ b/gas/macro.c
@@ -754,7 +754,7 @@  get_apost_token (size_t idx, sb *in, sb *name, int kind)
 
 static size_t
 sub_actual (size_t start, sb *in, sb *t, struct htab *formal_hash,
-	    int kind, sb *out, int copyifnotthere)
+	    int kind, sb *out, int copyifnotthere, bool *found)
 {
   size_t src;
   formal_entry *ptr;
@@ -778,6 +778,7 @@  sub_actual (size_t start, sb *in, sb *t, struct htab *formal_hash,
 	{
 	  sb_add_sb (out, &ptr->def);
 	}
+      *found = true;
     }
   else if (kind == '&')
     {
@@ -786,15 +787,18 @@  sub_actual (size_t start, sb *in, sb *t, struct htab *formal_hash,
       sb_add_sb (out, t);
       if (src != start && in->ptr[src - 1] == '&')
 	sb_add_char (out, '&');
+      *found = true;
     }
   else if (copyifnotthere)
     {
       sb_add_sb (out, t);
+      *found = true;
     }
   else
     {
       sb_add_char (out, '\\');
       sb_add_sb (out, t);
+      *found = false;
     }
   return src;
 }
@@ -811,6 +815,7 @@  macro_expand_body (sb *in, sb *out, formal_entry *formals,
   int inquote = 0, macro_line = 0;
   formal_entry *loclist = NULL;
   const char *err = NULL;
+  bool found = true;
 
   sb_new (&t);
 
@@ -822,7 +827,8 @@  macro_expand_body (sb *in, sb *out, formal_entry *formals,
 	  if (flag_mri)
 	    {
 	      if (src + 1 < in->len && in->ptr[src + 1] == '&')
-		src = sub_actual (src + 2, in, &t, formal_hash, '\'', out, 1);
+		src = sub_actual (src + 2, in, &t, formal_hash, '\'',
+				  out, 1, &found);
 	      else
 		sb_add_char (out, in->ptr[src++]);
 	    }
@@ -830,13 +836,14 @@  macro_expand_body (sb *in, sb *out, formal_entry *formals,
 	    {
 	      /* Permit macro parameter substitution delineated with
 		 an '&' prefix and optional '&' suffix.  */
-	      src = sub_actual (src + 1, in, &t, formal_hash, '&', out, 0);
+	      src = sub_actual (src + 1, in, &t, formal_hash, '&',
+				out, 0, &found);
 	    }
 	}
       else if (in->ptr[src] == '\\')
 	{
 	  src++;
-	  if (src < in->len && in->ptr[src] == '(')
+	  if (src < in->len && in->ptr[src] == '(' && found)
 	    {
 	      /* Sub in till the next ')' literally.  */
 	      src++;
@@ -904,7 +911,8 @@  macro_expand_body (sb *in, sb *out, formal_entry *formals,
 	  else
 	    {
 	      sb_reset (&t);
-	      src = sub_actual (src, in, &t, formal_hash, '\'', out, 0);
+	      src = sub_actual (src, in, &t, formal_hash, '\'',
+				out, 0, &found);
 	    }
 	}
       else if ((flag_macro_alternate || flag_mri)
@@ -923,7 +931,7 @@  macro_expand_body (sb *in, sb *out, formal_entry *formals,
 	      sb_reset (&t);
 	      src = sub_actual (src, in, &t, formal_hash,
 				(macro_strip_at && inquote) ? '@' : '\'',
-				out, 1);
+				out, 1, &found);
 	    }
 	  else
 	    {