[1/8] Support APX GPR32 with rex2 prefix

Message ID 20231102112911.2372810-2-lili.cui@intel.com
State New
Headers
Series Support Intel APX EGPR |

Checks

Context Check Description
linaro-tcwg-bot/tcwg_binutils_check--master-aarch64 fail Patch failed to apply
linaro-tcwg-bot/tcwg_binutils_check--master-arm fail Patch failed to apply
linaro-tcwg-bot/tcwg_binutils_build--master-arm fail Patch failed to apply
linaro-tcwg-bot/tcwg_binutils_build--master-aarch64 fail Patch failed to apply

Commit Message

Cui, Lili Nov. 2, 2023, 11:29 a.m. UTC
  APX uses the REX2 prefix to support EGPR for map0 and map1 of legacy
instructions. We added the NoEgpr flag in i386-gen.c for instructions
that do not support EGPR.

gas/ChangeLog:

2023-09-27  Lingling Kong <lingling.kong@intel.com>
	    H.J. Lu  <hongjiu.lu@intel.com>
	    Lili Cui <lili.cui@intel.com>
	    Lin Hu   <lin1.hu@intel.com>

	* config/tc-i386.c
	(enum i386_error): Add register_type_of_address_mismatch
	and invalid_pseudo_prefix.
	(struct _i386_insn): Add rex2 rex-byte and rex2_encoding for
	gpr32 r16-r31.
	(is_cpu): Add apx_f.
	(register_number): Handle RegRex2 for gpr32.
	(is_any_apx_rex2_encoding): New func. Test rex2 prefix encoding.
	(build_rex2_prefix): New func. Build legacy insn in
	opcode 0/1 use gpr32 with rex2 prefix.
	(optimize_encoding): Handel add r16-r31 for registers.
	(md_assemble): Handle apx encoding.
	(parse_insn): Handle Prefix_REX2.
	(check_EgprOperands): New func. Check if Egprs operands
	are valid for the instruction
	(match_template):  Handle Egpr operands check.
	(set_rex_rex2):  New func. set i.rex and i.rex2.
	(build_modrm_byte): Ditto.
	(output_insn): Handle rex2 2-byte prefix output.
	(check_register): Handle check egpr illegal without
	target apx, 64-bit mode and with rex_prefix.
	* doc/c-i386.texi: Document .apx.
	* testsuite/gas/i386/ilp32/x86-64-opcode-inval-intel.d: D5 valid
	in 64-bit mode.
	* testsuite/gas/i386/ilp32/x86-64-opcode-inval.d: Ditto.
	* testsuite/gas/i386/x86-64-inval-pseudo.l: Add rex2 invalid testcase.
	* testsuite/gas/i386/x86-64-inval-pseudo.s:  Ditto.
	* testsuite/gas/i386/x86-64-opcode-inval-intel.d: Ditto.
	* testsuite/gas/i386/x86-64-opcode-inval.d: Ditto.
	* testsuite/gas/i386/x86-64-pseudos-bad.l: Add illegal rex2 test.
	* testsuite/gas/i386/x86-64-pseudos-bad.s: Ditto.
	* testsuite/gas/i386/x86-64-pseudos.d: Add rex2 test.
	* testsuite/gas/i386/x86-64-pseudos.s: Ditto.
	* testsuite/gas/i386/x86-64.exp: Run APX tests.
	* testsuite/gas/i386/x86-64-apx-egpr-inval.l: New test.
	* testsuite/gas/i386/x86-64-apx-egpr-inval.s: New test.
	* testsuite/gas/i386/x86-64-apx-rex2.d: New test.
	* testsuite/gas/i386/x86-64-apx-rex2.s: New test.

include/ChangeLog:

	* opcode/i386.h (REX2_OPCODE): Add REX2_OPCODE.

opcodes/ChangeLog:

	* i386-dis.c (struct instr_info): Add erex for gpr32.
	Add last_erex_prefix for rex2 prefix.
	(USED_REX2): Extend for gpr32.
	(REX2_M): Ditto.
	(PREFIX_REX2): Ditto.
	(ILLEGAL_PREFIX_REX2): Ditto.
	(ckprefix): Ditto.
	(prefix_name): Ditto.
	(print_insn): Ditto.
	(print_register): Ditto.
	(OP_E_memory): Ditto.
	(OP_REG): Ditto.
	(OP_EX): Ditto.
	* i386-gen.c (if_entry_needs_special_handle): News.
	(process_i386_opcode_modifier): Set NoEgpr for VEX and some special instructions.
	(output_i386_opcode): Handle if_entry_needs_special_handle.
	* i386-init.h : Regenerated.
	* i386-mnem.h : Regenerated.
	* i386-opc.h (enum i386_cpu): Add CpuAPX_F.
	(Prefix_NoOptimize): Ditto.
	(Prefix_REX2): Ditto.
	(RegRex2): Ditto.
	* i386-opc.tbl: Add rex2 prefix.
	* i386-reg.tbl: Add egprs (r16-r31).
	* i386-tbl.h: Regenerated.
---
 gas/config/tc-i386.c                          | 176 ++++++++++++++--
 gas/doc/c-i386.texi                           |   3 +-
 .../i386/ilp32/x86-64-opcode-inval-intel.d    |   4 +-
 .../gas/i386/ilp32/x86-64-opcode-inval.d      |   4 +-
 .../gas/i386/x86-64-apx-egpr-inval.l          |  23 +++
 .../gas/i386/x86-64-apx-egpr-inval.s          |  18 ++
 gas/testsuite/gas/i386/x86-64-apx-rex2.d      |  83 ++++++++
 gas/testsuite/gas/i386/x86-64-apx-rex2.s      |  86 ++++++++
 gas/testsuite/gas/i386/x86-64-inval-pseudo.l  |   6 +
 gas/testsuite/gas/i386/x86-64-inval-pseudo.s  |   4 +
 .../gas/i386/x86-64-opcode-inval-intel.d      |   4 +-
 gas/testsuite/gas/i386/x86-64-opcode-inval.d  |   4 +-
 gas/testsuite/gas/i386/x86-64-pseudos-bad.l   |  42 ++++
 gas/testsuite/gas/i386/x86-64-pseudos-bad.s   |  49 +++++
 gas/testsuite/gas/i386/x86-64-pseudos.d       |  21 ++
 gas/testsuite/gas/i386/x86-64-pseudos.s       |  22 ++
 gas/testsuite/gas/i386/x86-64.exp             |   2 +
 include/opcode/i386.h                         |   2 +
 opcodes/i386-dis.c                            | 194 +++++++++++++-----
 opcodes/i386-gen.c                            |  48 ++++-
 opcodes/i386-opc.h                            |  12 +-
 opcodes/i386-opc.tbl                          |   2 +-
 opcodes/i386-reg.tbl                          |  64 ++++++
 23 files changed, 783 insertions(+), 90 deletions(-)
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-egpr-inval.l
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-egpr-inval.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-rex2.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-rex2.s
  

Comments

Jan Beulich Nov. 2, 2023, 5:05 p.m. UTC | #1
(for now only comments on i386-gen.c changes)

On 02.11.2023 12:29, Cui, Lili wrote:
> @@ -1008,10 +1012,35 @@ get_element_size (char **opnd, int lineno)
>    return elem_size;
>  }
>  
> +static bool
> +if_entry_needs_special_handle (const unsigned long long opcode, unsigned int space,
> +			       const char *cpu_flags)

This function wants to be named after its purpose, e.g. rex2_disallowed()
with its current return value arrangement. "needs special handling" is a
term that might be okay now, but what if you gus come up with REX3 in a
few years time which then again needs (a different kind of) special
handling?

> +{
> +  /* Prefixing XSAVE* and XRSTOR* instructions with REX2 triggers #UD.  */
> +  if (strcmp (cpu_flags, "XSAVES") >= 0
> +      || strcmp (cpu_flags, "XSAVEC") >= 0
> +      || strcmp (cpu_flags, "Xsave") >= 0
> +      || strcmp (cpu_flags, "Xsaveopt") >= 0
> +      || !strcmp (cpu_flags, "3dnow")
> +      || !strcmp (cpu_flags, "3dnowA"))
> +    return true;
> +
> +  /* All opcodes listed map0 0x4*, 0x7*, 0xa* and map0 0x3*, 0x8*
> +     are reserved under REX2 and triggers #UD when prefixed with REX2 */
> +  if ((space == 0 && (opcode >> 4 == 0x4
> +		      || opcode >> 4 == 0x7
> +		      || opcode >> 4 == 0xA))

What about row 0xE? Plus in the comment the latter is map1, not (again) map0.

This also may be easier to express using 0x4490 and

> +      || (space == SPACE_0F && (opcode >> 4 == 0x3
> +				|| opcode >> 4 == 0x8)))

... 0x0108 as constants. Else I'd like to ask that switch() be used to
kept this halfway readable.

> @@ -1119,6 +1148,18 @@ process_i386_opcode_modifier (FILE *table, char *mod, unsigned int space,
>  	fprintf (stderr,
>  		 "%s: %d: W modifier without Word/Dword/Qword operand(s)\n",
>  		 filename, lineno);
> +
> +      /* The part about judging EVEX encoding should be synchronized with
> +	 is_evex_encoding.  */
> +      if (modifiers[Vex].value
> +	  || ((space > SPACE_0F || has_special_handle)
> +	      && !modifiers[EVex].value
> +	      && !modifiers[Disp8MemShift].value
> +	      && !modifiers[Broadcast].value
> +	      && !modifiers[Masking].value
> +	      && !modifiers[SAE].value))
> +	modifiers[NoEgpr].value = 1;
> +
>      }

The comment is one half of what's needed here. First, however, you want
to say a word on what this is about.

Jan
  
Cui, Lili Nov. 3, 2023, 6:20 a.m. UTC | #2
> Subject: Re: [PATCH 1/8] Support APX GPR32 with rex2 prefix
> 
> (for now only comments on i386-gen.c changes)
> 
> On 02.11.2023 12:29, Cui, Lili wrote:
> > @@ -1008,10 +1012,35 @@ get_element_size (char **opnd, int lineno)
> >    return elem_size;
> >  }
> >
> > +static bool
> > +if_entry_needs_special_handle (const unsigned long long opcode, unsigned
> int space,
> > +			       const char *cpu_flags)
> 
> This function wants to be named after its purpose, e.g. rex2_disallowed() with
> its current return value arrangement. "needs special handling" is a term that
> might be okay now, but what if you gus come up with REX3 in a few years time
> which then again needs (a different kind of) special handling?
> 
Done.

> > +{
> > +  /* Prefixing XSAVE* and XRSTOR* instructions with REX2 triggers
> > +#UD.  */
> > +  if (strcmp (cpu_flags, "XSAVES") >= 0
> > +      || strcmp (cpu_flags, "XSAVEC") >= 0
> > +      || strcmp (cpu_flags, "Xsave") >= 0
> > +      || strcmp (cpu_flags, "Xsaveopt") >= 0
> > +      || !strcmp (cpu_flags, "3dnow")
> > +      || !strcmp (cpu_flags, "3dnowA"))
> > +    return true;
> > +
> > +  /* All opcodes listed map0 0x4*, 0x7*, 0xa* and map0 0x3*, 0x8*
> > +     are reserved under REX2 and triggers #UD when prefixed with REX2
> > +*/
> > +  if ((space == 0 && (opcode >> 4 == 0x4
> > +		      || opcode >> 4 == 0x7
> > +		      || opcode >> 4 == 0xA))
> 
> What about row 0xE? Plus in the comment the latter is map1, not (again)
> map0.
> 

Done, thanks.

> This also may be easier to express using 0x4490 and
> 
> > +      || (space == SPACE_0F && (opcode >> 4 == 0x3
> > +				|| opcode >> 4 == 0x8)))
> 
> ... 0x0108 as constants. Else I'd like to ask that switch() be used to kept this
> halfway readable.
> 

Changed it to

+  if (space == 0)
+    switch (opcode >> 4)
+      {
+      case 0x4:
+      case 0x7:
+      case 0xA:
+      case 0xE:
+       return true;
+      default:
+       return false;
+    }
+
+  if (space == SPACE_0F)
+    switch (opcode >> 4)
+      {
+      case 0x3:
+      case 0x8:
+       return true;
+      default:
+       return false;
+      }
+
> > @@ -1119,6 +1148,18 @@ process_i386_opcode_modifier (FILE *table, char
> *mod, unsigned int space,
> >  	fprintf (stderr,
> >  		 "%s: %d: W modifier without Word/Dword/Qword
> operand(s)\n",
> >  		 filename, lineno);
> > +
> > +      /* The part about judging EVEX encoding should be synchronized with
> > +	 is_evex_encoding.  */
> > +      if (modifiers[Vex].value
> > +	  || ((space > SPACE_0F || has_special_handle)
> > +	      && !modifiers[EVex].value
> > +	      && !modifiers[Disp8MemShift].value
> > +	      && !modifiers[Broadcast].value
> > +	      && !modifiers[Masking].value
> > +	      && !modifiers[SAE].value))
> > +	modifiers[NoEgpr].value = 1;
> > +
> >      }
> 
> The comment is one half of what's needed here. First, however, you want to
> say a word on what this is about.
> 
Added. 

Thanks,
Lili
  
Jan Beulich Nov. 3, 2023, 1:05 p.m. UTC | #3
On 02.11.2023 18:05, Jan Beulich wrote:
> (for now only comments on i386-gen.c changes)
> 
> On 02.11.2023 12:29, Cui, Lili wrote:
>> @@ -1008,10 +1012,35 @@ get_element_size (char **opnd, int lineno)
>>    return elem_size;
>>  }
>>  
>> +static bool
>> +if_entry_needs_special_handle (const unsigned long long opcode, unsigned int space,
>> +			       const char *cpu_flags)
> 
> This function wants to be named after its purpose, e.g. rex2_disallowed()
> with its current return value arrangement. "needs special handling" is a
> term that might be okay now, but what if you gus come up with REX3 in a
> few years time which then again needs (a different kind of) special
> handling?

Actually, depending on its significance for later changes, egpr_disallowed()
might be a (longterm) better name.

Jan
  
Jan Beulich Nov. 3, 2023, 2:19 p.m. UTC | #4
On 02.11.2023 12:29, Cui, Lili wrote:
> @@ -406,6 +409,11 @@ struct _i386_insn
>      /* Compressed disp8*N attribute.  */
>      unsigned int memshift;
>  
> +    /* No CSPAZO flags update.*/
> +    bool has_nf;
> +
> +    bool has_zero_upper;
> +

Can both please be introduced when they're needed, not randomly ahead
of time?

> @@ -2375,6 +2388,9 @@ register_number (const reg_entry *r)
>    if (r->reg_flags & RegRex)
>      nr += 8;
>  
> +  if (r->reg_flags & RegRex2)
> +    nr += 16;
> +
>    if (r->reg_flags & RegVRex)
>      nr += 16;

Perhaps fold to

    if (r->reg_flags & (RegVRex | RegRex2))
      nr += 16;

? Irrespective an assertion may be worthwhile that both flags aren't set
at the same time?

> @@ -4158,6 +4182,19 @@ build_evex_prefix (void)
>      i.vex.bytes[3] |= i.mask.reg->reg_num;
>  }
>  
> +/* Build (2 bytes) rex2 prefix.
> +   | D5h |
> +   | m | R4 X4 B4 | W R X B |
> +*/
> +static void
> +build_rex2_prefix (void)
> +{
> +  i.vex.length = 2;
> +  i.vex.bytes[0] = 0xd5;
> +  i.vex.bytes[1] = ((i.tm.opcode_space << 7)
> +		    | (i.rex2 << 4) | i.rex);
> +}

I may have asked on v1 already: For emitting REX we don't resort to
(ab)using i.vex. Is that really necessary? (If so, a comment next to
the field declaration may be warranted.)

Speaking of v1: Can you please make sure you have correct version tags
on submissions of updated patch versions?

> @@ -4423,12 +4460,16 @@ optimize_encoding (void)
>  	  i.suffix = 0;
>  	  /* Convert to byte registers.  */
>  	  if (i.types[1].bitfield.word)
> -	    j = 16;
> -	  else if (i.types[1].bitfield.dword)
> +	    /* There are 32 8-bit registers.  */

Please make sure comments are actually correct. With your additions
there are 40 8-bit registers; prior to that there were 24. The
j += 8 further down deal with that difference, and the comment here
(if one is to be added) wants to tell the full truth.

> @@ -5278,6 +5319,9 @@ md_assemble (char *line)
>  	case register_type_mismatch:
>  	  err_msg = _("register type mismatch");
>  	  break;
> +	case register_type_of_address_mismatch:
> +	  err_msg = _("register type of address mismatch");
> +	  break;

I have a concern with wording / naming here: If I saw this in an error
message, I wouldn't know what is meant. Maybe something along the lines
of "cannot use an extended GPR for addressing"? And then the enumerator
suitabley renamed as well?

> @@ -5578,7 +5625,7 @@ md_assemble (char *line)
>        as_warn (_("translating to `%sp'"), insn_name (&i.tm));
>      }
>  
> -  if (is_any_vex_encoding (&i.tm))
> + if (is_any_vex_encoding (&i.tm))
>      {

Stray change, breaking indentation?

> @@ -5594,6 +5641,13 @@ md_assemble (char *line)
>  	  return;
>  	}
>  
> +      /* Check for explicit REX2 prefix.  */
> +      if (i.rex2 || i.rex2_encoding)

This open-codes is_any_apx_rex2_encoding(). But read on.

> +	{
> +	  as_bad (_("REX2 prefix invalid with `%s'"), insn_name (&i.tm));

There's no REX2 prefix; {rex2} only sets i.rex2_encoding. Question is
what case the i.rex2 check above is intended to cover. Error message
comment, and condition want to reflect that.

> @@ -5633,11 +5687,11 @@ md_assemble (char *line)
>  	  && (i.op[1].regs->reg_flags & RegRex64) != 0)
>        || (((i.types[0].bitfield.class == Reg && i.types[0].bitfield.byte)
>  	   || (i.types[1].bitfield.class == Reg && i.types[1].bitfield.byte))
> -	  && i.rex != 0))
> +	  && (i.rex != 0 || i.rex2 != 0)))
>      {
>        int x;
> -
> -      i.rex |= REX_OPCODE;

Please don't remove blank lines like this.

> @@ -5647,9 +5701,11 @@ md_assemble (char *line)
>  	      gas_assert (!(i.op[x].regs->reg_flags & RegRex));
>  	      /* In case it is "hi" register, give up.  */
>  	      if (i.op[x].regs->reg_num > 3)
> -		as_bad (_("can't encode register '%s%s' in an "
> -			  "instruction requiring REX prefix."),
> -			register_prefix, i.op[x].regs->reg_name);
> +		{
> +		  as_bad (_("can't encode register '%s%s' in an "
> +			    "instruction requiring REX/REX2 prefix."),
> +			  register_prefix, i.op[x].regs->reg_name);
> +		}

There's no need to introduce braces here. Without doing so this will 
also be less of a change.

> @@ -6989,6 +7056,44 @@ VEX_check_encoding (const insn_template *t)
>    return 0;
>  }
>  
> +/* Check if Egprs operands are valid for the instruction.  */
> +
> +static int
> +check_EgprOperands (const insn_template *t)
> +{
> +  if (t->opcode_modifier.noegpr)
> +    {

This scope effectively covers the entire function. Did you consider

  if (!t->opcode_modifier.noegpr)
    return 0;

to aid readability?

> +      for (unsigned int op = 0; op < i.operands; op++)
> +	{
> +	  if (i.types[op].bitfield.class != Reg
> +	      /* Special case for (%dx) while doing input/output op */
> +	      || i.input_output_operand)

Why is this needed? The register table entry for %dx ...

> +	    continue;
> +
> +	  if (i.op[op].regs->reg_flags & RegRex2)

... doesn't have this bit set anyway.

> +	    {
> +	      i.error = register_type_mismatch;
> +	      return 1;
> +	    }
> +	}
> +
> +      if ((i.index_reg && (i.index_reg->reg_flags & RegRex2))
> +	  || (i.base_reg && (i.base_reg->reg_flags & RegRex2)))
> +	{
> +	  i.error = register_type_of_address_mismatch;
> +	  return 1;
> +	}
> +
> +      /* Check pseudo prefix {rex2} are valid.  */
> +      if (i.rex2_encoding)
> +	{
> +	  i.error = invalid_pseudo_prefix;
> +	  return 1;
> +	}

Further up in md_assemble() {rex} or {rex2} is simply ignored when
wrong to apply. Why would an inapplicable {rex2} be treated as an
error here? This would then also ...

> @@ -7125,7 +7230,7 @@ match_template (char mnem_suffix)
>        /* Do not verify operands when there are none.  */
>        if (!t->operands)
>  	{
> -	  if (VEX_check_encoding (t))
> +	  if (VEX_check_encoding (t) || check_EgprOperands (t))
>  	    {
>  	      specific_error = progress (i.error);
>  	      continue;

... eliminate the need for this change, which is kind of bogus anyway:
There are no operands here, so calling a function of the given name is
at least suspicious.

> @@ -14131,6 +14258,13 @@ static bool check_register (const reg_entry *r)
>  	i.vec_encoding = vex_encoding_error;
>      }
>  
> +  if (r->reg_flags & RegRex2)
> +    {
> +      if (!cpu_arch_flags.bitfield.cpuapx_f
> +	  || flag_code != CODE_64BIT)
> +	return false;
> +    }

Please fold the two if()s into one (unless of course you know that the
outer one is going to be extended in a subsequent patch).

> --- a/gas/doc/c-i386.texi
> +++ b/gas/doc/c-i386.texi
> @@ -216,6 +216,7 @@ accept various extension mnemonics.  For example,
>  @code{avx10.1/512},
>  @code{avx10.1/256},
>  @code{avx10.1/128},
> +@code{apx},
>  @code{amx_int8},
>  @code{amx_bf16},
>  @code{amx_fp16},
> @@ -1662,7 +1663,7 @@ supported on the CPU specified.  The choices for @var{cpu_type} are:
>  @item @samp{.lwp} @tab @samp{.fma4} @tab @samp{.xop} @tab @samp{.cx16}
>  @item @samp{.padlock} @tab @samp{.clzero} @tab @samp{.mwaitx} @tab @samp{.rdpru}
>  @item @samp{.mcommit} @tab @samp{.sev_es} @tab @samp{.snp} @tab @samp{.invlpgb}
> -@item @samp{.tlbsync}
> +@item @samp{.tlbsync} @tab @samp{.apx}
>  @end multitable

DYM apx_f in both cases?

Also don't you need to also mention {rex2} somewhere in this file?

> --- a/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval-intel.d
> +++ b/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval-intel.d
> @@ -11,11 +11,11 @@ Disassembly of section .text:
>  [ 	]*[a-f0-9]+:	37                   	\(bad\)
>  
>  0+1 <aad0>:
> -[ 	]*[a-f0-9]+:	d5                   	\(bad\)
> +[ 	]*[a-f0-9]+:	d5                   	rex2
>  [ 	]*[a-f0-9]+:	0a                   	.byte 0xa
>  
>  0+3 <aad1>:
> -[ 	]*[a-f0-9]+:	d5                   	\(bad\)
> +[ 	]*[a-f0-9]+:	d5                   	rex2
>  [ 	]*[a-f0-9]+:	02                   	.byte 0x2
>  
>  0+5 <aam0>:
> --- a/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval.d
> +++ b/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval.d
> @@ -11,11 +11,11 @@ Disassembly of section .text:
>  [ 	]*[a-f0-9]+:	37                   	\(bad\)
>  
>  0+1 <aad0>:
> -[ 	]*[a-f0-9]+:	d5                   	\(bad\)
> +[ 	]*[a-f0-9]+:	d5                   	rex2
>  [ 	]*[a-f0-9]+:	0a                   	.byte 0xa
>  
>  0+3 <aad1>:
> -[ 	]*[a-f0-9]+:	d5                   	\(bad\)
> +[ 	]*[a-f0-9]+:	d5                   	rex2
>  [ 	]*[a-f0-9]+:	02                   	.byte 0x2
>  
>  0+5 <aam0>:

These expectations match the ones of the same test in the parent directory.
Hence instead of adjusting each in both places, please have the ones here
reference the parent directory files.

> --- a/opcodes/i386-dis.c
> +++ b/opcodes/i386-dis.c

As before I'll look at the disassembler changes separately. This patch is
simply too big.

> @@ -1008,10 +1012,35 @@ get_element_size (char **opnd, int lineno)
>    return elem_size;
>  }
>  
> +static bool
> +if_entry_needs_special_handle (const unsigned long long opcode, unsigned int space,
> +			       const char *cpu_flags)
> +{
> +  /* Prefixing XSAVE* and XRSTOR* instructions with REX2 triggers #UD.  */
> +  if (strcmp (cpu_flags, "XSAVES") >= 0
> +      || strcmp (cpu_flags, "XSAVEC") >= 0
> +      || strcmp (cpu_flags, "Xsave") >= 0
> +      || strcmp (cpu_flags, "Xsaveopt") >= 0

Upon further thought for these (and maybe even ...

> +      || !strcmp (cpu_flags, "3dnow")
> +      || !strcmp (cpu_flags, "3dnowA"))

... for these, but see also below) it might be better to add the attribute
right in the opcode table.

As to the 3dnow insns - I think I'd like to revise my earlier suggestion to
also tag those. Like e.g. FPU insns they're pretty normal GPR-wise, so
allowing them to be used like that would appear only consistent. Otherwise,
if we were concerned of AMD extensions in general, SSE4a insns (and maybe
further ones) would also need excluding. (Additionally recall that there's
an overlap between 3dnowa and SSE, which would result in another [apparent]
inconsistency when excluding 3dnow insns here.)

Jan
  
Jan Beulich Nov. 6, 2023, 3:02 p.m. UTC | #5
On 02.11.2023 12:29, Cui, Lili wrote:
> @@ -1119,6 +1148,18 @@ process_i386_opcode_modifier (FILE *table, char *mod, unsigned int space,
>  	fprintf (stderr,
>  		 "%s: %d: W modifier without Word/Dword/Qword operand(s)\n",
>  		 filename, lineno);
> +
> +      /* The part about judging EVEX encoding should be synchronized with
> +	 is_evex_encoding.  */
> +      if (modifiers[Vex].value
> +	  || ((space > SPACE_0F || has_special_handle)
> +	      && !modifiers[EVex].value
> +	      && !modifiers[Disp8MemShift].value
> +	      && !modifiers[Broadcast].value
> +	      && !modifiers[Masking].value
> +	      && !modifiers[SAE].value))
> +	modifiers[NoEgpr].value = 1;

While this is just i386-gen (and hence being somewhat inefficient isn't the
end of the world) I still wonder whether we need all the parts of this condition:
Do we really need all the constituents of this EVEX related checks? Wouldn't it
also help is_evex_encoding() if we switched to uniformly having EVex attributes
on all EVEX templates? A presently missing EVex attribute, after all, merely is
another way of saying EVexDYN, if I'm not mistaken. (Such an adjustment, if
deemed to help, would of course want to come as a separate, prereq patch.)

Furthermore, is this correct at all for mixed VEX/EVEX templates?

> --- a/opcodes/i386-opc.tbl
> +++ b/opcodes/i386-opc.tbl
> @@ -891,7 +891,7 @@ rex.wrxb, 0x4f, x64, NoSuf|IsPrefix, {}
>  <pseudopfx:ident:cpu, disp8:Disp8:0, disp16:Disp16:0, disp32:Disp32:0, +
>                        load:Load:0, store:Store:0, +
>                        vex:VEX:0, vex2:VEX:0, vex3:VEX3:0, evex:EVEX:0, +
> -                      rex:REX:x64, nooptimize:NoOptimize:0>
> +                      rex:REX:x64, rex2:REX2:x64, nooptimize:NoOptimize:0>

Seeing this I realized that there's something missing here (an APX_F dependency),
which then again would not have had an effect without the patch [1] sent earlier
today.

Jan

[1] https://sourceware.org/pipermail/binutils/2023-November/130345.html
  
Cui, Lili Nov. 6, 2023, 3:20 p.m. UTC | #6
> Subject: Re: [PATCH 1/8] Support APX GPR32 with rex2 prefix
> 
> On 02.11.2023 12:29, Cui, Lili wrote:
> > @@ -406,6 +409,11 @@ struct _i386_insn
> >      /* Compressed disp8*N attribute.  */
> >      unsigned int memshift;
> >
> > +    /* No CSPAZO flags update.*/
> > +    bool has_nf;
> > +
> > +    bool has_zero_upper;
> > +
> 
> Can both please be introduced when they're needed, not randomly ahead of
> time?
 
Moved has_nf to patch 2/8 and deleted has_zero_upper.

> > @@ -2375,6 +2388,9 @@ register_number (const reg_entry *r)
> >    if (r->reg_flags & RegRex)
> >      nr += 8;
> >
> > +  if (r->reg_flags & RegRex2)
> > +    nr += 16;
> > +
> >    if (r->reg_flags & RegVRex)
> >      nr += 16;
> 
> Perhaps fold to
> 
>     if (r->reg_flags & (RegVRex | RegRex2))
>       nr += 16;
> 
> ? Irrespective an assertion may be worthwhile that both flags aren't set at the
> same time?

Done.

> 
> > @@ -4158,6 +4182,19 @@ build_evex_prefix (void)
> >      i.vex.bytes[3] |= i.mask.reg->reg_num;  }
> >
> > +/* Build (2 bytes) rex2 prefix.
> > +   | D5h |
> > +   | m | R4 X4 B4 | W R X B |
> > +*/
> > +static void
> > +build_rex2_prefix (void)
> > +{
> > +  i.vex.length = 2;
> > +  i.vex.bytes[0] = 0xd5;
> > +  i.vex.bytes[1] = ((i.tm.opcode_space << 7)
> > +		    | (i.rex2 << 4) | i.rex);
> > +}
> 
> I may have asked on v1 already: For emitting REX we don't resort to (ab)using
> i.vex. Is that really necessary? (If so, a comment next to the field declaration
> may be warranted.)
> 
Added comment for it.

  /* For the W R X B bits, the variables of rex prefix will be reused.  */
  i.vex.bytes[1] = ((i.tm.opcode_space << 7)
                    | (i.rex2 << 4) | i.rex);

> Speaking of v1: Can you please make sure you have correct version tags on
> submissions of updated patch versions?
> 
I used git to send all the patches at once( git send-email  --cover-letter --annotate  --to="..." -8), which only has the opportunity to change the version of the cover letter patch. To change the version of each patch, I can send them one by one next time. By the way, do you have a better way? Or how did you modify them? Thanks.

> > @@ -4423,12 +4460,16 @@ optimize_encoding (void)
> >  	  i.suffix = 0;
> >  	  /* Convert to byte registers.  */
> >  	  if (i.types[1].bitfield.word)
> > -	    j = 16;
> > -	  else if (i.types[1].bitfield.dword)
> > +	    /* There are 32 8-bit registers.  */
> 
> Please make sure comments are actually correct. With your additions there
> are 40 8-bit registers; prior to that there were 24. The j += 8 further down deal
> with that difference, and the comment here (if one is to be added) wants to
> tell the full truth.
> 

Done.

> > @@ -5278,6 +5319,9 @@ md_assemble (char *line)
> >  	case register_type_mismatch:
> >  	  err_msg = _("register type mismatch");
> >  	  break;
> > +	case register_type_of_address_mismatch:
> > +	  err_msg = _("register type of address mismatch");
> > +	  break;
> 
> I have a concern with wording / naming here: If I saw this in an error message,
> I wouldn't know what is meant. Maybe something along the lines of "cannot
> use an extended GPR for addressing"? And then the enumerator suitabley
> renamed as well?
> 
 Changed to  

+       case unsupported_EGPR_for_addressing:
+         err_msg = _("unsupported EGPR for addressing");
+         break;

> > @@ -5578,7 +5625,7 @@ md_assemble (char *line)
> >        as_warn (_("translating to `%sp'"), insn_name (&i.tm));
> >      }
> >
> > -  if (is_any_vex_encoding (&i.tm))
> > + if (is_any_vex_encoding (&i.tm))
> >      {
> 
> Stray change, breaking indentation?
> 

Done.

> > @@ -5594,6 +5641,13 @@ md_assemble (char *line)
> >  	  return;
> >  	}
> >
> > +      /* Check for explicit REX2 prefix.  */
> > +      if (i.rex2 || i.rex2_encoding)
> 
> This open-codes is_any_apx_rex2_encoding(). But read on.
> 
> > +	{
> > +	  as_bad (_("REX2 prefix invalid with `%s'"), insn_name (&i.tm));
> 
> There's no REX2 prefix; {rex2} only sets i.rex2_encoding. Question is what case
> the i.rex2 check above is intended to cover. Error message comment, and
> condition want to reflect that.
> 

Removed i.rex2 and keep i.rex2_encoding here. Added one invalid testcase for it.

        {rex} vmovaps %xmm7,%xmm2
        {rex} vmovaps %xmm17,%xmm2
        {rex} rorx $7,%eax,%ebx
+       {rex2} vmovaps %xmm7,%xmm2

> > @@ -5633,11 +5687,11 @@ md_assemble (char *line)
> >  	  && (i.op[1].regs->reg_flags & RegRex64) != 0)
> >        || (((i.types[0].bitfield.class == Reg && i.types[0].bitfield.byte)
> >  	   || (i.types[1].bitfield.class == Reg && i.types[1].bitfield.byte))
> > -	  && i.rex != 0))
> > +	  && (i.rex != 0 || i.rex2 != 0)))
> >      {
> >        int x;
> > -
> > -      i.rex |= REX_OPCODE;
> 
> Please don't remove blank lines like this.
> 

Done.

> > @@ -5647,9 +5701,11 @@ md_assemble (char *line)
> >  	      gas_assert (!(i.op[x].regs->reg_flags & RegRex));
> >  	      /* In case it is "hi" register, give up.  */
> >  	      if (i.op[x].regs->reg_num > 3)
> > -		as_bad (_("can't encode register '%s%s' in an "
> > -			  "instruction requiring REX prefix."),
> > -			register_prefix, i.op[x].regs->reg_name);
> > +		{
> > +		  as_bad (_("can't encode register '%s%s' in an "
> > +			    "instruction requiring REX/REX2 prefix."),
> > +			  register_prefix, i.op[x].regs->reg_name);
> > +		}
> 
> There's no need to introduce braces here. Without doing so this will also be
> less of a change.
> 

Done.

> > @@ -6989,6 +7056,44 @@ VEX_check_encoding (const insn_template *t)
> >    return 0;
> >  }
> >
> > +/* Check if Egprs operands are valid for the instruction.  */
> > +
> > +static int
> > +check_EgprOperands (const insn_template *t) {
> > +  if (t->opcode_modifier.noegpr)
> > +    {
> 
> This scope effectively covers the entire function. Did you consider
> 
>   if (!t->opcode_modifier.noegpr)
>     return 0;
> 
> to aid readability?
> 

Done.

> > +      for (unsigned int op = 0; op < i.operands; op++)
> > +	{
> > +	  if (i.types[op].bitfield.class != Reg
> > +	      /* Special case for (%dx) while doing input/output op */
> > +	      || i.input_output_operand)
> 
> Why is this needed? The register table entry for %dx ...
> 
> > +	    continue;
> > +
> > +	  if (i.op[op].regs->reg_flags & RegRex2)
> 
> ... doesn't have this bit set anyway.
> 

For this special case i.op is empty, we need continue, otherwise r i.op[op].regs->reg_flags  will cause segment fault.

> > +	    {
> > +	      i.error = register_type_mismatch;
> > +	      return 1;
> > +	    }
> > +	}
> > +
> > +      if ((i.index_reg && (i.index_reg->reg_flags & RegRex2))
> > +	  || (i.base_reg && (i.base_reg->reg_flags & RegRex2)))
> > +	{
> > +	  i.error = register_type_of_address_mismatch;
> > +	  return 1;
> > +	}
> > +
> > +      /* Check pseudo prefix {rex2} are valid.  */
> > +      if (i.rex2_encoding)
> > +	{
> > +	  i.error = invalid_pseudo_prefix;
> > +	  return 1;
> > +	}
> 
> Further up in md_assemble() {rex} or {rex2} is simply ignored when wrong to
> apply. Why would an inapplicable {rex2} be treated as an error here? This
> would then also ...
> 
> > @@ -7125,7 +7230,7 @@ match_template (char mnem_suffix)
> >        /* Do not verify operands when there are none.  */
> >        if (!t->operands)
> >  	{
> > -	  if (VEX_check_encoding (t))
> > +	  if (VEX_check_encoding (t) || check_EgprOperands (t))
> >  	    {
> >  	      specific_error = progress (i.error);
> >  	      continue;
> 
> ... eliminate the need for this change, which is kind of bogus anyway:
> There are no operands here, so calling a function of the given name is at least
> suspicious.
> 

We have these tests and I'm confused whether to remove them or not.

+       #All opcodes in the row 0xf3* prefixed REX2 are illegal.
+       {rex2} wrmsr
+       {rex2} rdtsc
+       {rex2} rdmsr
+       {rex2} sysenter
+       {rex2} sysexitl
+       {rex2} rdpmc

> > @@ -14131,6 +14258,13 @@ static bool check_register (const reg_entry *r)
> >  	i.vec_encoding = vex_encoding_error;
> >      }
> >
> > +  if (r->reg_flags & RegRex2)
> > +    {
> > +      if (!cpu_arch_flags.bitfield.cpuapx_f
> > +	  || flag_code != CODE_64BIT)
> > +	return false;
> > +    }
> 
> Please fold the two if()s into one (unless of course you know that the outer
> one is going to be extended in a subsequent patch).
> 

Yes, other code will be added in the outer if with patch2/8.

> > --- a/gas/doc/c-i386.texi
> > +++ b/gas/doc/c-i386.texi
> > @@ -216,6 +216,7 @@ accept various extension mnemonics.  For example,
> > @code{avx10.1/512},  @code{avx10.1/256},  @code{avx10.1/128},
> > +@code{apx},
> >  @code{amx_int8},
> >  @code{amx_bf16},
> >  @code{amx_fp16},
> > @@ -1662,7 +1663,7 @@ supported on the CPU specified.  The choices for
> @var{cpu_type} are:
> >  @item @samp{.lwp} @tab @samp{.fma4} @tab @samp{.xop} @tab
> > @samp{.cx16}  @item @samp{.padlock} @tab @samp{.clzero} @tab
> > @samp{.mwaitx} @tab @samp{.rdpru}  @item @samp{.mcommit} @tab
> > @samp{.sev_es} @tab @samp{.snp} @tab @samp{.invlpgb} -@item
> > @samp{.tlbsync}
> > +@item @samp{.tlbsync} @tab @samp{.apx}
> >  @end multitable
> 
> DYM apx_f in both cases?
> 
> Also don't you need to also mention {rex2} somewhere in this file?
> 

Done.

> > --- a/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval-intel.d
> > +++ b/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval-intel.d
> > @@ -11,11 +11,11 @@ Disassembly of section .text:
> >  [ 	]*[a-f0-9]+:	37                   	\(bad\)
> >
> >  0+1 <aad0>:
> > -[ 	]*[a-f0-9]+:	d5                   	\(bad\)
> > +[ 	]*[a-f0-9]+:	d5                   	rex2
> >  [ 	]*[a-f0-9]+:	0a                   	.byte 0xa
> >
> >  0+3 <aad1>:
> > -[ 	]*[a-f0-9]+:	d5                   	\(bad\)
> > +[ 	]*[a-f0-9]+:	d5                   	rex2
> >  [ 	]*[a-f0-9]+:	02                   	.byte 0x2
> >
> >  0+5 <aam0>:
> > --- a/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval.d
> > +++ b/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval.d
> > @@ -11,11 +11,11 @@ Disassembly of section .text:
> >  [ 	]*[a-f0-9]+:	37                   	\(bad\)
> >
> >  0+1 <aad0>:
> > -[ 	]*[a-f0-9]+:	d5                   	\(bad\)
> > +[ 	]*[a-f0-9]+:	d5                   	rex2
> >  [ 	]*[a-f0-9]+:	0a                   	.byte 0xa
> >
> >  0+3 <aad1>:
> > -[ 	]*[a-f0-9]+:	d5                   	\(bad\)
> > +[ 	]*[a-f0-9]+:	d5                   	rex2
> >  [ 	]*[a-f0-9]+:	02                   	.byte 0x2
> >
> >  0+5 <aam0>:
> 
> These expectations match the ones of the same test in the parent directory.
> Hence instead of adjusting each in both places, please have the ones here
> reference the parent directory files.
> 

They are used to test illegal opcodes for x86-64. Since D5 now makes sense, these two test cases were removed.

# All the followings are illegal opcodes for x86-64.
aad0:
        aad
aad1:
        aad $2

> > --- a/opcodes/i386-dis.c
> > +++ b/opcodes/i386-dis.c
> 
> As before I'll look at the disassembler changes separately. This patch is simply
> too big.
> 

Ok

> > @@ -1008,10 +1012,35 @@ get_element_size (char **opnd, int lineno)
> >    return elem_size;
> >  }
> >
> > +static bool
> > +if_entry_needs_special_handle (const unsigned long long opcode, unsigned
> int space,
> > +			       const char *cpu_flags)
> > +{
> > +  /* Prefixing XSAVE* and XRSTOR* instructions with REX2 triggers
> > +#UD.  */
> > +  if (strcmp (cpu_flags, "XSAVES") >= 0
> > +      || strcmp (cpu_flags, "XSAVEC") >= 0
> > +      || strcmp (cpu_flags, "Xsave") >= 0
> > +      || strcmp (cpu_flags, "Xsaveopt") >= 0
> 
> Upon further thought for these (and maybe even ...
> 
> > +      || !strcmp (cpu_flags, "3dnow")
> > +      || !strcmp (cpu_flags, "3dnowA"))
> 
> ... for these, but see also below) it might be better to add the attribute right in
> the opcode table.
> 
> As to the 3dnow insns - I think I'd like to revise my earlier suggestion to also
> tag those. Like e.g. FPU insns they're pretty normal GPR-wise, so allowing them
> to be used like that would appear only consistent. Otherwise, if we were
> concerned of AMD extensions in general, SSE4a insns (and maybe further
> ones) would also need excluding. (Additionally recall that there's an overlap
> between 3dnowa and SSE, which would result in another [apparent]
> inconsistency when excluding 3dnow insns here.)
> 

I see, for example  I think I need to split this table into two parts, one is for SSE and one is for 3dnowA, then add noegpr to the SSE one, right?
pextrw, 0xfc5, SSE|3dnowA, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|NoRex64, { Imm8, RegMMX, Reg32|Reg64 }

Thanks,
Lili.
  
Jan Beulich Nov. 6, 2023, 3:39 p.m. UTC | #7
On 02.11.2023 12:29, Cui, Lili wrote:
> @@ -269,9 +275,17 @@ struct dis_private {
>        ins->rex_used |= REX_OPCODE;			\
>    }
>  
> +#define USED_REX2(value)				\
> +  {							\
> +    if ((ins->rex2 & value))				\
> +      ins->rex2_used |= value;				\
> +  }
> +
>  
>  #define EVEX_b_used 1

Nit: Please avoid (re)introducing double blank lines. Instead ...

>  #define EVEX_len_used 2
> +/* M0 in rex2 prefix represents map0 or map1.  */
> +#define REX2_M 0x8

... a blank line ahead of this insertion would be helpful.

> @@ -1872,23 +1888,23 @@ static const struct dis386 dis386[] = {
>    { "outs{b|}",		{ indirDXr, Xb }, 0 },
>    { X86_64_TABLE (X86_64_6F) },
>    /* 70 */
> -  { "joH",		{ Jb, BND, cond_jump_flag }, 0 },
> -  { "jnoH",		{ Jb, BND, cond_jump_flag }, 0 },
> -  { "jbH",		{ Jb, BND, cond_jump_flag }, 0 },
> -  { "jaeH",		{ Jb, BND, cond_jump_flag }, 0 },
> -  { "jeH",		{ Jb, BND, cond_jump_flag }, 0 },
> -  { "jneH",		{ Jb, BND, cond_jump_flag }, 0 },
> -  { "jbeH",		{ Jb, BND, cond_jump_flag }, 0 },
> -  { "jaH",		{ Jb, BND, cond_jump_flag }, 0 },
> +  { "joH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
> +  { "jnoH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
> +  { "jbH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
> +  { "jaeH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
> +  { "jeH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
> +  { "jneH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
> +  { "jbeH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
> +  { "jaH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
>    /* 78 */
> -  { "jsH",		{ Jb, BND, cond_jump_flag }, 0 },
> -  { "jnsH",		{ Jb, BND, cond_jump_flag }, 0 },
> -  { "jpH",		{ Jb, BND, cond_jump_flag }, 0 },
> -  { "jnpH",		{ Jb, BND, cond_jump_flag }, 0 },
> -  { "jlH",		{ Jb, BND, cond_jump_flag }, 0 },
> -  { "jgeH",		{ Jb, BND, cond_jump_flag }, 0 },
> -  { "jleH",		{ Jb, BND, cond_jump_flag }, 0 },
> -  { "jgH",		{ Jb, BND, cond_jump_flag }, 0 },
> +  { "jsH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
> +  { "jnsH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
> +  { "jpH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
> +  { "jnpH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
> +  { "jlH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
> +  { "jgeH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
> +  { "jleH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
> +  { "jgH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
>    /* 80 */
>    { REG_TABLE (REG_80) },
>    { REG_TABLE (REG_81) },
> @@ -1926,23 +1942,23 @@ static const struct dis386 dis386[] = {
>    { "sahf",		{ XX }, 0 },
>    { "lahf",		{ XX }, 0 },
>    /* a0 */
> -  { "mov%LB",		{ AL, Ob }, 0 },
> -  { "mov%LS",		{ eAX, Ov }, 0 },
> -  { "mov%LB",		{ Ob, AL }, 0 },
> -  { "mov%LS",		{ Ov, eAX }, 0 },
> -  { "movs{b|}",		{ Ybr, Xb }, 0 },
> -  { "movs{R|}",		{ Yvr, Xv }, 0 },
> -  { "cmps{b|}",		{ Xb, Yb }, 0 },
> -  { "cmps{R|}",		{ Xv, Yv }, 0 },
> +  { "mov%LB",		{ AL, Ob }, PREFIX_REX2_ILLEGAL },
> +  { "mov%LS",		{ eAX, Ov }, PREFIX_REX2_ILLEGAL },
> +  { "mov%LB",		{ Ob, AL }, PREFIX_REX2_ILLEGAL },
> +  { "mov%LS",		{ Ov, eAX }, PREFIX_REX2_ILLEGAL },
> +  { "movs{b|}",		{ Ybr, Xb }, PREFIX_REX2_ILLEGAL },
> +  { "movs{R|}",		{ Yvr, Xv }, PREFIX_REX2_ILLEGAL },
> +  { "cmps{b|}",		{ Xb, Yb }, PREFIX_REX2_ILLEGAL },
> +  { "cmps{R|}",		{ Xv, Yv }, PREFIX_REX2_ILLEGAL },
>    /* a8 */
> -  { "testB",		{ AL, Ib }, 0 },
> -  { "testS",		{ eAX, Iv }, 0 },
> -  { "stosB",		{ Ybr, AL }, 0 },
> -  { "stosS",		{ Yvr, eAX }, 0 },
> -  { "lodsB",		{ ALr, Xb }, 0 },
> -  { "lodsS",		{ eAXr, Xv }, 0 },
> -  { "scasB",		{ AL, Yb }, 0 },
> -  { "scasS",		{ eAX, Yv }, 0 },
> +  { "testB",		{ AL, Ib }, PREFIX_REX2_ILLEGAL },
> +  { "testS",		{ eAX, Iv }, PREFIX_REX2_ILLEGAL },
> +  { "stosB",		{ Ybr, AL }, PREFIX_REX2_ILLEGAL },
> +  { "stosS",		{ Yvr, eAX }, PREFIX_REX2_ILLEGAL },
> +  { "lodsB",		{ ALr, Xb }, PREFIX_REX2_ILLEGAL },
> +  { "lodsS",		{ eAXr, Xv }, PREFIX_REX2_ILLEGAL },
> +  { "scasB",		{ AL, Yb }, PREFIX_REX2_ILLEGAL },
> +  { "scasS",		{ eAX, Yv }, PREFIX_REX2_ILLEGAL },
>    /* b0 */
>    { "movB",		{ RMAL, Ib }, 0 },
>    { "movB",		{ RMCL, Ib }, 0 },

Like in the i386-gen.c adjustments for row E look to be missing here, too.

> @@ -2091,12 +2107,12 @@ static const struct dis386 dis386_twobyte[] = {
>    { PREFIX_TABLE (PREFIX_0F2E) },
>    { PREFIX_TABLE (PREFIX_0F2F) },
>    /* 30 */
> -  { "wrmsr",		{ XX }, 0 },
> -  { "rdtsc",		{ XX }, 0 },
> -  { "rdmsr",		{ XX }, 0 },
> -  { "rdpmc",		{ XX }, 0 },
> -  { "sysenter",		{ SEP }, 0 },
> -  { "sysexit%LQ",	{ SEP }, 0 },
> +  { "wrmsr",		{ XX }, PREFIX_REX2_ILLEGAL },
> +  { "rdtsc",		{ XX }, PREFIX_REX2_ILLEGAL },
> +  { "rdmsr",		{ XX }, PREFIX_REX2_ILLEGAL },
> +  { "rdpmc",		{ XX }, PREFIX_REX2_ILLEGAL },
> +  { "sysenter",		{ SEP }, PREFIX_REX2_ILLEGAL },
> +  { "sysexit%LQ",	{ SEP }, PREFIX_REX2_ILLEGAL },
>    { Bad_Opcode },
>    { "getsec",		{ XX }, 0 },
>    /* 38 */

Down from here row 8 also wants adjustment afaict.

> @@ -8289,6 +8313,7 @@ ckprefix (instr_info *ins)
>  {
>    int i, length;
>    uint8_t newrex;
> +  unsigned char rex2_payload;

Please can this be restricted to the inner scope where it's used?

> @@ -9292,13 +9338,17 @@ print_insn (bfd_vma pc, disassemble_info *info, int intel_syntax)
>        goto out;
>      }
>  
> -  if (*ins.codep == 0x0f)
> +  /* M0 in rex2 prefix represents map0 or map1.  */
> +  if (*ins.codep == 0x0f || (ins.rex2 & REX2_M))

I'm struggling with the M0 in the comment. DYM just M, or maybe REX2.M?

Also is this, ...

>      {
>        unsigned char threebyte;
>  
> -      ins.codep++;
> -      if (!fetch_code (info, ins.codep + 1))
> -	goto fetch_error_out;
> +      if (!ins.rex2)
> +	{
> +	  ins.codep++;
> +	  if (!fetch_code (info, ins.codep + 1))
> +	    goto fetch_error_out;
> +	}
>        threebyte = *ins.codep;
>        dp = &dis386_twobyte[threebyte];
>        ins.need_modrm = twobyte_has_modrm[threebyte];

... all the way to here, really correct for d5 00 0f?

> @@ -9454,6 +9504,14 @@ print_insn (bfd_vma pc, disassemble_info *info, int intel_syntax)
>        goto out;
>      }
>  
> +  if ((dp->prefix_requirement & PREFIX_REX2_ILLEGAL)
> +      && ins.last_rex2_prefix >= 0)
> +    {
> +      i386_dis_printf (info, dis_style_text, "(bad)");
> +      ret = ins.end_codep - priv.the_buffer;
> +      goto out;
> +    }
> +
>    switch (dp->prefix_requirement)
>      {
>      case PREFIX_DATA:
> @@ -9468,6 +9526,7 @@ print_insn (bfd_vma pc, disassemble_info *info, int intel_syntax)
>        ins.used_prefixes |= PREFIX_DATA;
>        /* Fall through.  */
>      case PREFIX_OPCODE:
> +    case PREFIX_OPCODE | PREFIX_REX2_ILLEGAL:

May more robust to mask off PREFIX_REX2_ILLEGAL in the control expression
of the switch()? Or else why don't you move the if() immediately ahead of
the switch() into here, as a new case block?

> @@ -9513,6 +9572,13 @@ print_insn (bfd_vma pc, disassemble_info *info, int intel_syntax)
>        && !ins.need_vex && ins.last_rex_prefix >= 0)
>      ins.all_prefixes[ins.last_rex_prefix] = 0;
>  
> +  /* Check if the REX2 prefix is used.  */
> +  if (ins.last_rex2_prefix >= 0
> +      && ((((ins.rex2 & 0x7) ^ (ins.rex2_used & 0x7)) == 0
> +	   && (ins.rex2 & 0x7))

DYM ((ins.rex2 & 7) & ~(ins.rex2_used & 7)) != 0

> +	  || dp == &bad_opcode))

What is this last part of the condition about? Other prefix zapping
code doesn't have such.

> +    ins.all_prefixes[ins.last_rex2_prefix] = 0;
> +
>    /* Check if the SEG prefix is used.  */
>    if ((ins.prefixes & (PREFIX_CS | PREFIX_SS | PREFIX_DS | PREFIX_ES
>  		       | PREFIX_FS | PREFIX_GS)) != 0
> @@ -9541,7 +9607,10 @@ print_insn (bfd_vma pc, disassemble_info *info, int intel_syntax)
>  	if (name == NULL)
>  	  abort ();
>  	prefix_length += strlen (name) + 1;
> -	i386_dis_printf (info, dis_style_mnemonic, "%s ", name);
> +	if (ins.all_prefixes[i] == REX2_OPCODE)
> +	  i386_dis_printf (info, dis_style_mnemonic, "{%s} ", name);

Do braces really count as part of the mnemonic?

> +	else
> +	  i386_dis_printf (info, dis_style_mnemonic, "%s ", name);
>        }

Aren't you at risk of wrongly printing a REX prefix here if the high 4 bits
of the REX2 payload were all zero, but some of the low 4 bits turned out
unused?

> @@ -11086,8 +11155,11 @@ print_register (instr_info *ins, unsigned int reg, unsigned int rexmask,
>      ins->illegal_masking = true;
>  
>    USED_REX (rexmask);
> +  USED_REX2 (rexmask);

Do both really need tracking separately? Whatever consumes REX.B will also
consume REX2.B4, an so on.

Jan
  
Jan Beulich Nov. 6, 2023, 4:08 p.m. UTC | #8
On 06.11.2023 16:20, Cui, Lili wrote:
>> Subject: Re: [PATCH 1/8] Support APX GPR32 with rex2 prefix
>>
>> On 02.11.2023 12:29, Cui, Lili wrote:
>>> @@ -406,6 +409,11 @@ struct _i386_insn
>>>      /* Compressed disp8*N attribute.  */
>>>      unsigned int memshift;
>>>
>>> +    /* No CSPAZO flags update.*/
>>> +    bool has_nf;
>>> +
>>> +    bool has_zero_upper;
>>> +
>>
>> Can both please be introduced when they're needed, not randomly ahead of
>> time?
>  
> Moved has_nf to patch 2/8 and deleted has_zero_upper.

Patch 2/8? Not in this series then, I suppose?

>>> @@ -4158,6 +4182,19 @@ build_evex_prefix (void)
>>>      i.vex.bytes[3] |= i.mask.reg->reg_num;  }
>>>
>>> +/* Build (2 bytes) rex2 prefix.
>>> +   | D5h |
>>> +   | m | R4 X4 B4 | W R X B |
>>> +*/
>>> +static void
>>> +build_rex2_prefix (void)
>>> +{
>>> +  i.vex.length = 2;
>>> +  i.vex.bytes[0] = 0xd5;
>>> +  i.vex.bytes[1] = ((i.tm.opcode_space << 7)
>>> +		    | (i.rex2 << 4) | i.rex);
>>> +}
>>
>> I may have asked on v1 already: For emitting REX we don't resort to (ab)using
>> i.vex. Is that really necessary? (If so, a comment next to the field declaration
>> may be warranted.)
>>
> Added comment for it.
> 
>   /* For the W R X B bits, the variables of rex prefix will be reused.  */
>   i.vex.bytes[1] = ((i.tm.opcode_space << 7)
>                     | (i.rex2 << 4) | i.rex);

How does the comment relate to the (ab)use of i.vex?

>> Speaking of v1: Can you please make sure you have correct version tags on
>> submissions of updated patch versions?
>>
> I used git to send all the patches at once( git send-email  --cover-letter --annotate  --to="..." -8), which only has the opportunity to change the version of the cover letter patch. To change the version of each patch, I can send them one by one next time. By the way, do you have a better way? Or how did you modify them? Thanks.

Well, personally I don't use git to send patches. But I know people send
series with proper version tags throughout, all the time.

>>> @@ -5278,6 +5319,9 @@ md_assemble (char *line)
>>>  	case register_type_mismatch:
>>>  	  err_msg = _("register type mismatch");
>>>  	  break;
>>> +	case register_type_of_address_mismatch:
>>> +	  err_msg = _("register type of address mismatch");
>>> +	  break;
>>
>> I have a concern with wording / naming here: If I saw this in an error message,
>> I wouldn't know what is meant. Maybe something along the lines of "cannot
>> use an extended GPR for addressing"? And then the enumerator suitabley
>> renamed as well?
>>
>  Changed to  
> 
> +       case unsupported_EGPR_for_addressing:
> +         err_msg = _("unsupported EGPR for addressing");
> +         break;

May I suggest "extended GPR" in the message text (the enumerator is fine
to have EGPR)?

>>> @@ -5594,6 +5641,13 @@ md_assemble (char *line)
>>>  	  return;
>>>  	}
>>>
>>> +      /* Check for explicit REX2 prefix.  */
>>> +      if (i.rex2 || i.rex2_encoding)
>>
>> This open-codes is_any_apx_rex2_encoding(). But read on.
>>
>>> +	{
>>> +	  as_bad (_("REX2 prefix invalid with `%s'"), insn_name (&i.tm));
>>
>> There's no REX2 prefix; {rex2} only sets i.rex2_encoding. Question is what case
>> the i.rex2 check above is intended to cover. Error message comment, and
>> condition want to reflect that.
>>
> 
> Removed i.rex2 and keep i.rex2_encoding here. Added one invalid testcase for it.
> 
>         {rex} vmovaps %xmm7,%xmm2
>         {rex} vmovaps %xmm17,%xmm2
>         {rex} rorx $7,%eax,%ebx
> +       {rex2} vmovaps %xmm7,%xmm2

Right, but please see my "optional vs required" comment in the pseudo-
prefix related patch I did send earlier today. I question the correctness
of the {rex} related check here, which would then extend to the {rex2}
one as well.

>>> +      for (unsigned int op = 0; op < i.operands; op++)
>>> +	{
>>> +	  if (i.types[op].bitfield.class != Reg
>>> +	      /* Special case for (%dx) while doing input/output op */
>>> +	      || i.input_output_operand)
>>
>> Why is this needed? The register table entry for %dx ...
>>
>>> +	    continue;
>>> +
>>> +	  if (i.op[op].regs->reg_flags & RegRex2)
>>
>> ... doesn't have this bit set anyway.
>>
> 
> For this special case i.op is empty, we need continue, otherwise r i.op[op].regs->reg_flags  will cause segment fault.

I vaguely recall commenting on this anomaly to H.J. - perhaps time to fix
that properly (in a separate patch), to not leave this trap open any longer?
(Otherwise at least a comment is needed here.)

>>> +	    {
>>> +	      i.error = register_type_mismatch;
>>> +	      return 1;
>>> +	    }
>>> +	}
>>> +
>>> +      if ((i.index_reg && (i.index_reg->reg_flags & RegRex2))
>>> +	  || (i.base_reg && (i.base_reg->reg_flags & RegRex2)))
>>> +	{
>>> +	  i.error = register_type_of_address_mismatch;
>>> +	  return 1;
>>> +	}
>>> +
>>> +      /* Check pseudo prefix {rex2} are valid.  */
>>> +      if (i.rex2_encoding)
>>> +	{
>>> +	  i.error = invalid_pseudo_prefix;
>>> +	  return 1;
>>> +	}
>>
>> Further up in md_assemble() {rex} or {rex2} is simply ignored when wrong to
>> apply. Why would an inapplicable {rex2} be treated as an error here? This
>> would then also ...
>>
>>> @@ -7125,7 +7230,7 @@ match_template (char mnem_suffix)
>>>        /* Do not verify operands when there are none.  */
>>>        if (!t->operands)
>>>  	{
>>> -	  if (VEX_check_encoding (t))
>>> +	  if (VEX_check_encoding (t) || check_EgprOperands (t))
>>>  	    {
>>>  	      specific_error = progress (i.error);
>>>  	      continue;
>>
>> ... eliminate the need for this change, which is kind of bogus anyway:
>> There are no operands here, so calling a function of the given name is at least
>> suspicious.
>>
> 
> We have these tests and I'm confused whether to remove them or not.
> 
> +       #All opcodes in the row 0xf3* prefixed REX2 are illegal.
> +       {rex2} wrmsr
> +       {rex2} rdtsc
> +       {rex2} rdmsr
> +       {rex2} sysenter
> +       {rex2} sysexitl
> +       {rex2} rdpmc

They should all stay. But as to my comment: There's no use of any eGPR
here. If you want to abuse that function and if there's no better
descriptive name for it, then once again at least a comment is needed.
(Considering this, the attribute's name NoEgpr is probably also
misleading in the cases here, i.e. when there are no operands. Hence,
if not to be renamed, requires yet another comment in i386-opc.h.)

>>> @@ -14131,6 +14258,13 @@ static bool check_register (const reg_entry *r)
>>>  	i.vec_encoding = vex_encoding_error;
>>>      }
>>>
>>> +  if (r->reg_flags & RegRex2)
>>> +    {
>>> +      if (!cpu_arch_flags.bitfield.cpuapx_f
>>> +	  || flag_code != CODE_64BIT)
>>> +	return false;
>>> +    }
>>
>> Please fold the two if()s into one (unless of course you know that the outer
>> one is going to be extended in a subsequent patch).
>>
> 
> Yes, other code will be added in the outer if with patch2/8.

Hmm, you again say patch 2/8, yet that patch in this series clearly
doesn't do anything like that.

>>> --- a/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval-intel.d
>>> +++ b/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval-intel.d
>>> @@ -11,11 +11,11 @@ Disassembly of section .text:
>>>  [ 	]*[a-f0-9]+:	37                   	\(bad\)
>>>
>>>  0+1 <aad0>:
>>> -[ 	]*[a-f0-9]+:	d5                   	\(bad\)
>>> +[ 	]*[a-f0-9]+:	d5                   	rex2
>>>  [ 	]*[a-f0-9]+:	0a                   	.byte 0xa
>>>
>>>  0+3 <aad1>:
>>> -[ 	]*[a-f0-9]+:	d5                   	\(bad\)
>>> +[ 	]*[a-f0-9]+:	d5                   	rex2
>>>  [ 	]*[a-f0-9]+:	02                   	.byte 0x2
>>>
>>>  0+5 <aam0>:
>>> --- a/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval.d
>>> +++ b/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval.d
>>> @@ -11,11 +11,11 @@ Disassembly of section .text:
>>>  [ 	]*[a-f0-9]+:	37                   	\(bad\)
>>>
>>>  0+1 <aad0>:
>>> -[ 	]*[a-f0-9]+:	d5                   	\(bad\)
>>> +[ 	]*[a-f0-9]+:	d5                   	rex2
>>>  [ 	]*[a-f0-9]+:	0a                   	.byte 0xa
>>>
>>>  0+3 <aad1>:
>>> -[ 	]*[a-f0-9]+:	d5                   	\(bad\)
>>> +[ 	]*[a-f0-9]+:	d5                   	rex2
>>>  [ 	]*[a-f0-9]+:	02                   	.byte 0x2
>>>
>>>  0+5 <aam0>:
>>
>> These expectations match the ones of the same test in the parent directory.
>> Hence instead of adjusting each in both places, please have the ones here
>> reference the parent directory files.
>>
> 
> They are used to test illegal opcodes for x86-64. Since D5 now makes sense, these two test cases were removed.
> 
> # All the followings are illegal opcodes for x86-64.
> aad0:
>         aad
> aad1:
>         aad $2

Right, but how does this relate to my request to simply fold the
expectations here with that of the same test in the parent directory?
(You'll find various examples under ilp32/ where I've done such
folding already, whenever I had to touch both instances anyway.)

>>> @@ -1008,10 +1012,35 @@ get_element_size (char **opnd, int lineno)
>>>    return elem_size;
>>>  }
>>>
>>> +static bool
>>> +if_entry_needs_special_handle (const unsigned long long opcode, unsigned
>> int space,
>>> +			       const char *cpu_flags)
>>> +{
>>> +  /* Prefixing XSAVE* and XRSTOR* instructions with REX2 triggers
>>> +#UD.  */
>>> +  if (strcmp (cpu_flags, "XSAVES") >= 0
>>> +      || strcmp (cpu_flags, "XSAVEC") >= 0
>>> +      || strcmp (cpu_flags, "Xsave") >= 0
>>> +      || strcmp (cpu_flags, "Xsaveopt") >= 0
>>
>> Upon further thought for these (and maybe even ...
>>
>>> +      || !strcmp (cpu_flags, "3dnow")
>>> +      || !strcmp (cpu_flags, "3dnowA"))
>>
>> ... for these, but see also below) it might be better to add the attribute right in
>> the opcode table.
>>
>> As to the 3dnow insns - I think I'd like to revise my earlier suggestion to also
>> tag those. Like e.g. FPU insns they're pretty normal GPR-wise, so allowing them
>> to be used like that would appear only consistent. Otherwise, if we were
>> concerned of AMD extensions in general, SSE4a insns (and maybe further
>> ones) would also need excluding. (Additionally recall that there's an overlap
>> between 3dnowa and SSE, which would result in another [apparent]
>> inconsistency when excluding 3dnow insns here.)
>>
> 
> I see, for example  I think I need to split this table into two parts, one is for SSE and one is for 3dnowA, then add noegpr to the SSE one, right?
> pextrw, 0xfc5, SSE|3dnowA, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|NoRex64, { Imm8, RegMMX, Reg32|Reg64 }

I'm afraid I don't understand the question. All I've asked for is that
the special treatment of 3dnow insns be removed again. Unless you want
to special-case further insns; it's not really clear to me what's best,
as both approaches have noticable downsides (either we allow to encode
something which may never become valid, or we disallow something which
may become valid).

In any event adding NoEgpr to any SSE insn sounds wrong to me - aiui
they can all be encoded with REX2.

Jan
  
Cui, Lili Nov. 7, 2023, 8:06 a.m. UTC | #9
> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Monday, November 6, 2023 11:03 PM
> To: Cui, Lili <lili.cui@intel.com>
> Cc: Lu, Hongjiu <hongjiu.lu@intel.com>; ccoutant@gmail.com;
> binutils@sourceware.org
> Subject: Re: [PATCH 1/8] Support APX GPR32 with rex2 prefix
> 
> On 02.11.2023 12:29, Cui, Lili wrote:
> > @@ -1119,6 +1148,18 @@ process_i386_opcode_modifier (FILE *table, char
> *mod, unsigned int space,
> >  	fprintf (stderr,
> >  		 "%s: %d: W modifier without Word/Dword/Qword
> operand(s)\n",
> >  		 filename, lineno);
> > +
> > +      /* The part about judging EVEX encoding should be synchronized with
> > +	 is_evex_encoding.  */
> > +      if (modifiers[Vex].value
> > +	  || ((space > SPACE_0F || has_special_handle)
> > +	      && !modifiers[EVex].value
> > +	      && !modifiers[Disp8MemShift].value
> > +	      && !modifiers[Broadcast].value
> > +	      && !modifiers[Masking].value
> > +	      && !modifiers[SAE].value))
> > +	modifiers[NoEgpr].value = 1;
> 
> While this is just i386-gen (and hence being somewhat inefficient isn't the end
> of the world) I still wonder whether we need all the parts of this condition:
> Do we really need all the constituents of this EVEX related checks? Wouldn't it
> also help is_evex_encoding() if we switched to uniformly having EVex
> attributes on all EVEX templates? A presently missing EVex attribute, after all,
> merely is another way of saying EVexDYN, if I'm not mistaken. (Such an
> adjustment, if deemed to help, would of course want to come as a separate,
> prereq patch.)
> 

Yes, EVex is another way of saying EVexDYN, it should be appear in every EVEX template, when we merge EVex128, EVex256 and EVex512 into one template we omitted the expression of EVexDYN. So some EVEX templates don’t have this tag. If we want to re-add it, we need new values.

Such as:
vcvttps2dq, 0xF35B, AVX512F, Modrm|Masking|Space0F|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf|SAE, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }

> Furthermore, is this correct at all for mixed VEX/EVEX templates?
> 
After merging the templates we only have one entry and I prefer to set [NoEgpr].value to 1. Don't check NoEgpr for all EVEX instruction in check_EgprOperands function. 

check_EgprOperands (const insn_template *t)
 {
-  if (t->opcode_modifier.noegpr)
+  if (t->opcode_modifier.noegpr && !need_evex_encoding())


> > --- a/opcodes/i386-opc.tbl
> > +++ b/opcodes/i386-opc.tbl
> > @@ -891,7 +891,7 @@ rex.wrxb, 0x4f, x64, NoSuf|IsPrefix, {}
> > <pseudopfx:ident:cpu, disp8:Disp8:0, disp16:Disp16:0, disp32:Disp32:0, +
> >                        load:Load:0, store:Store:0, +
> >                        vex:VEX:0, vex2:VEX:0, vex3:VEX3:0, evex:EVEX:0, +
> > -                      rex:REX:x64, nooptimize:NoOptimize:0>
> > +                      rex:REX:x64, rex2:REX2:x64,
> > + nooptimize:NoOptimize:0>
> 
> Seeing this I realized that there's something missing here (an APX_F
> dependency), which then again would not have had an effect without the
> patch [1] sent earlier today.
> 
> Jan
> 
> [1] https://sourceware.org/pipermail/binutils/2023-November/130345.html

Changed to 

+#define APX_F_64 APX_F|x64
+
<pseudopfx:ident:cpu, disp8:Disp8:0, disp16:Disp16:0, disp32:Disp32:0, +
                       load:Load:0, store:Store:0, +
                       vex:VEX:0, vex2:VEX:0, vex3:VEX3:0, evex:EVEX:0, +
-                      rex:REX:x64, rex2:REX2:x64, nooptimize:NoOptimize:0>
+                      rex:REX:x64, rex2:REX2:APX_F_64, nooptimize:NoOptimize:0>

When we have" x86: split insn templates' CPU field" in trunk, I will change it to #define APX_F_64 APX_F&x64.

Thanks,
Lili.
  
Jan Beulich Nov. 7, 2023, 10:20 a.m. UTC | #10
On 07.11.2023 09:06, Cui, Lili wrote:
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: Monday, November 6, 2023 11:03 PM
>>
>> On 02.11.2023 12:29, Cui, Lili wrote:
>>> @@ -1119,6 +1148,18 @@ process_i386_opcode_modifier (FILE *table, char
>> *mod, unsigned int space,
>>>  	fprintf (stderr,
>>>  		 "%s: %d: W modifier without Word/Dword/Qword
>> operand(s)\n",
>>>  		 filename, lineno);
>>> +
>>> +      /* The part about judging EVEX encoding should be synchronized with
>>> +	 is_evex_encoding.  */
>>> +      if (modifiers[Vex].value
>>> +	  || ((space > SPACE_0F || has_special_handle)
>>> +	      && !modifiers[EVex].value
>>> +	      && !modifiers[Disp8MemShift].value
>>> +	      && !modifiers[Broadcast].value
>>> +	      && !modifiers[Masking].value
>>> +	      && !modifiers[SAE].value))
>>> +	modifiers[NoEgpr].value = 1;
>>
>> While this is just i386-gen (and hence being somewhat inefficient isn't the end
>> of the world) I still wonder whether we need all the parts of this condition:
>> Do we really need all the constituents of this EVEX related checks? Wouldn't it
>> also help is_evex_encoding() if we switched to uniformly having EVex
>> attributes on all EVEX templates? A presently missing EVex attribute, after all,
>> merely is another way of saying EVexDYN, if I'm not mistaken. (Such an
>> adjustment, if deemed to help, would of course want to come as a separate,
>> prereq patch.)
>>
> 
> Yes, EVex is another way of saying EVexDYN, it should be appear in every EVEX template, when we merge EVex128, EVex256 and EVex512 into one template we omitted the expression of EVexDYN. So some EVEX templates don’t have this tag. If we want to re-add it, we need new values.

I don't understand. When there's (e.g.) EVex128, not EVexDYN should appear at
the same time. Otoh ...

> Such as:
> vcvttps2dq, 0xF35B, AVX512F, Modrm|Masking|Space0F|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf|SAE, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM }

... aiui this one could have EVexDYN added without change in behavior, but
would then allow being recognized as needed EVEX-encoding by just checking
the .evex field, not any of the other fields is_evex_encoding() presently
needs to check.

>> Furthermore, is this correct at all for mixed VEX/EVEX templates?
>>
> After merging the templates we only have one entry and I prefer to set [NoEgpr].value to 1. Don't check NoEgpr for all EVEX instruction in check_EgprOperands function. 
> 
> check_EgprOperands (const insn_template *t)
>  {
> -  if (t->opcode_modifier.noegpr)
> +  if (t->opcode_modifier.noegpr && !need_evex_encoding())

So why would you add an attribute just to then ignore it by adding extra
code?

>>> --- a/opcodes/i386-opc.tbl
>>> +++ b/opcodes/i386-opc.tbl
>>> @@ -891,7 +891,7 @@ rex.wrxb, 0x4f, x64, NoSuf|IsPrefix, {}
>>> <pseudopfx:ident:cpu, disp8:Disp8:0, disp16:Disp16:0, disp32:Disp32:0, +
>>>                        load:Load:0, store:Store:0, +
>>>                        vex:VEX:0, vex2:VEX:0, vex3:VEX3:0, evex:EVEX:0, +
>>> -                      rex:REX:x64, nooptimize:NoOptimize:0>
>>> +                      rex:REX:x64, rex2:REX2:x64,
>>> + nooptimize:NoOptimize:0>
>>
>> Seeing this I realized that there's something missing here (an APX_F
>> dependency), which then again would not have had an effect without the
>> patch [1] sent earlier today.
>>
>> Jan
>>
>> [1] https://sourceware.org/pipermail/binutils/2023-November/130345.html
> 
> Changed to 
> 
> +#define APX_F_64 APX_F|x64
> +
> <pseudopfx:ident:cpu, disp8:Disp8:0, disp16:Disp16:0, disp32:Disp32:0, +
>                        load:Load:0, store:Store:0, +
>                        vex:VEX:0, vex2:VEX:0, vex3:VEX3:0, evex:EVEX:0, +
> -                      rex:REX:x64, rex2:REX2:x64, nooptimize:NoOptimize:0>
> +                      rex:REX:x64, rex2:REX2:APX_F_64, nooptimize:NoOptimize:0>
> 
> When we have" x86: split insn templates' CPU field" in trunk, I will change it to #define APX_F_64 APX_F&x64.

I've meanwhile put together the Cpu64 patch I was thinking of. No "&x64"
should then be needed anymore for any of the APX templates. Before sending
that one out, I will want to first see whether I can re-order it with the
patch sent earlier, as this would allow that other patch to shrink in
size (fewer "|x64" to convert to "&x64").

Jan
  
Jan Beulich Nov. 7, 2023, 10:43 a.m. UTC | #11
On 07.11.2023 09:16, Cui, Lili wrote:
>>>> Subject: Re: [PATCH 1/8] Support APX GPR32 with rex2 prefix
>>>>
>>>> On 02.11.2023 12:29, Cui, Lili wrote:
>>>>> @@ -4158,6 +4182,19 @@ build_evex_prefix (void)
>>>>>      i.vex.bytes[3] |= i.mask.reg->reg_num;  }
>>>>>
>>>>> +/* Build (2 bytes) rex2 prefix.
>>>>> +   | D5h |
>>>>> +   | m | R4 X4 B4 | W R X B |
>>>>> +*/
>>>>> +static void
>>>>> +build_rex2_prefix (void)
>>>>> +{
>>>>> +  i.vex.length = 2;
>>>>> +  i.vex.bytes[0] = 0xd5;
>>>>> +  i.vex.bytes[1] = ((i.tm.opcode_space << 7)
>>>>> +		    | (i.rex2 << 4) | i.rex);
>>>>> +}
>>>>
>>>> I may have asked on v1 already: For emitting REX we don't resort to
>>>> (ab)using i.vex. Is that really necessary? (If so, a comment next to
>>>> the field declaration may be warranted.)
>>>>
>>> Added comment for it.
>>>
>>>   /* For the W R X B bits, the variables of rex prefix will be reused.  */
>>>   i.vex.bytes[1] = ((i.tm.opcode_space << 7)
>>>                     | (i.rex2 << 4) | i.rex);
>>
>> How does the comment relate to the (ab)use of i.vex?
>>
> Ah ha, it's i.vex, not i.rex. At first I thought rex2 should have its own variable, but in the output_insn function they have the same special handling of i.tm.opcode_space as VEX. Reusing i.vex can reduce some ugly code. 

Things like this are very helpful to explain in the patch description.

>>>>> @@ -5594,6 +5641,13 @@ md_assemble (char *line)
>>>>>  	  return;
>>>>>  	}
>>>>>
>>>>> +      /* Check for explicit REX2 prefix.  */
>>>>> +      if (i.rex2 || i.rex2_encoding)
>>>>
>>>> This open-codes is_any_apx_rex2_encoding(). But read on.
>>>>
>>>>> +	{
>>>>> +	  as_bad (_("REX2 prefix invalid with `%s'"), insn_name (&i.tm));
>>>>
>>>> There's no REX2 prefix; {rex2} only sets i.rex2_encoding. Question is
>>>> what case the i.rex2 check above is intended to cover. Error message
>>>> comment, and condition want to reflect that.
>>>>
>>>
>>> Removed i.rex2 and keep i.rex2_encoding here. Added one invalid testcase
>> for it.
>>>
>>>         {rex} vmovaps %xmm7,%xmm2
>>>         {rex} vmovaps %xmm17,%xmm2
>>>         {rex} rorx $7,%eax,%ebx
>>> +       {rex2} vmovaps %xmm7,%xmm2
>>
>> Right, but please see my "optional vs required" comment in the pseudo- prefix
>> related patch I did send earlier today. I question the correctness of the {rex}
>> related check here, which would then extend to the {rex2} one as well.
>>
> 
> A REX byte that is immediately followed by a legacy prefix byte (LOCK, REPE, REPNE, OSIZE override, ASIZE override, or segment overrides) or another REX byte is ignored and behaves as if it does not exist (except for contributing to the instruction length)
> but in this case I think it's correct.

I'm afraid I can't relate this to the aspect I raised above. Perhaps better to
discuss in the context of the patch that I sent (and that I mentioned above;
"x86: CPU-qualify {disp16} / {disp32}"). You did reply to the patch, but you
didn't reply to the more detailed description of the issue (which I did refer
to above).

>>>>> +	    {
>>>>> +	      i.error = register_type_mismatch;
>>>>> +	      return 1;
>>>>> +	    }
>>>>> +	}
>>>>> +
>>>>> +      if ((i.index_reg && (i.index_reg->reg_flags & RegRex2))
>>>>> +	  || (i.base_reg && (i.base_reg->reg_flags & RegRex2)))
>>>>> +	{
>>>>> +	  i.error = register_type_of_address_mismatch;
>>>>> +	  return 1;
>>>>> +	}
>>>>> +
>>>>> +      /* Check pseudo prefix {rex2} are valid.  */
>>>>> +      if (i.rex2_encoding)
>>>>> +	{
>>>>> +	  i.error = invalid_pseudo_prefix;
>>>>> +	  return 1;
>>>>> +	}
>>>>
>>>> Further up in md_assemble() {rex} or {rex2} is simply ignored when
>>>> wrong to apply. Why would an inapplicable {rex2} be treated as an
>>>> error here? This would then also ...
>>>>
>>>>> @@ -7125,7 +7230,7 @@ match_template (char mnem_suffix)
>>>>>        /* Do not verify operands when there are none.  */
>>>>>        if (!t->operands)
>>>>>  	{
>>>>> -	  if (VEX_check_encoding (t))
>>>>> +	  if (VEX_check_encoding (t) || check_EgprOperands (t))
>>>>>  	    {
>>>>>  	      specific_error = progress (i.error);
>>>>>  	      continue;
>>>>
>>>> ... eliminate the need for this change, which is kind of bogus anyway:
>>>> There are no operands here, so calling a function of the given name
>>>> is at least suspicious.
>>>>
>>>
>>> We have these tests and I'm confused whether to remove them or not.
>>>
>>> +       #All opcodes in the row 0xf3* prefixed REX2 are illegal.
>>> +       {rex2} wrmsr
>>> +       {rex2} rdtsc
>>> +       {rex2} rdmsr
>>> +       {rex2} sysenter
>>> +       {rex2} sysexitl
>>> +       {rex2} rdpmc
>>
>> They should all stay. But as to my comment: There's no use of any eGPR here. If
>> you want to abuse that function and if there's no better descriptive name for it,
>> then once again at least a comment is needed.
>> (Considering this, the attribute's name NoEgpr is probably also misleading in
>> the cases here, i.e. when there are no operands. Hence, if not to be renamed,
>> requires yet another comment in i386-opc.h.)
>>
> This question also confused me , some instructions only support Acc register, but we need to add NoEgpr for them, this seems a bit strange. if we use NoRex2 , it doesn't fit the vex and evex instructions either. So I will add comments to it for now.
> 
> +         /* When there are no operands, we still need to use the
> +            check_EgprOperands function to check whether {rex2} is valid.  */
>           if (VEX_check_encoding (t) || check_EgprOperands (t))
> 
> -  /* egprs (r16-r31) on instruction illegal.  */
> +  /* egprs (r16-r31) on instruction illegal. We also use it to judge
> +     whether the instruction supports pseudo-prefix {rex2}.  */
>    NoEgpr,

This looks okay commentary-wise, but as per above we first need to settle on
whether an inapplicable {rex2} shouldn't simply be ignored.

>>>>> @@ -1008,10 +1012,35 @@ get_element_size (char **opnd, int lineno)
>>>>>    return elem_size;
>>>>>  }
>>>>>
>>>>> +static bool
>>>>> +if_entry_needs_special_handle (const unsigned long long opcode,
>>>>> +unsigned
>>>> int space,
>>>>> +			       const char *cpu_flags)
>>>>> +{
>>>>> +  /* Prefixing XSAVE* and XRSTOR* instructions with REX2 triggers
>>>>> +#UD.  */
>>>>> +  if (strcmp (cpu_flags, "XSAVES") >= 0
>>>>> +      || strcmp (cpu_flags, "XSAVEC") >= 0
>>>>> +      || strcmp (cpu_flags, "Xsave") >= 0
>>>>> +      || strcmp (cpu_flags, "Xsaveopt") >= 0
>>>>
>>>> Upon further thought for these (and maybe even ...
>>>>
>>>>> +      || !strcmp (cpu_flags, "3dnow")
>>>>> +      || !strcmp (cpu_flags, "3dnowA"))
>>>>
>>>> ... for these, but see also below) it might be better to add the
>>>> attribute right in the opcode table.
>>>>
>>>> As to the 3dnow insns - I think I'd like to revise my earlier
>>>> suggestion to also tag those. Like e.g. FPU insns they're pretty
>>>> normal GPR-wise, so allowing them to be used like that would appear
>>>> only consistent. Otherwise, if we were concerned of AMD extensions in
>>>> general, SSE4a insns (and maybe further
>>>> ones) would also need excluding. (Additionally recall that there's an
>>>> overlap between 3dnowa and SSE, which would result in another
>>>> [apparent] inconsistency when excluding 3dnow insns here.)
>>>>
>>>
>>> I see, for example  I think I need to split this table into two parts, one is for
>> SSE and one is for 3dnowA, then add noegpr to the SSE one, right?
>>> pextrw, 0xfc5, SSE|3dnowA,
>>> Modrm|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|NoRex64, { Imm8,
>> RegMMX,
>>> Reg32|Reg64 }
>>
>> I'm afraid I don't understand the question. All I've asked for is that the special
>> treatment of 3dnow insns be removed again. Unless you want to special-case
>> further insns; it's not really clear to me what's best, as both approaches have
>> noticable downsides (either we allow to encode something which may never
>> become valid, or we disallow something which may become valid).
>>
>> In any event adding NoEgpr to any SSE insn sounds wrong to me - aiui they can
>> all be encoded with REX2.
>>
> I need to correct it:  There are some instructions table present both SSE and AMD instructions. I need to split them first and then add NoEgpr to AMD instructions.
> Another point is that we have not split the common instructions of AMD and Intel, so just adding NoEgpr to 3dnowA and 3dnow does not seem to make much sense.
> 
> Do you want me also to remove this part  and add  NoEgpr in insn table?

First we need to settle on what to do with 3DNow!, SSE4a, and maybe further
AMD-only insns (beyond e.g. XOP and TBM ones, which aiui are covered by
virtue of being VEX[-like], and hence never eligible for eGPR use). Then we
can sort out how to best express what we have decided to enforce.

I'm not convinced at all that templates like that for MASKMOVQ would need
splitting: The difference would be noticeable only if someone disabled SSE,
but kept 3DNow! and APX_F enabled. We could easily document the resulting
pitfall instead.

Jan
  
Cui, Lili Nov. 7, 2023, 2:32 p.m. UTC | #12
> Subject: Re: [PATCH 1/8] Support APX GPR32 with rex2 prefix
> 
> On 07.11.2023 09:06, Cui, Lili wrote:
> >> -----Original Message-----
> >> From: Jan Beulich <jbeulich@suse.com>
> >> Sent: Monday, November 6, 2023 11:03 PM
> >>
> >> On 02.11.2023 12:29, Cui, Lili wrote:
> >>> @@ -1119,6 +1148,18 @@ process_i386_opcode_modifier (FILE *table,
> >>> char
> >> *mod, unsigned int space,
> >>>  	fprintf (stderr,
> >>>  		 "%s: %d: W modifier without Word/Dword/Qword
> >> operand(s)\n",
> >>>  		 filename, lineno);
> >>> +
> >>> +      /* The part about judging EVEX encoding should be synchronized with
> >>> +	 is_evex_encoding.  */
> >>> +      if (modifiers[Vex].value
> >>> +	  || ((space > SPACE_0F || has_special_handle)
> >>> +	      && !modifiers[EVex].value
> >>> +	      && !modifiers[Disp8MemShift].value
> >>> +	      && !modifiers[Broadcast].value
> >>> +	      && !modifiers[Masking].value
> >>> +	      && !modifiers[SAE].value))
> >>> +	modifiers[NoEgpr].value = 1;
> >>
> >> While this is just i386-gen (and hence being somewhat inefficient
> >> isn't the end of the world) I still wonder whether we need all the parts of
> this condition:
> >> Do we really need all the constituents of this EVEX related checks?
> >> Wouldn't it also help is_evex_encoding() if we switched to uniformly
> >> having EVex attributes on all EVEX templates? A presently missing
> >> EVex attribute, after all, merely is another way of saying EVexDYN,
> >> if I'm not mistaken. (Such an adjustment, if deemed to help, would of
> >> course want to come as a separate, prereq patch.)
> >>
> >
> > Yes, EVex is another way of saying EVexDYN, it should be appear in every
> EVEX template, when we merge EVex128, EVex256 and EVex512 into one
> template we omitted the expression of EVexDYN. So some EVEX templates
> don’t have this tag. If we want to re-add it, we need new values.
> 
> I don't understand. When there's (e.g.) EVex128, not EVexDYN should appear at
> the same time. Otoh ...
> 
> > Such as:
> > vcvttps2dq, 0xF35B, AVX512F,
> >
> Modrm|Masking|Space0F|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize
> |No
> > Suf|SAE, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex,
> > RegXMM|RegYMM|RegZMM }
> 
> ... aiui this one could have EVexDYN added without change in behavior, but
> would then allow being recognized as needed EVEX-encoding by just checking
> the .evex field, not any of the other fields is_evex_encoding() presently needs
> to check.
>

We have the same idea, that's what I mean too. I'd be happy to rewrite a patch later to implement it. Maybe it needs to be done after these patches are committed to the trunk.

> >> Furthermore, is this correct at all for mixed VEX/EVEX templates?
> >>
> > After merging the templates we only have one entry and I prefer to set
> [NoEgpr].value to 1. Don't check NoEgpr for all EVEX instruction in
> check_EgprOperands function.
> >
> > check_EgprOperands (const insn_template *t)  {
> > -  if (t->opcode_modifier.noegpr)
> > +  if (t->opcode_modifier.noegpr && !need_evex_encoding())
> 
> So why would you add an attribute just to then ignore it by adding extra code?
> 
Since all EVEX encodings support EGPR, the attribute noegpr has little meaning for evex, so when vex and evex share an entry, we use noegpr to indicate the vex format.

> >>> --- a/opcodes/i386-opc.tbl
> >>> +++ b/opcodes/i386-opc.tbl
> >>> @@ -891,7 +891,7 @@ rex.wrxb, 0x4f, x64, NoSuf|IsPrefix, {}
> >>> <pseudopfx:ident:cpu, disp8:Disp8:0, disp16:Disp16:0, disp32:Disp32:0, +
> >>>                        load:Load:0, store:Store:0, +
> >>>                        vex:VEX:0, vex2:VEX:0, vex3:VEX3:0, evex:EVEX:0, +
> >>> -                      rex:REX:x64, nooptimize:NoOptimize:0>
> >>> +                      rex:REX:x64, rex2:REX2:x64,
> >>> + nooptimize:NoOptimize:0>
> >>
> >> Seeing this I realized that there's something missing here (an APX_F
> >> dependency), which then again would not have had an effect without
> >> the patch [1] sent earlier today.
> >>
> >> Jan
> >>
> >> [1]
> >> https://sourceware.org/pipermail/binutils/2023-November/130345.html
> >
> > Changed to
> >
> > +#define APX_F_64 APX_F|x64
> > +
> > <pseudopfx:ident:cpu, disp8:Disp8:0, disp16:Disp16:0, disp32:Disp32:0, +
> >                        load:Load:0, store:Store:0, +
> >                        vex:VEX:0, vex2:VEX:0, vex3:VEX3:0, evex:EVEX:0, +
> > -                      rex:REX:x64, rex2:REX2:x64, nooptimize:NoOptimize:0>
> > +                      rex:REX:x64, rex2:REX2:APX_F_64,
> > + nooptimize:NoOptimize:0>
> >
> > When we have" x86: split insn templates' CPU field" in trunk, I will change it
> to #define APX_F_64 APX_F&x64.
> 
> I've meanwhile put together the Cpu64 patch I was thinking of. No "&x64"
> should then be needed anymore for any of the APX templates. Before sending
> that one out, I will want to first see whether I can re-order it with the patch
> sent earlier, as this would allow that other patch to shrink in size (fewer "|x64"
> to convert to "&x64").
> 
Ok, looking forward to your patch.

Lili.
  
Jan Beulich Nov. 7, 2023, 3:08 p.m. UTC | #13
On 07.11.2023 15:32, Cui, Lili wrote:
>> Subject: Re: [PATCH 1/8] Support APX GPR32 with rex2 prefix
>>
>> On 07.11.2023 09:06, Cui, Lili wrote:
>>>> -----Original Message-----
>>>> From: Jan Beulich <jbeulich@suse.com>
>>>> Sent: Monday, November 6, 2023 11:03 PM
>>>>
>>>> On 02.11.2023 12:29, Cui, Lili wrote:
>>>>> @@ -1119,6 +1148,18 @@ process_i386_opcode_modifier (FILE *table,
>>>>> char
>>>> *mod, unsigned int space,
>>>>>  	fprintf (stderr,
>>>>>  		 "%s: %d: W modifier without Word/Dword/Qword
>>>> operand(s)\n",
>>>>>  		 filename, lineno);
>>>>> +
>>>>> +      /* The part about judging EVEX encoding should be synchronized with
>>>>> +	 is_evex_encoding.  */
>>>>> +      if (modifiers[Vex].value
>>>>> +	  || ((space > SPACE_0F || has_special_handle)
>>>>> +	      && !modifiers[EVex].value
>>>>> +	      && !modifiers[Disp8MemShift].value
>>>>> +	      && !modifiers[Broadcast].value
>>>>> +	      && !modifiers[Masking].value
>>>>> +	      && !modifiers[SAE].value))
>>>>> +	modifiers[NoEgpr].value = 1;
>>>>
>>>> While this is just i386-gen (and hence being somewhat inefficient
>>>> isn't the end of the world) I still wonder whether we need all the parts of
>> this condition:
>>>> Do we really need all the constituents of this EVEX related checks?
>>>> Wouldn't it also help is_evex_encoding() if we switched to uniformly
>>>> having EVex attributes on all EVEX templates? A presently missing
>>>> EVex attribute, after all, merely is another way of saying EVexDYN,
>>>> if I'm not mistaken. (Such an adjustment, if deemed to help, would of
>>>> course want to come as a separate, prereq patch.)
>>>>
>>>
>>> Yes, EVex is another way of saying EVexDYN, it should be appear in every
>> EVEX template, when we merge EVex128, EVex256 and EVex512 into one
>> template we omitted the expression of EVexDYN. So some EVEX templates
>> don’t have this tag. If we want to re-add it, we need new values.
>>
>> I don't understand. When there's (e.g.) EVex128, not EVexDYN should appear at
>> the same time. Otoh ...
>>
>>> Such as:
>>> vcvttps2dq, 0xF35B, AVX512F,
>>>
>> Modrm|Masking|Space0F|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize
>> |No
>>> Suf|SAE, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex,
>>> RegXMM|RegYMM|RegZMM }
>>
>> ... aiui this one could have EVexDYN added without change in behavior, but
>> would then allow being recognized as needed EVEX-encoding by just checking
>> the .evex field, not any of the other fields is_evex_encoding() presently needs
>> to check.
>>
> 
> We have the same idea, that's what I mean too. I'd be happy to rewrite a patch later to implement it. Maybe it needs to be done after these patches are committed to the trunk.

Well, you mirror is_evex_encoding() logic into i386-gen, and if that logic
is to be simplified anyway, it would likely be better to do so up front.
That'll reduce two patches then - the one here fiddling with i386-gen and
the one doing the conversion (which then won't need to also touch
i386-gen.c, at least not in this regard). Since it looks like I'm collecting
a set of prereq patches anyway, maybe I should take the time right away ...

>>>> Furthermore, is this correct at all for mixed VEX/EVEX templates?
>>>>
>>> After merging the templates we only have one entry and I prefer to set
>> [NoEgpr].value to 1. Don't check NoEgpr for all EVEX instruction in
>> check_EgprOperands function.
>>>
>>> check_EgprOperands (const insn_template *t)  {
>>> -  if (t->opcode_modifier.noegpr)
>>> +  if (t->opcode_modifier.noegpr && !need_evex_encoding())
>>
>> So why would you add an attribute just to then ignore it by adding extra code?
>>
> Since all EVEX encodings support EGPR, the attribute noegpr has little meaning for evex, so when vex and evex share an entry, we use noegpr to indicate the vex format.

I still don't follow. From VEX alone you know that eGPR isn't possible. A
mixed template very definitely allows for eGPR, otoh (requiring its EVEX
sub-variant to be chosen).

Jan
  
Cui, Lili Nov. 7, 2023, 3:31 p.m. UTC | #14
> Subject: Re: [PATCH 1/8] Support APX GPR32 with rex2 prefix
> 
> On 07.11.2023 09:16, Cui, Lili wrote:
> >>>> Subject: Re: [PATCH 1/8] Support APX GPR32 with rex2 prefix
> >>>>
> >>>> On 02.11.2023 12:29, Cui, Lili wrote:
> >>>>> @@ -4158,6 +4182,19 @@ build_evex_prefix (void)
> >>>>>      i.vex.bytes[3] |= i.mask.reg->reg_num;  }
> >>>>>
> >>>>> +/* Build (2 bytes) rex2 prefix.
> >>>>> +   | D5h |
> >>>>> +   | m | R4 X4 B4 | W R X B |
> >>>>> +*/
> >>>>> +static void
> >>>>> +build_rex2_prefix (void)
> >>>>> +{
> >>>>> +  i.vex.length = 2;
> >>>>> +  i.vex.bytes[0] = 0xd5;
> >>>>> +  i.vex.bytes[1] = ((i.tm.opcode_space << 7)
> >>>>> +		    | (i.rex2 << 4) | i.rex);
> >>>>> +}
> >>>>
> >>>> I may have asked on v1 already: For emitting REX we don't resort to
> >>>> (ab)using i.vex. Is that really necessary? (If so, a comment next
> >>>> to the field declaration may be warranted.)
> >>>>
> >>> Added comment for it.
> >>>
> >>>   /* For the W R X B bits, the variables of rex prefix will be reused.  */
> >>>   i.vex.bytes[1] = ((i.tm.opcode_space << 7)
> >>>                     | (i.rex2 << 4) | i.rex);
> >>
> >> How does the comment relate to the (ab)use of i.vex?
> >>
> > Ah ha, it's i.vex, not i.rex. At first I thought rex2 should have its own variable,
> but in the output_insn function they have the same special handling of
> i.tm.opcode_space as VEX. Reusing i.vex can reduce some ugly code.
> 
> Things like this are very helpful to explain in the patch description.
> 

Done.

> >>>>> +	    {
> >>>>> +	      i.error = register_type_mismatch;
> >>>>> +	      return 1;
> >>>>> +	    }
> >>>>> +	}
> >>>>> +
> >>>>> +      if ((i.index_reg && (i.index_reg->reg_flags & RegRex2))
> >>>>> +	  || (i.base_reg && (i.base_reg->reg_flags & RegRex2)))
> >>>>> +	{
> >>>>> +	  i.error = register_type_of_address_mismatch;
> >>>>> +	  return 1;
> >>>>> +	}
> >>>>> +
> >>>>> +      /* Check pseudo prefix {rex2} are valid.  */
> >>>>> +      if (i.rex2_encoding)
> >>>>> +	{
> >>>>> +	  i.error = invalid_pseudo_prefix;
> >>>>> +	  return 1;
> >>>>> +	}
> >>>>
> >>>> Further up in md_assemble() {rex} or {rex2} is simply ignored when
> >>>> wrong to apply. Why would an inapplicable {rex2} be treated as an
> >>>> error here? This would then also ...
> >>>>
> >>>>> @@ -7125,7 +7230,7 @@ match_template (char mnem_suffix)
> >>>>>        /* Do not verify operands when there are none.  */
> >>>>>        if (!t->operands)
> >>>>>  	{
> >>>>> -	  if (VEX_check_encoding (t))
> >>>>> +	  if (VEX_check_encoding (t) || check_EgprOperands (t))
> >>>>>  	    {
> >>>>>  	      specific_error = progress (i.error);
> >>>>>  	      continue;
> >>>>
> >>>> ... eliminate the need for this change, which is kind of bogus anyway:
> >>>> There are no operands here, so calling a function of the given name
> >>>> is at least suspicious.
> >>>>
> >>>
> >>> We have these tests and I'm confused whether to remove them or not.
> >>>
> >>> +       #All opcodes in the row 0xf3* prefixed REX2 are illegal.
> >>> +       {rex2} wrmsr
> >>> +       {rex2} rdtsc
> >>> +       {rex2} rdmsr
> >>> +       {rex2} sysenter
> >>> +       {rex2} sysexitl
> >>> +       {rex2} rdpmc
> >>
> >> They should all stay. But as to my comment: There's no use of any
> >> eGPR here. If you want to abuse that function and if there's no
> >> better descriptive name for it, then once again at least a comment is
> needed.
> >> (Considering this, the attribute's name NoEgpr is probably also
> >> misleading in the cases here, i.e. when there are no operands. Hence,
> >> if not to be renamed, requires yet another comment in i386-opc.h.)
> >>
> > This question also confused me , some instructions only support Acc register,
> but we need to add NoEgpr for them, this seems a bit strange. if we use
> NoRex2 , it doesn't fit the vex and evex instructions either. So I will add
> comments to it for now.
> >
> > +         /* When there are no operands, we still need to use the
> > +            check_EgprOperands function to check whether {rex2} is
> > + valid.  */
> >           if (VEX_check_encoding (t) || check_EgprOperands (t))
> >
> > -  /* egprs (r16-r31) on instruction illegal.  */
> > +  /* egprs (r16-r31) on instruction illegal. We also use it to judge
> > +     whether the instruction supports pseudo-prefix {rex2}.  */
> >    NoEgpr,
> 
> This looks okay commentary-wise, but as per above we first need to settle on
> whether an inapplicable {rex2} shouldn't simply be ignored.
> 
> >>>>> @@ -1008,10 +1012,35 @@ get_element_size (char **opnd, int lineno)
> >>>>>    return elem_size;
> >>>>>  }
> >>>>>
> >>>>> +static bool
> >>>>> +if_entry_needs_special_handle (const unsigned long long opcode,
> >>>>> +unsigned
> >>>> int space,
> >>>>> +			       const char *cpu_flags)
> >>>>> +{
> >>>>> +  /* Prefixing XSAVE* and XRSTOR* instructions with REX2 triggers
> >>>>> +#UD.  */
> >>>>> +  if (strcmp (cpu_flags, "XSAVES") >= 0
> >>>>> +      || strcmp (cpu_flags, "XSAVEC") >= 0
> >>>>> +      || strcmp (cpu_flags, "Xsave") >= 0
> >>>>> +      || strcmp (cpu_flags, "Xsaveopt") >= 0
> >>>>
> >>>> Upon further thought for these (and maybe even ...
> >>>>
> >>>>> +      || !strcmp (cpu_flags, "3dnow")
> >>>>> +      || !strcmp (cpu_flags, "3dnowA"))
> >>>>
> >>>> ... for these, but see also below) it might be better to add the
> >>>> attribute right in the opcode table.
> >>>>
> >>>> As to the 3dnow insns - I think I'd like to revise my earlier
> >>>> suggestion to also tag those. Like e.g. FPU insns they're pretty
> >>>> normal GPR-wise, so allowing them to be used like that would appear
> >>>> only consistent. Otherwise, if we were concerned of AMD extensions
> >>>> in general, SSE4a insns (and maybe further
> >>>> ones) would also need excluding. (Additionally recall that there's
> >>>> an overlap between 3dnowa and SSE, which would result in another
> >>>> [apparent] inconsistency when excluding 3dnow insns here.)
> >>>>
> >>>
> >>> I see, for example  I think I need to split this table into two
> >>> parts, one is for
> >> SSE and one is for 3dnowA, then add noegpr to the SSE one, right?
> >>> pextrw, 0xfc5, SSE|3dnowA,
> >>> Modrm|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|NoRex64, { Imm8,
> >> RegMMX,
> >>> Reg32|Reg64 }
> >>
> >> I'm afraid I don't understand the question. All I've asked for is
> >> that the special treatment of 3dnow insns be removed again. Unless
> >> you want to special-case further insns; it's not really clear to me
> >> what's best, as both approaches have noticable downsides (either we
> >> allow to encode something which may never become valid, or we disallow
> something which may become valid).
> >>
> >> In any event adding NoEgpr to any SSE insn sounds wrong to me - aiui
> >> they can all be encoded with REX2.
> >>
> > I need to correct it:  There are some instructions table present both SSE and
> AMD instructions. I need to split them first and then add NoEgpr to AMD
> instructions.
> > Another point is that we have not split the common instructions of AMD and
> Intel, so just adding NoEgpr to 3dnowA and 3dnow does not seem to make
> much sense.
> >
> > Do you want me also to remove this part  and add  NoEgpr in insn table?
> 
> First we need to settle on what to do with 3DNow!, SSE4a, and maybe further
> AMD-only insns (beyond e.g. XOP and TBM ones, which aiui are covered by
> virtue of being VEX[-like], and hence never eligible for eGPR use). Then we can
> sort out how to best express what we have decided to enforce.
> 
> I'm not convinced at all that templates like that for MASKMOVQ would need
> splitting: The difference would be noticeable only if someone disabled SSE, but
> kept 3DNow! and APX_F enabled. We could easily document the resulting
> pitfall instead.
> 

 Do you mean we won't add NoEgpr to the entries like this ? I will try to find a list for AMD-only insns.

pextrw, 0xfc5, SSE|3dnowA, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|NoRex64, { Imm8, RegMMX, Reg32|Reg64 }


Lili.
  
Jan Beulich Nov. 7, 2023, 3:43 p.m. UTC | #15
On 07.11.2023 16:31, Cui, Lili wrote:
>  Do you mean we won't add NoEgpr to the entries like this ? I will try to find a list for AMD-only insns.
> 
> pextrw, 0xfc5, SSE|3dnowA, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|NoRex64, { Imm8, RegMMX, Reg32|Reg64 }

I mean we need to carefully consider whether to add NoEgpr to AMD-only
insns. As mentioned before there are downsides to either approach.

Jan
  
Cui, Lili Nov. 7, 2023, 3:53 p.m. UTC | #16
> Subject: Re: [PATCH 1/8] Support APX GPR32 with rex2 prefix
> 
> On 07.11.2023 16:31, Cui, Lili wrote:
> >  Do you mean we won't add NoEgpr to the entries like this ? I will try to find
> a list for AMD-only insns.
> >
> > pextrw, 0xfc5, SSE|3dnowA,
> > Modrm|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|NoRex64, { Imm8,
> RegMMX,
> > Reg32|Reg64 }
> 
> I mean we need to carefully consider whether to add NoEgpr to AMD-only
> insns. As mentioned before there are downsides to either approach.
> 
Yes, I would prefer not to add it.

Lili.
  
Cui, Lili Nov. 9, 2023, 8:02 a.m. UTC | #17
> Subject: Re: [PATCH 1/8] Support APX GPR32 with rex2 prefix
> 
> On 02.11.2023 12:29, Cui, Lili wrote:
> > @@ -269,9 +275,17 @@ struct dis_private {
> >        ins->rex_used |= REX_OPCODE;			\
> >    }
> >
> > +#define USED_REX2(value)				\
> > +  {							\
> > +    if ((ins->rex2 & value))				\
> > +      ins->rex2_used |= value;				\
> > +  }
> > +
> >
> >  #define EVEX_b_used 1
> 
> Nit: Please avoid (re)introducing double blank lines. Instead ...
> 

Done.

> >  #define EVEX_len_used 2
> > +/* M0 in rex2 prefix represents map0 or map1.  */ #define REX2_M 0x8
> 
> ... a blank line ahead of this insertion would be helpful.
> 

Done.

> > @@ -1872,23 +1888,23 @@ static const struct dis386 dis386[] = {
> > +  { "scasB",		{ AL, Yb }, PREFIX_REX2_ILLEGAL },
> > +  { "scasS",		{ eAX, Yv }, PREFIX_REX2_ILLEGAL },
> >    /* b0 */
> >    { "movB",		{ RMAL, Ib }, 0 },
> >    { "movB",		{ RMCL, Ib }, 0 },
> 
> Like in the i386-gen.c adjustments for row E look to be missing here, too.
> 

Added them, and also added bad testcase for row E.

> > @@ -2091,12 +2107,12 @@ static const struct dis386 dis386_twobyte[] = {
> >    { PREFIX_TABLE (PREFIX_0F2E) },
> >    { PREFIX_TABLE (PREFIX_0F2F) },
> >    /* 30 */
> > -  { "wrmsr",		{ XX }, 0 },
> > -  { "rdtsc",		{ XX }, 0 },
> > -  { "rdmsr",		{ XX }, 0 },
> > -  { "rdpmc",		{ XX }, 0 },
> > -  { "sysenter",		{ SEP }, 0 },
> > -  { "sysexit%LQ",	{ SEP }, 0 },
> > +  { "wrmsr",		{ XX }, PREFIX_REX2_ILLEGAL },
> > +  { "rdtsc",		{ XX }, PREFIX_REX2_ILLEGAL },
> > +  { "rdmsr",		{ XX }, PREFIX_REX2_ILLEGAL },
> > +  { "rdpmc",		{ XX }, PREFIX_REX2_ILLEGAL },
> > +  { "sysenter",		{ SEP }, PREFIX_REX2_ILLEGAL },
> > +  { "sysexit%LQ",	{ SEP }, PREFIX_REX2_ILLEGAL },
> >    { Bad_Opcode },
> >    { "getsec",		{ XX }, 0 },
> >    /* 38 */
> 
> Down from here row 8 also wants adjustment afaict.
> 

Done.

> > @@ -8289,6 +8313,7 @@ ckprefix (instr_info *ins)  {
> >    int i, length;
> >    uint8_t newrex;
> > +  unsigned char rex2_payload;
> 
> Please can this be restricted to the inner scope where it's used?
> 

Done.

> > @@ -9292,13 +9338,17 @@ print_insn (bfd_vma pc, disassemble_info
> *info, int intel_syntax)
> >        goto out;
> >      }
> >
> > -  if (*ins.codep == 0x0f)
> > +  /* M0 in rex2 prefix represents map0 or map1.  */  if (*ins.codep
> > + == 0x0f || (ins.rex2 & REX2_M))
> 
> I'm struggling with the M0 in the comment. DYM just M, or maybe REX2.M?
> 

Changed to REX2.M.

> Also is this, ...
> 
> >      {
> >        unsigned char threebyte;
> >
> > -      ins.codep++;
> > -      if (!fetch_code (info, ins.codep + 1))
> > -	goto fetch_error_out;
> > +      if (!ins.rex2)
> > +	{
> > +	  ins.codep++;
> > +	  if (!fetch_code (info, ins.codep + 1))
> > +	    goto fetch_error_out;
> > +	}
> >        threebyte = *ins.codep;
> >        dp = &dis386_twobyte[threebyte];
> >        ins.need_modrm = twobyte_has_modrm[threebyte];
> 
> ... all the way to here, really correct for d5 00 0f?
> 

I think the 0f here must indicate that it is the first byte of the legacy map1 instruction, meaning legacy map0 does not have 0f opcode. If this instruction has a rex2 prefix, rex2.w must be 1 and should be d5 80. If a bad binary does appear, our original code also has the same issue.

static const struct dis386 dis386[] = {
...
/ * 0f  */
{ Bad_Opcode },       /* 0x0f extended opcode escape */

> > @@ -9454,6 +9504,14 @@ print_insn (bfd_vma pc, disassemble_info *info,
> int intel_syntax)
> >        goto out;
> >      }
> >
> > +  if ((dp->prefix_requirement & PREFIX_REX2_ILLEGAL)
> > +      && ins.last_rex2_prefix >= 0)
> > +    {
> > +      i386_dis_printf (info, dis_style_text, "(bad)");
> > +      ret = ins.end_codep - priv.the_buffer;
> > +      goto out;
> > +    }
> > +
> >    switch (dp->prefix_requirement)
> >      {
> >      case PREFIX_DATA:
> > @@ -9468,6 +9526,7 @@ print_insn (bfd_vma pc, disassemble_info *info,
> int intel_syntax)
> >        ins.used_prefixes |= PREFIX_DATA;
> >        /* Fall through.  */
> >      case PREFIX_OPCODE:
> > +    case PREFIX_OPCODE | PREFIX_REX2_ILLEGAL:
> 
> May more robust to mask off PREFIX_REX2_ILLEGAL in the control expression
> of the switch()? Or else why don't you move the if() immediately ahead of the
> switch() into here, as a new case block?
> 

Changed it to

+  switch (dp->prefix_requirement & ~PREFIX_REX2_ILLEGAL)
     {
     case PREFIX_DATA:
       /* If only the data prefix is marked as mandatory, its absence renders
@@ -9600,7 +9599,7 @@ print_insn (bfd_vma pc, disassemble_info *info, int intel_syntax)
       ins.used_prefixes |= PREFIX_DATA;
       /* Fall through.  */
     case PREFIX_OPCODE:
-    case PREFIX_OPCODE | PREFIX_REX2_ILLEGAL:


> > @@ -9513,6 +9572,13 @@ print_insn (bfd_vma pc, disassemble_info *info,
> int intel_syntax)
> >        && !ins.need_vex && ins.last_rex_prefix >= 0)
> >      ins.all_prefixes[ins.last_rex_prefix] = 0;
> >
> > +  /* Check if the REX2 prefix is used.  */
> > +  if (ins.last_rex2_prefix >= 0
> > +      && ((((ins.rex2 & 0x7) ^ (ins.rex2_used & 0x7)) == 0
> > +	   && (ins.rex2 & 0x7))
> 
> DYM ((ins.rex2 & 7) & ~(ins.rex2_used & 7)) != 0
>

Here's an example of a negative scenario, when ins.rex2 == 1 and ins.rex2_used == 1, we want to clear last_rex2_prefix, because it has egpr and we don't want to add {rex2} to it. 

> > +	  || dp == &bad_opcode))
> 
> What is this last part of the condition about? Other prefix zapping code
> doesn't have such.

Deleted it , there is no impact on current testcase. Don’t know what the author’s intention was at that time. Deleted it.

> 
> > +    ins.all_prefixes[ins.last_rex2_prefix] = 0;
> > +
> >    /* Check if the SEG prefix is used.  */
> >    if ((ins.prefixes & (PREFIX_CS | PREFIX_SS | PREFIX_DS | PREFIX_ES
> >  		       | PREFIX_FS | PREFIX_GS)) != 0 @@ -9541,7 +9607,10
> @@
> > print_insn (bfd_vma pc, disassemble_info *info, int intel_syntax)
> >  	if (name == NULL)
> >  	  abort ();
> >  	prefix_length += strlen (name) + 1;
> > -	i386_dis_printf (info, dis_style_mnemonic, "%s ", name);
> > +	if (ins.all_prefixes[i] == REX2_OPCODE)
> > +	  i386_dis_printf (info, dis_style_mnemonic, "{%s} ", name);
> 
> Do braces really count as part of the mnemonic?
> 

Yes, rex2 prefix prefers to use mnemonic {rex2}, unlike rex prefix use rex, rex.B....

> > +	else
> > +	  i386_dis_printf (info, dis_style_mnemonic, "%s ", name);
> >        }
> 
> Aren't you at risk of wrongly printing a REX prefix here if the high 4 bits of the
> REX2 payload were all zero, but some of the low 4 bits turned out unused?
> 
  For d500 I think we should print rex2 for it. 

> > @@ -11086,8 +11155,11 @@ print_register (instr_info *ins, unsigned int
> reg, unsigned int rexmask,
> >      ins->illegal_masking = true;
> >
> >    USED_REX (rexmask);
> > +  USED_REX2 (rexmask);
> 
> Do both really need tracking separately? Whatever consumes REX.B will also
> consume REX2.B4, an so on.
> 
I was confused here, I think we only need to print {rex2} for the upper 4 bits == *000, which means egpr is not used and we need to use {rex2} to distinguish it from legacy encoding.  maybe we don’t need ((ins.rex2 & 0x7) ^ (ins.rex2_used & 0x7)) == 0, and nor USED_REX2 (rexmask). I intend to delete them.

+  /* Check if the REX2 prefix is used.  */
+  if (ins.last_rex2_prefix >= 0
+      && ((((ins.rex2 & 0x7) ^ (ins.rex2_used & 0x7)) == 0
+	   && (ins.rex2 & 0x7))

Thanks,
Lili
  
Jan Beulich Nov. 9, 2023, 10:52 a.m. UTC | #18
On 09.11.2023 09:02, Cui, Lili wrote:
>> Subject: Re: [PATCH 1/8] Support APX GPR32 with rex2 prefix
>>
>> On 02.11.2023 12:29, Cui, Lili wrote:
>>> @@ -9292,13 +9338,17 @@ print_insn (bfd_vma pc, disassemble_info
>> *info, int intel_syntax)
>>>        goto out;
>>>      }
>>>
>>> -  if (*ins.codep == 0x0f)
>>> +  /* M0 in rex2 prefix represents map0 or map1.  */  if (*ins.codep
>>> + == 0x0f || (ins.rex2 & REX2_M))
>>
>> I'm struggling with the M0 in the comment. DYM just M, or maybe REX2.M?
>>
> 
> Changed to REX2.M.
> 
>> Also is this, ...
>>
>>>      {
>>>        unsigned char threebyte;
>>>
>>> -      ins.codep++;
>>> -      if (!fetch_code (info, ins.codep + 1))
>>> -	goto fetch_error_out;
>>> +      if (!ins.rex2)
>>> +	{
>>> +	  ins.codep++;
>>> +	  if (!fetch_code (info, ins.codep + 1))
>>> +	    goto fetch_error_out;
>>> +	}
>>>        threebyte = *ins.codep;
>>>        dp = &dis386_twobyte[threebyte];
>>>        ins.need_modrm = twobyte_has_modrm[threebyte];
>>
>> ... all the way to here, really correct for d5 00 0f?
>>
> 
> I think the 0f here must indicate that it is the first byte of the legacy map1 instruction, meaning legacy map0 does not have 0f opcode. If this instruction has a rex2 prefix, rex2.w must be 1 and should be d5 80. If a bad binary does appear, our original code also has the same issue.
> 
> static const struct dis386 dis386[] = {
> ...
> / * 0f  */
> { Bad_Opcode },       /* 0x0f extended opcode escape */

No, this entry simply will never be used, because of how decoding is done.
My comment was about what's going to happen if you encounter the d5 00 0f
byte sequence. That's _not_ an indication to use map1 for decoding, nor
to read another opcode byte. In this case the table entry you quote above
will need to come into play, not any entry from dis386_twobyte[]. (As long
as both are Bad_Opcode the difference may not even be noticeable, but it
would be a latent trap for someone to fall into down the road.)

>>> @@ -9513,6 +9572,13 @@ print_insn (bfd_vma pc, disassemble_info *info,
>> int intel_syntax)
>>>        && !ins.need_vex && ins.last_rex_prefix >= 0)
>>>      ins.all_prefixes[ins.last_rex_prefix] = 0;
>>>
>>> +  /* Check if the REX2 prefix is used.  */
>>> +  if (ins.last_rex2_prefix >= 0
>>> +      && ((((ins.rex2 & 0x7) ^ (ins.rex2_used & 0x7)) == 0
>>> +	   && (ins.rex2 & 0x7))
>>
>> DYM ((ins.rex2 & 7) & ~(ins.rex2_used & 7)) != 0
>>
> 
> Here's an example of a negative scenario, when ins.rex2 == 1 and ins.rex2_used == 1, we want to clear last_rex2_prefix, because it has egpr and we don't want to add {rex2} to it. 

Well, that would be dealt with as well by the simpler code I suggested,
wouldn't it?

>> @@
>>> print_insn (bfd_vma pc, disassemble_info *info, int intel_syntax)
>>>  	if (name == NULL)
>>>  	  abort ();
>>>  	prefix_length += strlen (name) + 1;
>>> -	i386_dis_printf (info, dis_style_mnemonic, "%s ", name);
>>> +	if (ins.all_prefixes[i] == REX2_OPCODE)
>>> +	  i386_dis_printf (info, dis_style_mnemonic, "{%s} ", name);
>>
>> Do braces really count as part of the mnemonic?
> 
> Yes, rex2 prefix prefers to use mnemonic {rex2}, unlike rex prefix use rex, rex.B....

Maybe you didn't understand what I mean: My comment was regarding the use
of dis_style_mnemonic for the entire {rex2} (rather than perhaps just for
what's inside the figure braces).

>>> @@ -11086,8 +11155,11 @@ print_register (instr_info *ins, unsigned int
>> reg, unsigned int rexmask,
>>>      ins->illegal_masking = true;
>>>
>>>    USED_REX (rexmask);
>>> +  USED_REX2 (rexmask);
>>
>> Do both really need tracking separately? Whatever consumes REX.B will also
>> consume REX2.B4, an so on.
>>
> I was confused here, I think we only need to print {rex2} for the upper 4 bits == *000, which means egpr is not used and we need to use {rex2} to distinguish it from legacy encoding.  maybe we don’t need ((ins.rex2 & 0x7) ^ (ins.rex2_used & 0x7)) == 0, and nor USED_REX2 (rexmask). I intend to delete them.
> 
> +  /* Check if the REX2 prefix is used.  */
> +  if (ins.last_rex2_prefix >= 0
> +      && ((((ins.rex2 & 0x7) ^ (ins.rex2_used & 0x7)) == 0
> +	   && (ins.rex2 & 0x7))

But that's the same you had before. I'm afraid I don't see what you're
trying to tell me.

Jan
  
Cui, Lili Nov. 9, 2023, 1:27 p.m. UTC | #19
> >> Also is this, ...
> >>
> >>>      {
> >>>        unsigned char threebyte;
> >>>
> >>> -      ins.codep++;
> >>> -      if (!fetch_code (info, ins.codep + 1))
> >>> -	goto fetch_error_out;
> >>> +      if (!ins.rex2)
> >>> +	{
> >>> +	  ins.codep++;
> >>> +	  if (!fetch_code (info, ins.codep + 1))
> >>> +	    goto fetch_error_out;
> >>> +	}
> >>>        threebyte = *ins.codep;
> >>>        dp = &dis386_twobyte[threebyte];
> >>>        ins.need_modrm = twobyte_has_modrm[threebyte];
> >>
> >> ... all the way to here, really correct for d5 00 0f?
> >>
> >
> > I think the 0f here must indicate that it is the first byte of the legacy map1
> instruction, meaning legacy map0 does not have 0f opcode. If this instruction
> has a rex2 prefix, rex2.w must be 1 and should be d5 80. If a bad binary does
> appear, our original code also has the same issue.
> >
> > static const struct dis386 dis386[] = { ...
> > / * 0f  */
> > { Bad_Opcode },       /* 0x0f extended opcode escape */
> 
> No, this entry simply will never be used, because of how decoding is done.
> My comment was about what's going to happen if you encounter the d5 00 0f
> byte sequence. That's _not_ an indication to use map1 for decoding, nor to
> read another opcode byte. In this case the table entry you quote above will
> need to come into play, not any entry from dis386_twobyte[]. (As long as both
> are Bad_Opcode the difference may not even be noticeable, but it would be a
> latent trap for someone to fall into down the road.)
> 


  /* REX2.M in rex2 prefix represents map0 or map1.  */
  if (*ins.codep == 0x0f || (ins.rex2 & REX2_M))
    {
      unsigned char threebyte;

      if (!ins.rex2)
        {
          ins.codep++;
          if (!fetch_code (info, ins.codep + 1))
            goto fetch_error_out;                                                      ---> When there are no bytes after 0f, it will jump to fetch error, but no error will be reported.
        }
      threebyte = *ins.codep;
      dp = &dis386_twobyte[threebyte];
      ins.need_modrm = twobyte_has_modrm[threebyte];
      ins.codep++;
    }

For d5 00 0f
Decode to:
   0:   d5                      rex2
   1:   00 0f                   add    %cl,(%rdi)

For 40 0f
Decode to:
   0:   40                      rex
   1:   0f                      .byte 0xf


> >>> @@ -9513,6 +9572,13 @@ print_insn (bfd_vma pc, disassemble_info
> >>> *info,
> >> int intel_syntax)
> >>>        && !ins.need_vex && ins.last_rex_prefix >= 0)
> >>>      ins.all_prefixes[ins.last_rex_prefix] = 0;
> >>>
> >>> +  /* Check if the REX2 prefix is used.  */
> >>> +  if (ins.last_rex2_prefix >= 0
> >>> +      && ((((ins.rex2 & 0x7) ^ (ins.rex2_used & 0x7)) == 0
> >>> +	   && (ins.rex2 & 0x7))
> >>
> >> DYM ((ins.rex2 & 7) & ~(ins.rex2_used & 7)) != 0
> >>
> >
> > Here's an example of a negative scenario, when ins.rex2 == 1 and
> ins.rex2_used == 1, we want to clear last_rex2_prefix, because it has egpr and
> we don't want to add {rex2} to it.
> 
> Well, that would be dealt with as well by the simpler code I suggested,
> wouldn't it?
> 

No, for d510 ,  ((ins.rex2 & 7) & ~(ins.rex2_used & 7)) == 0. Anyway, I want to delete them. I don't see any point in it at all.

> >> @@
> >>> print_insn (bfd_vma pc, disassemble_info *info, int intel_syntax)
> >>>  	if (name == NULL)
> >>>  	  abort ();
> >>>  	prefix_length += strlen (name) + 1;
> >>> -	i386_dis_printf (info, dis_style_mnemonic, "%s ", name);
> >>> +	if (ins.all_prefixes[i] == REX2_OPCODE)
> >>> +	  i386_dis_printf (info, dis_style_mnemonic, "{%s} ", name);
> >>
> >> Do braces really count as part of the mnemonic?
> >
> > Yes, rex2 prefix prefers to use mnemonic {rex2}, unlike rex prefix use rex,
> rex.B....
> 
> Maybe you didn't understand what I mean: My comment was regarding the
> use of dis_style_mnemonic for the entire {rex2} (rather than perhaps just for
> what's inside the figure braces).
> 
> >>> @@ -11086,8 +11155,11 @@ print_register (instr_info *ins, unsigned
> >>> int
> >> reg, unsigned int rexmask,
> >>>      ins->illegal_masking = true;
> >>>
> >>>    USED_REX (rexmask);
> >>> +  USED_REX2 (rexmask);
> >>
> >> Do both really need tracking separately? Whatever consumes REX.B will
> >> also consume REX2.B4, an so on.
> >>
> > I was confused here, I think we only need to print {rex2} for the upper 4 bits
> == *000, which means egpr is not used and we need to use {rex2} to
> distinguish it from legacy encoding.  maybe we don’t need ((ins.rex2 & 0x7) ^
> (ins.rex2_used & 0x7)) == 0, and nor USED_REX2 (rexmask). I intend to delete
> them.
> >
> > +  /* Check if the REX2 prefix is used.  */
> > +  if (ins.last_rex2_prefix >= 0
> > +      && ((((ins.rex2 & 0x7) ^ (ins.rex2_used & 0x7)) == 0
> > +	   && (ins.rex2 & 0x7))
> 
> But that's the same you had before. I'm afraid I don't see what you're trying
> to tell me.
> 
After removing  " ((ins.rex2 & 0x7) ^ (ins.rex2_used & 0x7)) == 0 ", the code changes to 

  +  /* Check if the REX2 prefix is used.  */
  +  if (ins.last_rex2_prefix >= 0 && (ins.rex2 & 0x7))  

When it is true, decode will not print the {rex2} for this insn.

Lili.
  
Jan Beulich Nov. 9, 2023, 3:22 p.m. UTC | #20
On 09.11.2023 14:27, Cui, Lili wrote:
>>>> Also is this, ...
>>>>
>>>>>      {
>>>>>        unsigned char threebyte;
>>>>>
>>>>> -      ins.codep++;
>>>>> -      if (!fetch_code (info, ins.codep + 1))
>>>>> -	goto fetch_error_out;
>>>>> +      if (!ins.rex2)
>>>>> +	{
>>>>> +	  ins.codep++;
>>>>> +	  if (!fetch_code (info, ins.codep + 1))
>>>>> +	    goto fetch_error_out;
>>>>> +	}
>>>>>        threebyte = *ins.codep;
>>>>>        dp = &dis386_twobyte[threebyte];
>>>>>        ins.need_modrm = twobyte_has_modrm[threebyte];
>>>>
>>>> ... all the way to here, really correct for d5 00 0f?
>>>>
>>>
>>> I think the 0f here must indicate that it is the first byte of the legacy map1
>> instruction, meaning legacy map0 does not have 0f opcode. If this instruction
>> has a rex2 prefix, rex2.w must be 1 and should be d5 80. If a bad binary does
>> appear, our original code also has the same issue.
>>>
>>> static const struct dis386 dis386[] = { ...
>>> / * 0f  */
>>> { Bad_Opcode },       /* 0x0f extended opcode escape */
>>
>> No, this entry simply will never be used, because of how decoding is done.
>> My comment was about what's going to happen if you encounter the d5 00 0f
>> byte sequence. That's _not_ an indication to use map1 for decoding, nor to
>> read another opcode byte. In this case the table entry you quote above will
>> need to come into play, not any entry from dis386_twobyte[]. (As long as both
>> are Bad_Opcode the difference may not even be noticeable, but it would be a
>> latent trap for someone to fall into down the road.)
>>
> 
> 
>   /* REX2.M in rex2 prefix represents map0 or map1.  */
>   if (*ins.codep == 0x0f || (ins.rex2 & REX2_M))
>     {
>       unsigned char threebyte;
> 
>       if (!ins.rex2)
>         {
>           ins.codep++;
>           if (!fetch_code (info, ins.codep + 1))
>             goto fetch_error_out;                                                      ---> When there are no bytes after 0f, it will jump to fetch error, but no error will be reported.
>         }
>       threebyte = *ins.codep;
>       dp = &dis386_twobyte[threebyte];
>       ins.need_modrm = twobyte_has_modrm[threebyte];
>       ins.codep++;
>     }
> 
> For d5 00 0f
> Decode to:
>    0:   d5                      rex2
>    1:   00 0f                   add    %cl,(%rdi)

But this would better have d5 00 0f all on the first line (it
definitely needs to have d5 00 on the same line, as the bytes belong
together), as opposed to ...

> For 40 0f
> Decode to:
>    0:   40                      rex
>    1:   0f                      .byte 0xf

... this where there truly is a known missing byte before we could
proceed further. (It's still a little questionable to print REX
separately in this case, but that's the way the binutils disassembler
has always worked.)

Yet to restate - to see what I mean, you'd need to populate at least
one of the two 0f slots in the mentioned arrays. What I'm suspecting
from the code as this patch version has it is that d5 00 0f would
wrongly descend into dis386_twobyte[]. Yet you can tell that from
it correctly using dis386[] only if the two 0f slots of these arrays
are meaningfully different (or by actually looking at things in e.g.
a debugger).

>>>>> @@ -9513,6 +9572,13 @@ print_insn (bfd_vma pc, disassemble_info
>>>>> *info,
>>>> int intel_syntax)
>>>>>        && !ins.need_vex && ins.last_rex_prefix >= 0)
>>>>>      ins.all_prefixes[ins.last_rex_prefix] = 0;
>>>>>
>>>>> +  /* Check if the REX2 prefix is used.  */
>>>>> +  if (ins.last_rex2_prefix >= 0
>>>>> +      && ((((ins.rex2 & 0x7) ^ (ins.rex2_used & 0x7)) == 0
>>>>> +	   && (ins.rex2 & 0x7))
>>>>
>>>> DYM ((ins.rex2 & 7) & ~(ins.rex2_used & 7)) != 0
>>>>
>>>
>>> Here's an example of a negative scenario, when ins.rex2 == 1 and
>> ins.rex2_used == 1, we want to clear last_rex2_prefix, because it has egpr and
>> we don't want to add {rex2} to it.
>>
>> Well, that would be dealt with as well by the simpler code I suggested,
>> wouldn't it?
>>
> 
> No, for d510 ,  ((ins.rex2 & 7) & ~(ins.rex2_used & 7)) == 0. Anyway, I want to delete them. I don't see any point in it at all.

Hmm, I guess I'm confused. How would you present unconsumed REX2.{R,X,B}{3,4}
then?

>>>>> @@ -11086,8 +11155,11 @@ print_register (instr_info *ins, unsigned
>>>>> int
>>>> reg, unsigned int rexmask,
>>>>>      ins->illegal_masking = true;
>>>>>
>>>>>    USED_REX (rexmask);
>>>>> +  USED_REX2 (rexmask);
>>>>
>>>> Do both really need tracking separately? Whatever consumes REX.B will
>>>> also consume REX2.B4, an so on.
>>>>
>>> I was confused here, I think we only need to print {rex2} for the upper 4 bits
>> == *000, which means egpr is not used and we need to use {rex2} to
>> distinguish it from legacy encoding.  maybe we don’t need ((ins.rex2 & 0x7) ^
>> (ins.rex2_used & 0x7)) == 0, and nor USED_REX2 (rexmask). I intend to delete
>> them.
>>>
>>> +  /* Check if the REX2 prefix is used.  */
>>> +  if (ins.last_rex2_prefix >= 0
>>> +      && ((((ins.rex2 & 0x7) ^ (ins.rex2_used & 0x7)) == 0
>>> +	   && (ins.rex2 & 0x7))
>>
>> But that's the same you had before. I'm afraid I don't see what you're trying
>> to tell me.
>>
> After removing  " ((ins.rex2 & 0x7) ^ (ins.rex2_used & 0x7)) == 0 ", the code changes to 
> 
>   +  /* Check if the REX2 prefix is used.  */
>   +  if (ins.last_rex2_prefix >= 0 && (ins.rex2 & 0x7))  
> 
> When it is true, decode will not print the {rex2} for this insn.

Yet ins.rex2 having any of the low 3 bits set says nothing about whether
every one of these was consumed while processing operands / suffixes.
You need to consult .rex{,2}_used; my earlier point was merely that you
don't need a separate .rex2_used; the bits in .rex_used are all you
require to get this right (as a consumer of, say, REX.X / REX2.X3 is
also a consumer of REX2.X4; leaving aside EVEX encoded insns for the
moment).

Jan
  
Cui, Lili Nov. 10, 2023, 7:11 a.m. UTC | #21
> Subject: Re: [PATCH 1/8] Support APX GPR32 with rex2 prefix
> 
> On 09.11.2023 14:27, Cui, Lili wrote:
> >>>> Also is this, ...
> >>>>
> >>>>>      {
> >>>>>        unsigned char threebyte;
> >>>>>
> >>>>> -      ins.codep++;
> >>>>> -      if (!fetch_code (info, ins.codep + 1))
> >>>>> -	goto fetch_error_out;
> >>>>> +      if (!ins.rex2)
> >>>>> +	{
> >>>>> +	  ins.codep++;
> >>>>> +	  if (!fetch_code (info, ins.codep + 1))
> >>>>> +	    goto fetch_error_out;
> >>>>> +	}
> >>>>>        threebyte = *ins.codep;
> >>>>>        dp = &dis386_twobyte[threebyte];
> >>>>>        ins.need_modrm = twobyte_has_modrm[threebyte];
> >>>>
> >>>> ... all the way to here, really correct for d5 00 0f?
> >>>>
> >>>
> >>> I think the 0f here must indicate that it is the first byte of the
> >>> legacy map1
> >> instruction, meaning legacy map0 does not have 0f opcode. If this
> >> instruction has a rex2 prefix, rex2.w must be 1 and should be d5 80.
> >> If a bad binary does appear, our original code also has the same issue.
> >>>
> >>> static const struct dis386 dis386[] = { ...
> >>> / * 0f  */
> >>> { Bad_Opcode },       /* 0x0f extended opcode escape */
> >>
> >> No, this entry simply will never be used, because of how decoding is done.
> >> My comment was about what's going to happen if you encounter the d5
> >> 00 0f byte sequence. That's _not_ an indication to use map1 for
> >> decoding, nor to read another opcode byte. In this case the table
> >> entry you quote above will need to come into play, not any entry from
> >> dis386_twobyte[]. (As long as both are Bad_Opcode the difference may
> >> not even be noticeable, but it would be a latent trap for someone to
> >> fall into down the road.)
> >>
> >
> >
> >   /* REX2.M in rex2 prefix represents map0 or map1.  */
> >   if (*ins.codep == 0x0f || (ins.rex2 & REX2_M))
> >     {
> >       unsigned char threebyte;
> >
> >       if (!ins.rex2)
> >         {
> >           ins.codep++;
> >           if (!fetch_code (info, ins.codep + 1))
> >             goto fetch_error_out;                                                      ---> When there
> are no bytes after 0f, it will jump to fetch error, but no error will be reported.
> >         }
> >       threebyte = *ins.codep;
> >       dp = &dis386_twobyte[threebyte];
> >       ins.need_modrm = twobyte_has_modrm[threebyte];
> >       ins.codep++;
> >     }
> >
> > For d5 00 0f
> > Decode to:
> >    0:   d5                      rex2
> >    1:   00 0f                   add    %cl,(%rdi)
> 
> But this would better have d5 00 0f all on the first line (it definitely needs to
> have d5 00 on the same line, as the bytes belong together), as opposed to ...
> 
> > For 40 0f
> > Decode to:
> >    0:   40                      rex
> >    1:   0f                      .byte 0xf
> 
> ... this where there truly is a known missing byte before we could proceed
> further. (It's still a little questionable to print REX separately in this case, but
> that's the way the binutils disassembler has always worked.)
> 
> Yet to restate - to see what I mean, you'd need to populate at least one of the
> two 0f slots in the mentioned arrays. What I'm suspecting from the code as
> this patch version has it is that d5 00 0f would wrongly descend into
> dis386_twobyte[]. Yet you can tell that from it correctly using dis386[] only if
> the two 0f slots of these arrays are meaningfully different (or by actually
> looking at things in e.g.
> a debugger).
> 

I'm confused here, for d5 00 0f when it fetches the next byte after 0f it will find there is no byte there and then go to fetch_error_out and then it will return from print_insn and I don't have a chance to do anything for it. It cannot reach dis386_twobyte[]. 

> >>>>> @@ -9513,6 +9572,13 @@ print_insn (bfd_vma pc, disassemble_info
> >>>>> *info,
> >>>> int intel_syntax)
> >>>>>        && !ins.need_vex && ins.last_rex_prefix >= 0)
> >>>>>      ins.all_prefixes[ins.last_rex_prefix] = 0;
> >>>>>
> >>>>> +  /* Check if the REX2 prefix is used.  */
> >>>>> +  if (ins.last_rex2_prefix >= 0
> >>>>> +      && ((((ins.rex2 & 0x7) ^ (ins.rex2_used & 0x7)) == 0
> >>>>> +	   && (ins.rex2 & 0x7))
> >>>>
> >>>> DYM ((ins.rex2 & 7) & ~(ins.rex2_used & 7)) != 0
> >>>>
> >>>
> >>> Here's an example of a negative scenario, when ins.rex2 == 1 and
> >> ins.rex2_used == 1, we want to clear last_rex2_prefix, because it has
> >> egpr and we don't want to add {rex2} to it.
> >>
> >> Well, that would be dealt with as well by the simpler code I
> >> suggested, wouldn't it?
> >>
> >
> > No, for d510 ,  ((ins.rex2 & 7) & ~(ins.rex2_used & 7)) == 0. Anyway, I want to
> delete them. I don't see any point in it at all.
> 
> Hmm, I guess I'm confused. How would you present unconsumed
> REX2.{R,X,B}{3,4} then?
> 

For rex2.R4X4B4. I can't imagine why they set RXB but didn't consume it at the end. Since rex2 has no content listed, so I think it is not useful to have rex2_used.
For rex2.WR3X3B3, I think it can be ignored in rex2, which doesn't list what's there.

We currently use the following rules to disambiguate.

When the instruction has egpr, it means it has rex2 prefix and we don't print {rex2} for it. When it also has an evex version, we add {evex} to the evex instruction.
When the instruction has no egpr, we need to print {rex2} for it. In case of evex instruction, we need to print {evex} for it.

            adc     $0x7b,%r8b
{rex2} adc     $0x7b,%r8b
            adc     $0x7b,%r16b
{evex} adc     $0x7b,%r16b
{evex} adc     $0x7b,%r8b

> >>>>> @@ -11086,8 +11155,11 @@ print_register (instr_info *ins, unsigned
> >>>>> int
> >>>> reg, unsigned int rexmask,
> >>>>>      ins->illegal_masking = true;
> >>>>>
> >>>>>    USED_REX (rexmask);
> >>>>> +  USED_REX2 (rexmask);
> >>>>
> >>>> Do both really need tracking separately? Whatever consumes REX.B
> >>>> will also consume REX2.B4, an so on.
> >>>>
> >>> I was confused here, I think we only need to print {rex2} for the
> >>> upper 4 bits
> >> == *000, which means egpr is not used and we need to use {rex2} to
> >> distinguish it from legacy encoding.  maybe we don’t need ((ins.rex2
> >> & 0x7) ^ (ins.rex2_used & 0x7)) == 0, and nor USED_REX2 (rexmask). I
> >> intend to delete them.
> >>>
> >>> +  /* Check if the REX2 prefix is used.  */
> >>> +  if (ins.last_rex2_prefix >= 0
> >>> +      && ((((ins.rex2 & 0x7) ^ (ins.rex2_used & 0x7)) == 0
> >>> +	   && (ins.rex2 & 0x7))
> >>
> >> But that's the same you had before. I'm afraid I don't see what
> >> you're trying to tell me.
> >>
> > After removing  " ((ins.rex2 & 0x7) ^ (ins.rex2_used & 0x7)) == 0 ",
> > the code changes to
> >
> >   +  /* Check if the REX2 prefix is used.  */
> >   +  if (ins.last_rex2_prefix >= 0 && (ins.rex2 & 0x7))
> >
> > When it is true, decode will not print the {rex2} for this insn.
> 
> Yet ins.rex2 having any of the low 3 bits set says nothing about whether every
> one of these was consumed while processing operands / suffixes.
> You need to consult .rex{,2}_used; my earlier point was merely that you don't
> need a separate .rex2_used; the bits in .rex_used are all you require to get
> this right (as a consumer of, say, REX.X / REX2.X3 is also a consumer of
> REX2.X4; leaving aside EVEX encoded insns for the moment).
> 

Here are some cases:
0:   41 0f a8                  rex.B push %gs         ---> rex.B was not consumed, rex will print it.
0:   d5 01 0f a8             {rex2} push %gs      ----> Without egpr, we need to print {rex2} for it. But we can't see anything about REX2.B3 from the prefix.
4:   d5 19 58                  pop    %r24             ----> There is egpr, we know it uses rex2 prefix. We cannot see the information of the lower 4 bits of rex2

It is not helpful to judge rex_used in rex2.

Thanks.
Lili.
  
Jan Beulich Nov. 10, 2023, 9:14 a.m. UTC | #22
On 10.11.2023 08:11, Cui, Lili wrote:
>> Subject: Re: [PATCH 1/8] Support APX GPR32 with rex2 prefix
>>
>> On 09.11.2023 14:27, Cui, Lili wrote:
>>>>>> Also is this, ...
>>>>>>
>>>>>>>      {
>>>>>>>        unsigned char threebyte;
>>>>>>>
>>>>>>> -      ins.codep++;
>>>>>>> -      if (!fetch_code (info, ins.codep + 1))
>>>>>>> -	goto fetch_error_out;
>>>>>>> +      if (!ins.rex2)
>>>>>>> +	{
>>>>>>> +	  ins.codep++;
>>>>>>> +	  if (!fetch_code (info, ins.codep + 1))
>>>>>>> +	    goto fetch_error_out;
>>>>>>> +	}
>>>>>>>        threebyte = *ins.codep;
>>>>>>>        dp = &dis386_twobyte[threebyte];
>>>>>>>        ins.need_modrm = twobyte_has_modrm[threebyte];
>>>>>>
>>>>>> ... all the way to here, really correct for d5 00 0f?
>>>>>>
>>>>>
>>>>> I think the 0f here must indicate that it is the first byte of the
>>>>> legacy map1
>>>> instruction, meaning legacy map0 does not have 0f opcode. If this
>>>> instruction has a rex2 prefix, rex2.w must be 1 and should be d5 80.
>>>> If a bad binary does appear, our original code also has the same issue.
>>>>>
>>>>> static const struct dis386 dis386[] = { ...
>>>>> / * 0f  */
>>>>> { Bad_Opcode },       /* 0x0f extended opcode escape */
>>>>
>>>> No, this entry simply will never be used, because of how decoding is done.
>>>> My comment was about what's going to happen if you encounter the d5
>>>> 00 0f byte sequence. That's _not_ an indication to use map1 for
>>>> decoding, nor to read another opcode byte. In this case the table
>>>> entry you quote above will need to come into play, not any entry from
>>>> dis386_twobyte[]. (As long as both are Bad_Opcode the difference may
>>>> not even be noticeable, but it would be a latent trap for someone to
>>>> fall into down the road.)
>>>>
>>>
>>>
>>>   /* REX2.M in rex2 prefix represents map0 or map1.  */
>>>   if (*ins.codep == 0x0f || (ins.rex2 & REX2_M))
>>>     {
>>>       unsigned char threebyte;
>>>
>>>       if (!ins.rex2)
>>>         {
>>>           ins.codep++;
>>>           if (!fetch_code (info, ins.codep + 1))
>>>             goto fetch_error_out;                                                      ---> When there
>> are no bytes after 0f, it will jump to fetch error, but no error will be reported.
>>>         }
>>>       threebyte = *ins.codep;
>>>       dp = &dis386_twobyte[threebyte];
>>>       ins.need_modrm = twobyte_has_modrm[threebyte];
>>>       ins.codep++;
>>>     }
>>>
>>> For d5 00 0f
>>> Decode to:
>>>    0:   d5                      rex2
>>>    1:   00 0f                   add    %cl,(%rdi)
>>
>> But this would better have d5 00 0f all on the first line (it definitely needs to
>> have d5 00 on the same line, as the bytes belong together), as opposed to ...
>>
>>> For 40 0f
>>> Decode to:
>>>    0:   40                      rex
>>>    1:   0f                      .byte 0xf
>>
>> ... this where there truly is a known missing byte before we could proceed
>> further. (It's still a little questionable to print REX separately in this case, but
>> that's the way the binutils disassembler has always worked.)
>>
>> Yet to restate - to see what I mean, you'd need to populate at least one of the
>> two 0f slots in the mentioned arrays. What I'm suspecting from the code as
>> this patch version has it is that d5 00 0f would wrongly descend into
>> dis386_twobyte[]. Yet you can tell that from it correctly using dis386[] only if
>> the two 0f slots of these arrays are meaningfully different (or by actually
>> looking at things in e.g.
>> a debugger).
>>
> 
> I'm confused here, for d5 00 0f when it fetches the next byte after 0f it will find there is no byte there and then go to fetch_error_out and then it will return from print_insn and I don't have a chance to do anything for it. It cannot reach dis386_twobyte[]. 

But why would it even try to fetch the next byte? 0f already is the major
opcode byte in this case. Fetching more can only mean either there's an
entry in dis386[] specifying operands, or there's an attempt to index
dis386_twobyte[]. Since dis386[] has Bad_Opcode at that slot, I conclude
that what you say confirms my suspicion that dis386_twobyte[] is
(attempted to be) used here.

>>>>>>> @@ -9513,6 +9572,13 @@ print_insn (bfd_vma pc, disassemble_info
>>>>>>> *info,
>>>>>> int intel_syntax)
>>>>>>>        && !ins.need_vex && ins.last_rex_prefix >= 0)
>>>>>>>      ins.all_prefixes[ins.last_rex_prefix] = 0;
>>>>>>>
>>>>>>> +  /* Check if the REX2 prefix is used.  */
>>>>>>> +  if (ins.last_rex2_prefix >= 0
>>>>>>> +      && ((((ins.rex2 & 0x7) ^ (ins.rex2_used & 0x7)) == 0
>>>>>>> +	   && (ins.rex2 & 0x7))
>>>>>>
>>>>>> DYM ((ins.rex2 & 7) & ~(ins.rex2_used & 7)) != 0
>>>>>>
>>>>>
>>>>> Here's an example of a negative scenario, when ins.rex2 == 1 and
>>>> ins.rex2_used == 1, we want to clear last_rex2_prefix, because it has
>>>> egpr and we don't want to add {rex2} to it.
>>>>
>>>> Well, that would be dealt with as well by the simpler code I
>>>> suggested, wouldn't it?
>>>>
>>>
>>> No, for d510 ,  ((ins.rex2 & 7) & ~(ins.rex2_used & 7)) == 0. Anyway, I want to
>> delete them. I don't see any point in it at all.
>>
>> Hmm, I guess I'm confused. How would you present unconsumed
>> REX2.{R,X,B}{3,4} then?
>>
> 
> For rex2.R4X4B4. I can't imagine why they set RXB but didn't consume it at the end. Since rex2 has no content listed, so I think it is not useful to have rex2_used.
> For rex2.WR3X3B3, I think it can be ignored in rex2, which doesn't list what's there.
> 
> We currently use the following rules to disambiguate.
> 
> When the instruction has egpr, it means it has rex2 prefix and we don't print {rex2} for it. When it also has an evex version, we add {evex} to the evex instruction.
> When the instruction has no egpr, we need to print {rex2} for it. In case of evex instruction, we need to print {evex} for it.
> 
>             adc     $0x7b,%r8b
> {rex2} adc     $0x7b,%r8b
>             adc     $0x7b,%r16b
> {evex} adc     $0x7b,%r16b
> {evex} adc     $0x7b,%r8b

Hmm, I guess I see where your and my thinking diverges: You view REX2 more
like EVEX, when I view it more like REX. See more on this below.

>>>>>>> @@ -11086,8 +11155,11 @@ print_register (instr_info *ins, unsigned
>>>>>>> int
>>>>>> reg, unsigned int rexmask,
>>>>>>>      ins->illegal_masking = true;
>>>>>>>
>>>>>>>    USED_REX (rexmask);
>>>>>>> +  USED_REX2 (rexmask);
>>>>>>
>>>>>> Do both really need tracking separately? Whatever consumes REX.B
>>>>>> will also consume REX2.B4, an so on.
>>>>>>
>>>>> I was confused here, I think we only need to print {rex2} for the
>>>>> upper 4 bits
>>>> == *000, which means egpr is not used and we need to use {rex2} to
>>>> distinguish it from legacy encoding.  maybe we don’t need ((ins.rex2
>>>> & 0x7) ^ (ins.rex2_used & 0x7)) == 0, and nor USED_REX2 (rexmask). I
>>>> intend to delete them.
>>>>>
>>>>> +  /* Check if the REX2 prefix is used.  */
>>>>> +  if (ins.last_rex2_prefix >= 0
>>>>> +      && ((((ins.rex2 & 0x7) ^ (ins.rex2_used & 0x7)) == 0
>>>>> +	   && (ins.rex2 & 0x7))
>>>>
>>>> But that's the same you had before. I'm afraid I don't see what
>>>> you're trying to tell me.
>>>>
>>> After removing  " ((ins.rex2 & 0x7) ^ (ins.rex2_used & 0x7)) == 0 ",
>>> the code changes to
>>>
>>>   +  /* Check if the REX2 prefix is used.  */
>>>   +  if (ins.last_rex2_prefix >= 0 && (ins.rex2 & 0x7))
>>>
>>> When it is true, decode will not print the {rex2} for this insn.
>>
>> Yet ins.rex2 having any of the low 3 bits set says nothing about whether every
>> one of these was consumed while processing operands / suffixes.
>> You need to consult .rex{,2}_used; my earlier point was merely that you don't
>> need a separate .rex2_used; the bits in .rex_used are all you require to get
>> this right (as a consumer of, say, REX.X / REX2.X3 is also a consumer of
>> REX2.X4; leaving aside EVEX encoded insns for the moment).
>>
> 
> Here are some cases:
> 0:   41 0f a8                  rex.B push %gs         ---> rex.B was not consumed, rex will print it.
> 0:   d5 01 0f a8             {rex2} push %gs      ----> Without egpr, we need to print {rex2} for it. But we can't see anything about REX2.B3 from the prefix.
> 4:   d5 19 58                  pop    %r24             ----> There is egpr, we know it uses rex2 prefix. We cannot see the information of the lower 4 bits of rex2
> 
> It is not helpful to judge rex_used in rex2.

So: REX2, like REX, is a prefix for legacy encodings. Therefore my view
is that it ought to be treated similarly to REX in the disassembler (I'm
fine to avoid the introduction of a myriad of rex2... [no figure braces]
prefixes in gas). Just like in the first line of what you present above
for the REX.B case, I'd expect the same for REX2. E.g., taking the 2nd
line from above

0:   d5 01 0f a8             {rex2.B3} push %gs

An alternative, matching your intention of treating REX2 more like EVEX,
would be to (subsequently, not right away) do away with rex.B and alike
as well, and only print {rex} in that case as well. Such an intention
would then want mentioning in the description (and eventually carrying
out). The main difference between REX/REX2 and VEX/EVEX, as I view it
(and as would be speaking against this alternative approach), is that
in VEX/EVEX it is kind of normal that certain bits are deliberately
ignored (and might be set either way - see gas'es command line options
actually allowing to drive the values for some of such ignored fields).
REX/REX2, otoh, shouldn't normally specify unused bits, much like other
legacy prefixes aren't expected to be present for no reason (and hence
are explicitly printed when present).

This explicit printing has, btw, a purpose beyond merely trying to not
hide information: If there was a related bug in the disassembler, this
extra information may allow the observer to still recognize what insn
(form) is actually being dealt with.

Jan
  
Jan Beulich Nov. 10, 2023, 9:21 a.m. UTC | #23
On 10.11.2023 10:14, Jan Beulich wrote:
> On 10.11.2023 08:11, Cui, Lili wrote:
>> Here are some cases:
>> 0:   41 0f a8                  rex.B push %gs         ---> rex.B was not consumed, rex will print it.
>> 0:   d5 01 0f a8             {rex2} push %gs      ----> Without egpr, we need to print {rex2} for it. But we can't see anything about REX2.B3 from the prefix.
>> 4:   d5 19 58                  pop    %r24             ----> There is egpr, we know it uses rex2 prefix. We cannot see the information of the lower 4 bits of rex2
>>
>> It is not helpful to judge rex_used in rex2.
> 
> So: REX2, like REX, is a prefix for legacy encodings. Therefore my view
> is that it ought to be treated similarly to REX in the disassembler (I'm
> fine to avoid the introduction of a myriad of rex2... [no figure braces]
> prefixes in gas). Just like in the first line of what you present above
> for the REX.B case, I'd expect the same for REX2. E.g., taking the 2nd
> line from above
> 
> 0:   d5 01 0f a8             {rex2.B3} push %gs
> 
> An alternative, matching your intention of treating REX2 more like EVEX,
> would be to (subsequently, not right away) do away with rex.B and alike
> as well, and only print {rex} in that case as well. Such an intention
> would then want mentioning in the description (and eventually carrying
> out). The main difference between REX/REX2 and VEX/EVEX, as I view it
> (and as would be speaking against this alternative approach), is that
> in VEX/EVEX it is kind of normal that certain bits are deliberately
> ignored (and might be set either way - see gas'es command line options
> actually allowing to drive the values for some of such ignored fields).
> REX/REX2, otoh, shouldn't normally specify unused bits, much like other
> legacy prefixes aren't expected to be present for no reason (and hence
> are explicitly printed when present).

However, irrespective of what I said above, please feel free to go ahead
with the simplified approach, to allow making progress. I'd like to have
H.J.'s input on how to achieve overall consistency, and we can make
further adjustments later on. But please make sure you actually mention
this aspect in the description of the patch.

Jan
  
Cui, Lili Nov. 10, 2023, 9:47 a.m. UTC | #24
> Subject: Re: [PATCH 1/8] Support APX GPR32 with rex2 prefix
> 
> On 10.11.2023 08:11, Cui, Lili wrote:
> >> Subject: Re: [PATCH 1/8] Support APX GPR32 with rex2 prefix
> >>
> >> On 09.11.2023 14:27, Cui, Lili wrote:
> >>>>>> Also is this, ...
> >>>>>>
> >>>>>>>      {
> >>>>>>>        unsigned char threebyte;
> >>>>>>>
> >>>>>>> -      ins.codep++;
> >>>>>>> -      if (!fetch_code (info, ins.codep + 1))
> >>>>>>> -	goto fetch_error_out;
> >>>>>>> +      if (!ins.rex2)
> >>>>>>> +	{
> >>>>>>> +	  ins.codep++;
> >>>>>>> +	  if (!fetch_code (info, ins.codep + 1))
> >>>>>>> +	    goto fetch_error_out;
> >>>>>>> +	}
> >>>>>>>        threebyte = *ins.codep;
> >>>>>>>        dp = &dis386_twobyte[threebyte];
> >>>>>>>        ins.need_modrm = twobyte_has_modrm[threebyte];
> >>>>>>
> >>>>>> ... all the way to here, really correct for d5 00 0f?
> >>>>>>
> >>>>>
> >>>>> I think the 0f here must indicate that it is the first byte of the
> >>>>> legacy map1
> >>>> instruction, meaning legacy map0 does not have 0f opcode. If this
> >>>> instruction has a rex2 prefix, rex2.w must be 1 and should be d5 80.
> >>>> If a bad binary does appear, our original code also has the same issue.
> >>>>>
> >>>>> static const struct dis386 dis386[] = { ...
> >>>>> / * 0f  */
> >>>>> { Bad_Opcode },       /* 0x0f extended opcode escape */
> >>>>
> >>>> No, this entry simply will never be used, because of how decoding is
> done.
> >>>> My comment was about what's going to happen if you encounter the d5
> >>>> 00 0f byte sequence. That's _not_ an indication to use map1 for
> >>>> decoding, nor to read another opcode byte. In this case the table
> >>>> entry you quote above will need to come into play, not any entry
> >>>> from dis386_twobyte[]. (As long as both are Bad_Opcode the
> >>>> difference may not even be noticeable, but it would be a latent
> >>>> trap for someone to fall into down the road.)
> >>>>
> >>>
> >>>
> >>>   /* REX2.M in rex2 prefix represents map0 or map1.  */
> >>>   if (*ins.codep == 0x0f || (ins.rex2 & REX2_M))
> >>>     {
> >>>       unsigned char threebyte;
> >>>
> >>>       if (!ins.rex2)
> >>>         {
> >>>           ins.codep++;
> >>>           if (!fetch_code (info, ins.codep + 1))
> >>>             goto fetch_error_out;                                                      ---> When
> there
> >> are no bytes after 0f, it will jump to fetch error, but no error will be
> reported.
> >>>         }
> >>>       threebyte = *ins.codep;
> >>>       dp = &dis386_twobyte[threebyte];
> >>>       ins.need_modrm = twobyte_has_modrm[threebyte];
> >>>       ins.codep++;
> >>>     }
> >>>
> >>> For d5 00 0f
> >>> Decode to:
> >>>    0:   d5                      rex2
> >>>    1:   00 0f                   add    %cl,(%rdi)
> >>
> >> But this would better have d5 00 0f all on the first line (it
> >> definitely needs to have d5 00 on the same line, as the bytes belong
> together), as opposed to ...
> >>
> >>> For 40 0f
> >>> Decode to:
> >>>    0:   40                      rex
> >>>    1:   0f                      .byte 0xf
> >>
> >> ... this where there truly is a known missing byte before we could
> >> proceed further. (It's still a little questionable to print REX
> >> separately in this case, but that's the way the binutils disassembler
> >> has always worked.)
> >>
> >> Yet to restate - to see what I mean, you'd need to populate at least
> >> one of the two 0f slots in the mentioned arrays. What I'm suspecting
> >> from the code as this patch version has it is that d5 00 0f would
> >> wrongly descend into dis386_twobyte[]. Yet you can tell that from it
> >> correctly using dis386[] only if the two 0f slots of these arrays are
> >> meaningfully different (or by actually looking at things in e.g.
> >> a debugger).
> >>
> >
> > I'm confused here, for d5 00 0f when it fetches the next byte after 0f it will
> find there is no byte there and then go to fetch_error_out and then it will
> return from print_insn and I don't have a chance to do anything for it. It
> cannot reach dis386_twobyte[].
> 
> But why would it even try to fetch the next byte? 0f already is the major
> opcode byte in this case. Fetching more can only mean either there's an entry
> in dis386[] specifying operands, or there's an attempt to index
> dis386_twobyte[]. Since dis386[] has Bad_Opcode at that slot, I conclude that
> what you say confirms my suspicion that dis386_twobyte[] is (attempted to
> be) used here.
> 

I don't know how to identify that 0f is the last byte of the binary, if we can get this information in advance, we can use dis386[] to report bad, in the current case, only when ins.codep++ and fetch code return error, then we can know 0f is the last byte, we should use dis386[] for it, but it has returned. This is what I'm confused about.

Lili.
  
Jan Beulich Nov. 10, 2023, 9:57 a.m. UTC | #25
On 10.11.2023 10:47, Cui, Lili wrote:
>> Subject: Re: [PATCH 1/8] Support APX GPR32 with rex2 prefix
>>
>> On 10.11.2023 08:11, Cui, Lili wrote:
>>>> Subject: Re: [PATCH 1/8] Support APX GPR32 with rex2 prefix
>>>>
>>>> On 09.11.2023 14:27, Cui, Lili wrote:
>>>>>>>> Also is this, ...
>>>>>>>>
>>>>>>>>>      {
>>>>>>>>>        unsigned char threebyte;
>>>>>>>>>
>>>>>>>>> -      ins.codep++;
>>>>>>>>> -      if (!fetch_code (info, ins.codep + 1))
>>>>>>>>> -	goto fetch_error_out;
>>>>>>>>> +      if (!ins.rex2)
>>>>>>>>> +	{
>>>>>>>>> +	  ins.codep++;
>>>>>>>>> +	  if (!fetch_code (info, ins.codep + 1))
>>>>>>>>> +	    goto fetch_error_out;
>>>>>>>>> +	}
>>>>>>>>>        threebyte = *ins.codep;
>>>>>>>>>        dp = &dis386_twobyte[threebyte];
>>>>>>>>>        ins.need_modrm = twobyte_has_modrm[threebyte];
>>>>>>>>
>>>>>>>> ... all the way to here, really correct for d5 00 0f?
>>>>>>>>
>>>>>>>
>>>>>>> I think the 0f here must indicate that it is the first byte of the
>>>>>>> legacy map1
>>>>>> instruction, meaning legacy map0 does not have 0f opcode. If this
>>>>>> instruction has a rex2 prefix, rex2.w must be 1 and should be d5 80.
>>>>>> If a bad binary does appear, our original code also has the same issue.
>>>>>>>
>>>>>>> static const struct dis386 dis386[] = { ...
>>>>>>> / * 0f  */
>>>>>>> { Bad_Opcode },       /* 0x0f extended opcode escape */
>>>>>>
>>>>>> No, this entry simply will never be used, because of how decoding is
>> done.
>>>>>> My comment was about what's going to happen if you encounter the d5
>>>>>> 00 0f byte sequence. That's _not_ an indication to use map1 for
>>>>>> decoding, nor to read another opcode byte. In this case the table
>>>>>> entry you quote above will need to come into play, not any entry
>>>>>> from dis386_twobyte[]. (As long as both are Bad_Opcode the
>>>>>> difference may not even be noticeable, but it would be a latent
>>>>>> trap for someone to fall into down the road.)
>>>>>>
>>>>>
>>>>>
>>>>>   /* REX2.M in rex2 prefix represents map0 or map1.  */
>>>>>   if (*ins.codep == 0x0f || (ins.rex2 & REX2_M))
>>>>>     {
>>>>>       unsigned char threebyte;
>>>>>
>>>>>       if (!ins.rex2)
>>>>>         {
>>>>>           ins.codep++;
>>>>>           if (!fetch_code (info, ins.codep + 1))
>>>>>             goto fetch_error_out;                                                      ---> When
>> there
>>>> are no bytes after 0f, it will jump to fetch error, but no error will be
>> reported.
>>>>>         }
>>>>>       threebyte = *ins.codep;
>>>>>       dp = &dis386_twobyte[threebyte];
>>>>>       ins.need_modrm = twobyte_has_modrm[threebyte];
>>>>>       ins.codep++;
>>>>>     }
>>>>>
>>>>> For d5 00 0f
>>>>> Decode to:
>>>>>    0:   d5                      rex2
>>>>>    1:   00 0f                   add    %cl,(%rdi)
>>>>
>>>> But this would better have d5 00 0f all on the first line (it
>>>> definitely needs to have d5 00 on the same line, as the bytes belong
>> together), as opposed to ...
>>>>
>>>>> For 40 0f
>>>>> Decode to:
>>>>>    0:   40                      rex
>>>>>    1:   0f                      .byte 0xf
>>>>
>>>> ... this where there truly is a known missing byte before we could
>>>> proceed further. (It's still a little questionable to print REX
>>>> separately in this case, but that's the way the binutils disassembler
>>>> has always worked.)
>>>>
>>>> Yet to restate - to see what I mean, you'd need to populate at least
>>>> one of the two 0f slots in the mentioned arrays. What I'm suspecting
>>>> from the code as this patch version has it is that d5 00 0f would
>>>> wrongly descend into dis386_twobyte[]. Yet you can tell that from it
>>>> correctly using dis386[] only if the two 0f slots of these arrays are
>>>> meaningfully different (or by actually looking at things in e.g.
>>>> a debugger).
>>>>
>>>
>>> I'm confused here, for d5 00 0f when it fetches the next byte after 0f it will
>> find there is no byte there and then go to fetch_error_out and then it will
>> return from print_insn and I don't have a chance to do anything for it. It
>> cannot reach dis386_twobyte[].
>>
>> But why would it even try to fetch the next byte? 0f already is the major
>> opcode byte in this case. Fetching more can only mean either there's an entry
>> in dis386[] specifying operands, or there's an attempt to index
>> dis386_twobyte[]. Since dis386[] has Bad_Opcode at that slot, I conclude that
>> what you say confirms my suspicion that dis386_twobyte[] is (attempted to
>> be) used here.
>>
> 
> I don't know how to identify that 0f is the last byte of the binary,

That's entirely irrelevant here. I gave the byte sequence d5 00 0f just
as the minimal one required to make my point. My original concern equally
applies to e.g. d5 00 0f 01 00, which may not use dis386_twobyte[0x01].

Jan

> if we can get this information in advance, we can use dis386[] to report bad, in the current case, only when ins.codep++ and fetch code return error, then we can know 0f is the last byte, we should use dis386[] for it, but it has returned. This is what I'm confused about.
> 
> Lili.
  
Cui, Lili Nov. 10, 2023, 12:05 p.m. UTC | #26
> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Friday, November 10, 2023 5:57 PM
> To: Cui, Lili <lili.cui@intel.com>
> Cc: Lu, Hongjiu <hongjiu.lu@intel.com>; ccoutant@gmail.com;
> binutils@sourceware.org
> Subject: Re: [PATCH 1/8] Support APX GPR32 with rex2 prefix
> 
> On 10.11.2023 10:47, Cui, Lili wrote:
> >> Subject: Re: [PATCH 1/8] Support APX GPR32 with rex2 prefix
> >>
> >> On 10.11.2023 08:11, Cui, Lili wrote:
> >>>> Subject: Re: [PATCH 1/8] Support APX GPR32 with rex2 prefix
> >>>>
> >>>> On 09.11.2023 14:27, Cui, Lili wrote:
> >>>>>>>> Also is this, ...
> >>>>>>>>
> >>>>>>>>>      {
> >>>>>>>>>        unsigned char threebyte;
> >>>>>>>>>
> >>>>>>>>> -      ins.codep++;
> >>>>>>>>> -      if (!fetch_code (info, ins.codep + 1))
> >>>>>>>>> -	goto fetch_error_out;
> >>>>>>>>> +      if (!ins.rex2)
> >>>>>>>>> +	{
> >>>>>>>>> +	  ins.codep++;
> >>>>>>>>> +	  if (!fetch_code (info, ins.codep + 1))
> >>>>>>>>> +	    goto fetch_error_out;
> >>>>>>>>> +	}
> >>>>>>>>>        threebyte = *ins.codep;
> >>>>>>>>>        dp = &dis386_twobyte[threebyte];
> >>>>>>>>>        ins.need_modrm = twobyte_has_modrm[threebyte];
> >>>>>>>>
> >>>>>>>> ... all the way to here, really correct for d5 00 0f?
> >>>>>>>>
> >>>>>>>
> >>>>>>> I think the 0f here must indicate that it is the first byte of
> >>>>>>> the legacy map1
> >>>>>> instruction, meaning legacy map0 does not have 0f opcode. If this
> >>>>>> instruction has a rex2 prefix, rex2.w must be 1 and should be d5 80.
> >>>>>> If a bad binary does appear, our original code also has the same
> issue.
> >>>>>>>
> >>>>>>> static const struct dis386 dis386[] = { ...
> >>>>>>> / * 0f  */
> >>>>>>> { Bad_Opcode },       /* 0x0f extended opcode escape */
> >>>>>>
> >>>>>> No, this entry simply will never be used, because of how decoding
> >>>>>> is
> >> done.
> >>>>>> My comment was about what's going to happen if you encounter the
> >>>>>> d5
> >>>>>> 00 0f byte sequence. That's _not_ an indication to use map1 for
> >>>>>> decoding, nor to read another opcode byte. In this case the table
> >>>>>> entry you quote above will need to come into play, not any entry
> >>>>>> from dis386_twobyte[]. (As long as both are Bad_Opcode the
> >>>>>> difference may not even be noticeable, but it would be a latent
> >>>>>> trap for someone to fall into down the road.)
> >>>>>>
> >>>>>
> >>>>>
> >>>>>   /* REX2.M in rex2 prefix represents map0 or map1.  */
> >>>>>   if (*ins.codep == 0x0f || (ins.rex2 & REX2_M))
> >>>>>     {
> >>>>>       unsigned char threebyte;
> >>>>>
> >>>>>       if (!ins.rex2)
> >>>>>         {
> >>>>>           ins.codep++;
> >>>>>           if (!fetch_code (info, ins.codep + 1))
> >>>>>             goto fetch_error_out;                                                      ---> When
> >> there
> >>>> are no bytes after 0f, it will jump to fetch error, but no error
> >>>> will be
> >> reported.
> >>>>>         }
> >>>>>       threebyte = *ins.codep;
> >>>>>       dp = &dis386_twobyte[threebyte];
> >>>>>       ins.need_modrm = twobyte_has_modrm[threebyte];
> >>>>>       ins.codep++;
> >>>>>     }
> >>>>>
> >>>>> For d5 00 0f
> >>>>> Decode to:
> >>>>>    0:   d5                      rex2
> >>>>>    1:   00 0f                   add    %cl,(%rdi)
> >>>>
> >>>> But this would better have d5 00 0f all on the first line (it
> >>>> definitely needs to have d5 00 on the same line, as the bytes
> >>>> belong
> >> together), as opposed to ...
> >>>>
> >>>>> For 40 0f
> >>>>> Decode to:
> >>>>>    0:   40                      rex
> >>>>>    1:   0f                      .byte 0xf
> >>>>
> >>>> ... this where there truly is a known missing byte before we could
> >>>> proceed further. (It's still a little questionable to print REX
> >>>> separately in this case, but that's the way the binutils
> >>>> disassembler has always worked.)
> >>>>
> >>>> Yet to restate - to see what I mean, you'd need to populate at
> >>>> least one of the two 0f slots in the mentioned arrays. What I'm
> >>>> suspecting from the code as this patch version has it is that d5 00
> >>>> 0f would wrongly descend into dis386_twobyte[]. Yet you can tell
> >>>> that from it correctly using dis386[] only if the two 0f slots of
> >>>> these arrays are meaningfully different (or by actually looking at things in
> e.g.
> >>>> a debugger).
> >>>>
> >>>
> >>> I'm confused here, for d5 00 0f when it fetches the next byte after
> >>> 0f it will
> >> find there is no byte there and then go to fetch_error_out and then
> >> it will return from print_insn and I don't have a chance to do
> >> anything for it. It cannot reach dis386_twobyte[].
> >>
> >> But why would it even try to fetch the next byte? 0f already is the
> >> major opcode byte in this case. Fetching more can only mean either
> >> there's an entry in dis386[] specifying operands, or there's an
> >> attempt to index dis386_twobyte[]. Since dis386[] has Bad_Opcode at
> >> that slot, I conclude that what you say confirms my suspicion that
> >> dis386_twobyte[] is (attempted to
> >> be) used here.
> >>
> >
> > I don't know how to identify that 0f is the last byte of the binary,
> 
> That's entirely irrelevant here. I gave the byte sequence d5 00 0f just as the
> minimal one required to make my point. My original concern equally applies
> to e.g. d5 00 0f 01 00, which may not use dis386_twobyte[0x01].
> 
Aha, I  got you. Changed the code to

   /* REX2.M in rex2 prefix represents map0 or map1.  */
-  if (*ins.codep == 0x0f || (ins.rex2 & REX2_M))
+  if ((*ins.codep == 0x0f && ins.last_rex2_prefix < 0) || (ins.rex2 & REX2_M))

For d5 00 0f c0
0000000000000000 <_start>:
   0:   d5 00 0f                {rex2} (bad)
   3:   c0                      .byte 0xc0

Thanks,
Lili.

> 
> > if we can get this information in advance, we can use dis386[] to report bad,
> in the current case, only when ins.codep++ and fetch code return error, then
> we can know 0f is the last byte, we should use dis386[] for it, but it has
> returned. This is what I'm confused about.
> >
> > Lili.
  
Jan Beulich Nov. 10, 2023, 12:35 p.m. UTC | #27
On 10.11.2023 13:05, Cui, Lili wrote:
> 
> 
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: Friday, November 10, 2023 5:57 PM
>> To: Cui, Lili <lili.cui@intel.com>
>> Cc: Lu, Hongjiu <hongjiu.lu@intel.com>; ccoutant@gmail.com;
>> binutils@sourceware.org
>> Subject: Re: [PATCH 1/8] Support APX GPR32 with rex2 prefix
>>
>> On 10.11.2023 10:47, Cui, Lili wrote:
>>>> Subject: Re: [PATCH 1/8] Support APX GPR32 with rex2 prefix
>>>>
>>>> On 10.11.2023 08:11, Cui, Lili wrote:
>>>>>> Subject: Re: [PATCH 1/8] Support APX GPR32 with rex2 prefix
>>>>>>
>>>>>> On 09.11.2023 14:27, Cui, Lili wrote:
>>>>>>>>>> Also is this, ...
>>>>>>>>>>
>>>>>>>>>>>      {
>>>>>>>>>>>        unsigned char threebyte;
>>>>>>>>>>>
>>>>>>>>>>> -      ins.codep++;
>>>>>>>>>>> -      if (!fetch_code (info, ins.codep + 1))
>>>>>>>>>>> -	goto fetch_error_out;
>>>>>>>>>>> +      if (!ins.rex2)
>>>>>>>>>>> +	{
>>>>>>>>>>> +	  ins.codep++;
>>>>>>>>>>> +	  if (!fetch_code (info, ins.codep + 1))
>>>>>>>>>>> +	    goto fetch_error_out;
>>>>>>>>>>> +	}
>>>>>>>>>>>        threebyte = *ins.codep;
>>>>>>>>>>>        dp = &dis386_twobyte[threebyte];
>>>>>>>>>>>        ins.need_modrm = twobyte_has_modrm[threebyte];
>>>>>>>>>>
>>>>>>>>>> ... all the way to here, really correct for d5 00 0f?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I think the 0f here must indicate that it is the first byte of
>>>>>>>>> the legacy map1
>>>>>>>> instruction, meaning legacy map0 does not have 0f opcode. If this
>>>>>>>> instruction has a rex2 prefix, rex2.w must be 1 and should be d5 80.
>>>>>>>> If a bad binary does appear, our original code also has the same
>> issue.
>>>>>>>>>
>>>>>>>>> static const struct dis386 dis386[] = { ...
>>>>>>>>> / * 0f  */
>>>>>>>>> { Bad_Opcode },       /* 0x0f extended opcode escape */
>>>>>>>>
>>>>>>>> No, this entry simply will never be used, because of how decoding
>>>>>>>> is
>>>> done.
>>>>>>>> My comment was about what's going to happen if you encounter the
>>>>>>>> d5
>>>>>>>> 00 0f byte sequence. That's _not_ an indication to use map1 for
>>>>>>>> decoding, nor to read another opcode byte. In this case the table
>>>>>>>> entry you quote above will need to come into play, not any entry
>>>>>>>> from dis386_twobyte[]. (As long as both are Bad_Opcode the
>>>>>>>> difference may not even be noticeable, but it would be a latent
>>>>>>>> trap for someone to fall into down the road.)
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>   /* REX2.M in rex2 prefix represents map0 or map1.  */
>>>>>>>   if (*ins.codep == 0x0f || (ins.rex2 & REX2_M))
>>>>>>>     {
>>>>>>>       unsigned char threebyte;
>>>>>>>
>>>>>>>       if (!ins.rex2)
>>>>>>>         {
>>>>>>>           ins.codep++;
>>>>>>>           if (!fetch_code (info, ins.codep + 1))
>>>>>>>             goto fetch_error_out;                                                      ---> When
>>>> there
>>>>>> are no bytes after 0f, it will jump to fetch error, but no error
>>>>>> will be
>>>> reported.
>>>>>>>         }
>>>>>>>       threebyte = *ins.codep;
>>>>>>>       dp = &dis386_twobyte[threebyte];
>>>>>>>       ins.need_modrm = twobyte_has_modrm[threebyte];
>>>>>>>       ins.codep++;
>>>>>>>     }
>>>>>>>
>>>>>>> For d5 00 0f
>>>>>>> Decode to:
>>>>>>>    0:   d5                      rex2
>>>>>>>    1:   00 0f                   add    %cl,(%rdi)
>>>>>>
>>>>>> But this would better have d5 00 0f all on the first line (it
>>>>>> definitely needs to have d5 00 on the same line, as the bytes
>>>>>> belong
>>>> together), as opposed to ...
>>>>>>
>>>>>>> For 40 0f
>>>>>>> Decode to:
>>>>>>>    0:   40                      rex
>>>>>>>    1:   0f                      .byte 0xf
>>>>>>
>>>>>> ... this where there truly is a known missing byte before we could
>>>>>> proceed further. (It's still a little questionable to print REX
>>>>>> separately in this case, but that's the way the binutils
>>>>>> disassembler has always worked.)
>>>>>>
>>>>>> Yet to restate - to see what I mean, you'd need to populate at
>>>>>> least one of the two 0f slots in the mentioned arrays. What I'm
>>>>>> suspecting from the code as this patch version has it is that d5 00
>>>>>> 0f would wrongly descend into dis386_twobyte[]. Yet you can tell
>>>>>> that from it correctly using dis386[] only if the two 0f slots of
>>>>>> these arrays are meaningfully different (or by actually looking at things in
>> e.g.
>>>>>> a debugger).
>>>>>>
>>>>>
>>>>> I'm confused here, for d5 00 0f when it fetches the next byte after
>>>>> 0f it will
>>>> find there is no byte there and then go to fetch_error_out and then
>>>> it will return from print_insn and I don't have a chance to do
>>>> anything for it. It cannot reach dis386_twobyte[].
>>>>
>>>> But why would it even try to fetch the next byte? 0f already is the
>>>> major opcode byte in this case. Fetching more can only mean either
>>>> there's an entry in dis386[] specifying operands, or there's an
>>>> attempt to index dis386_twobyte[]. Since dis386[] has Bad_Opcode at
>>>> that slot, I conclude that what you say confirms my suspicion that
>>>> dis386_twobyte[] is (attempted to
>>>> be) used here.
>>>>
>>>
>>> I don't know how to identify that 0f is the last byte of the binary,
>>
>> That's entirely irrelevant here. I gave the byte sequence d5 00 0f just as the
>> minimal one required to make my point. My original concern equally applies
>> to e.g. d5 00 0f 01 00, which may not use dis386_twobyte[0x01].
>>
> Aha, I  got you. Changed the code to
> 
>    /* REX2.M in rex2 prefix represents map0 or map1.  */
> -  if (*ins.codep == 0x0f || (ins.rex2 & REX2_M))
> +  if ((*ins.codep == 0x0f && ins.last_rex2_prefix < 0) || (ins.rex2 & REX2_M))

Would you mind considering

  if (ins.last_rex2_prefix < 0 ? *ins.codep == 0x0f : (ins.rex2 & REX2_M))

as an alternative?

Jan
  
Cui, Lili Nov. 10, 2023, 12:38 p.m. UTC | #28
> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Friday, November 10, 2023 5:22 PM
> To: Cui, Lili <lili.cui@intel.com>
> Cc: Lu, Hongjiu <hongjiu.lu@intel.com>; ccoutant@gmail.com;
> binutils@sourceware.org
> Subject: Re: [PATCH 1/8] Support APX GPR32 with rex2 prefix
> 
> On 10.11.2023 10:14, Jan Beulich wrote:
> > On 10.11.2023 08:11, Cui, Lili wrote:
> >> Here are some cases:
> >> 0:   41 0f a8                  rex.B push %gs         ---> rex.B was not consumed, rex
> will print it.
> >> 0:   d5 01 0f a8             {rex2} push %gs      ----> Without egpr, we need to
> print {rex2} for it. But we can't see anything about REX2.B3 from the prefix.
> >> 4:   d5 19 58                  pop    %r24             ----> There is egpr, we know it
> uses rex2 prefix. We cannot see the information of the lower 4 bits of rex2
> >>
> >> It is not helpful to judge rex_used in rex2.
> >
> > So: REX2, like REX, is a prefix for legacy encodings. Therefore my
> > view is that it ought to be treated similarly to REX in the
> > disassembler (I'm fine to avoid the introduction of a myriad of
> > rex2... [no figure braces] prefixes in gas). Just like in the first
> > line of what you present above for the REX.B case, I'd expect the same
> > for REX2. E.g., taking the 2nd line from above
> >
> > 0:   d5 01 0f a8             {rex2.B3} push %gs
> >
> > An alternative, matching your intention of treating REX2 more like
> > EVEX, would be to (subsequently, not right away) do away with rex.B
> > and alike as well, and only print {rex} in that case as well. Such an
> > intention would then want mentioning in the description (and
> > eventually carrying out). The main difference between REX/REX2 and
> > VEX/EVEX, as I view it (and as would be speaking against this
> > alternative approach), is that in VEX/EVEX it is kind of normal that
> > certain bits are deliberately ignored (and might be set either way -
> > see gas'es command line options actually allowing to drive the values for
> some of such ignored fields).
> > REX/REX2, otoh, shouldn't normally specify unused bits, much like
> > other legacy prefixes aren't expected to be present for no reason (and
> > hence are explicitly printed when present).
> 
> However, irrespective of what I said above, please feel free to go ahead with
> the simplified approach, to allow making progress. I'd like to have H.J.'s input
> on how to achieve overall consistency, and we can make further adjustments
> later on. But please make sure you actually mention this aspect in the
> description of the patch.
> 

OK, I'll add a description of the current implementation.

Thanks,
Lili.
  
Cui, Lili Nov. 13, 2023, 12:18 a.m. UTC | #29
> >>>>>>>   /* REX2.M in rex2 prefix represents map0 or map1.  */
> >>>>>>>   if (*ins.codep == 0x0f || (ins.rex2 & REX2_M))
> >>>>>>>     {
> >>>>>>>       unsigned char threebyte;
> >>>>>>>
> >>>>>>>       if (!ins.rex2)
> >>>>>>>         {
> >>>>>>>           ins.codep++;
> >>>>>>>           if (!fetch_code (info, ins.codep + 1))
> >>>>>>>             goto fetch_error_out;                                                      ---> When
> >>>> there
> >>>>>> are no bytes after 0f, it will jump to fetch error, but no error
> >>>>>> will be
> >>>> reported.
> >>>>>>>         }
> >>>>>>>       threebyte = *ins.codep;
> >>>>>>>       dp = &dis386_twobyte[threebyte];
> >>>>>>>       ins.need_modrm = twobyte_has_modrm[threebyte];
> >>>>>>>       ins.codep++;
> >>>>>>>     }
> >>>>>>>
> >>>>>>> For d5 00 0f
> >>>>>>> Decode to:
> >>>>>>>    0:   d5                      rex2
> >>>>>>>    1:   00 0f                   add    %cl,(%rdi)
> >>>>>>
> >>>>>> But this would better have d5 00 0f all on the first line (it
> >>>>>> definitely needs to have d5 00 on the same line, as the bytes
> >>>>>> belong
> >>>> together), as opposed to ...
> >>>>>>
> >>>>>>> For 40 0f
> >>>>>>> Decode to:
> >>>>>>>    0:   40                      rex
> >>>>>>>    1:   0f                      .byte 0xf
> >>>>>>
> >>>>>> ... this where there truly is a known missing byte before we
> >>>>>> could proceed further. (It's still a little questionable to print
> >>>>>> REX separately in this case, but that's the way the binutils
> >>>>>> disassembler has always worked.)
> >>>>>>
> >>>>>> Yet to restate - to see what I mean, you'd need to populate at
> >>>>>> least one of the two 0f slots in the mentioned arrays. What I'm
> >>>>>> suspecting from the code as this patch version has it is that d5
> >>>>>> 00 0f would wrongly descend into dis386_twobyte[]. Yet you can
> >>>>>> tell that from it correctly using dis386[] only if the two 0f
> >>>>>> slots of these arrays are meaningfully different (or by actually
> >>>>>> looking at things in
> >> e.g.
> >>>>>> a debugger).
> >>>>>>
> >>>>>
> >>>>> I'm confused here, for d5 00 0f when it fetches the next byte
> >>>>> after 0f it will
> >>>> find there is no byte there and then go to fetch_error_out and then
> >>>> it will return from print_insn and I don't have a chance to do
> >>>> anything for it. It cannot reach dis386_twobyte[].
> >>>>
> >>>> But why would it even try to fetch the next byte? 0f already is the
> >>>> major opcode byte in this case. Fetching more can only mean either
> >>>> there's an entry in dis386[] specifying operands, or there's an
> >>>> attempt to index dis386_twobyte[]. Since dis386[] has Bad_Opcode at
> >>>> that slot, I conclude that what you say confirms my suspicion that
> >>>> dis386_twobyte[] is (attempted to
> >>>> be) used here.
> >>>>
> >>>
> >>> I don't know how to identify that 0f is the last byte of the binary,
> >>
> >> That's entirely irrelevant here. I gave the byte sequence d5 00 0f
> >> just as the minimal one required to make my point. My original
> >> concern equally applies to e.g. d5 00 0f 01 00, which may not use
> dis386_twobyte[0x01].
> >>
> > Aha, I  got you. Changed the code to
> >
> >    /* REX2.M in rex2 prefix represents map0 or map1.  */
> > -  if (*ins.codep == 0x0f || (ins.rex2 & REX2_M))
> > +  if ((*ins.codep == 0x0f && ins.last_rex2_prefix < 0) || (ins.rex2 &
> > + REX2_M))
> 
> Would you mind considering
> 
>   if (ins.last_rex2_prefix < 0 ? *ins.codep == 0x0f : (ins.rex2 & REX2_M))
> 
> as an alternative?
> 

Done, thanks!

Lili
  
Cui, Lili Dec. 14, 2023, 10:13 a.m. UTC | #30
> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Friday, November 10, 2023 5:22 PM
> To: Cui, Lili <lili.cui@intel.com>
> Cc: Lu, Hongjiu <hongjiu.lu@intel.com>; ccoutant@gmail.com;
> binutils@sourceware.org
> Subject: Re: [PATCH 1/8] Support APX GPR32 with rex2 prefix
> 
> On 10.11.2023 10:14, Jan Beulich wrote:
> > On 10.11.2023 08:11, Cui, Lili wrote:
> >> Here are some cases:
> >> 0:   41 0f a8                  rex.B push %gs         ---> rex.B was not consumed, rex
> will print it.
> >> 0:   d5 01 0f a8             {rex2} push %gs      ----> Without egpr, we need to
> print {rex2} for it. But we can't see anything about REX2.B3 from the prefix.
> >> 4:   d5 19 58                  pop    %r24             ----> There is egpr, we know it uses
> rex2 prefix. We cannot see the information of the lower 4 bits of rex2
> >>
> >> It is not helpful to judge rex_used in rex2.
> >
> > So: REX2, like REX, is a prefix for legacy encodings. Therefore my
> > view is that it ought to be treated similarly to REX in the
> > disassembler (I'm fine to avoid the introduction of a myriad of
> > rex2... [no figure braces] prefixes in gas). Just like in the first
> > line of what you present above for the REX.B case, I'd expect the same
> > for REX2. E.g., taking the 2nd line from above
> >
> > 0:   d5 01 0f a8             {rex2.B3} push %gs
> >
> > An alternative, matching your intention of treating REX2 more like
> > EVEX, would be to (subsequently, not right away) do away with rex.B
> > and alike as well, and only print {rex} in that case as well. Such an
> > intention would then want mentioning in the description (and
> > eventually carrying out). The main difference between REX/REX2 and
> > VEX/EVEX, as I view it (and as would be speaking against this
> > alternative approach), is that in VEX/EVEX it is kind of normal that
> > certain bits are deliberately ignored (and might be set either way -
> > see gas'es command line options actually allowing to drive the values for
> some of such ignored fields).
> > REX/REX2, otoh, shouldn't normally specify unused bits, much like
> > other legacy prefixes aren't expected to be present for no reason (and
> > hence are explicitly printed when present).
> 
> However, irrespective of what I said above, please feel free to go ahead with
> the simplified approach, to allow making progress. I'd like to have H.J.'s input
> on how to achieve overall consistency, and we can make further adjustments
> later on. But please make sure you actually mention this aspect in the
> description of the patch.
> 

Hi Jan,

We are going to use {rex2 0xXX}.

Lili.
  
Jan Beulich Dec. 18, 2023, 3:24 p.m. UTC | #31
On 14.12.2023 11:13, Cui, Lili wrote:
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: Friday, November 10, 2023 5:22 PM
>>
>> On 10.11.2023 10:14, Jan Beulich wrote:
>>> On 10.11.2023 08:11, Cui, Lili wrote:
>>>> Here are some cases:
>>>> 0:   41 0f a8                  rex.B push %gs         ---> rex.B was not consumed, rex
>> will print it.
>>>> 0:   d5 01 0f a8             {rex2} push %gs      ----> Without egpr, we need to
>> print {rex2} for it. But we can't see anything about REX2.B3 from the prefix.
>>>> 4:   d5 19 58                  pop    %r24             ----> There is egpr, we know it uses
>> rex2 prefix. We cannot see the information of the lower 4 bits of rex2
>>>>
>>>> It is not helpful to judge rex_used in rex2.
>>>
>>> So: REX2, like REX, is a prefix for legacy encodings. Therefore my
>>> view is that it ought to be treated similarly to REX in the
>>> disassembler (I'm fine to avoid the introduction of a myriad of
>>> rex2... [no figure braces] prefixes in gas). Just like in the first
>>> line of what you present above for the REX.B case, I'd expect the same
>>> for REX2. E.g., taking the 2nd line from above
>>>
>>> 0:   d5 01 0f a8             {rex2.B3} push %gs
>>>
>>> An alternative, matching your intention of treating REX2 more like
>>> EVEX, would be to (subsequently, not right away) do away with rex.B
>>> and alike as well, and only print {rex} in that case as well. Such an
>>> intention would then want mentioning in the description (and
>>> eventually carrying out). The main difference between REX/REX2 and
>>> VEX/EVEX, as I view it (and as would be speaking against this
>>> alternative approach), is that in VEX/EVEX it is kind of normal that
>>> certain bits are deliberately ignored (and might be set either way -
>>> see gas'es command line options actually allowing to drive the values for
>> some of such ignored fields).
>>> REX/REX2, otoh, shouldn't normally specify unused bits, much like
>>> other legacy prefixes aren't expected to be present for no reason (and
>>> hence are explicitly printed when present).
>>
>> However, irrespective of what I said above, please feel free to go ahead with
>> the simplified approach, to allow making progress. I'd like to have H.J.'s input
>> on how to achieve overall consistency, and we can make further adjustments
>> later on. But please make sure you actually mention this aspect in the
>> description of the patch.
> 
> We are going to use {rex2 0xXX}.

So that's going to be {rex2} alone as equivalent to {rex} and {rex2 0xXX}
as equivalent to rex (no braces)?

Independent of that I'm of course curious how you're meaning to deal with
- collisions with REX2 bits already set for other reasons, and
- yet more specifically control over REX2.M.

Jan
  
H.J. Lu Dec. 18, 2023, 4:23 p.m. UTC | #32
On Mon, Dec 18, 2023 at 7:24 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 14.12.2023 11:13, Cui, Lili wrote:
> >> -----Original Message-----
> >> From: Jan Beulich <jbeulich@suse.com>
> >> Sent: Friday, November 10, 2023 5:22 PM
> >>
> >> On 10.11.2023 10:14, Jan Beulich wrote:
> >>> On 10.11.2023 08:11, Cui, Lili wrote:
> >>>> Here are some cases:
> >>>> 0:   41 0f a8                  rex.B push %gs         ---> rex.B was not consumed, rex
> >> will print it.
> >>>> 0:   d5 01 0f a8             {rex2} push %gs      ----> Without egpr, we need to
> >> print {rex2} for it. But we can't see anything about REX2.B3 from the prefix.
> >>>> 4:   d5 19 58                  pop    %r24             ----> There is egpr, we know it uses
> >> rex2 prefix. We cannot see the information of the lower 4 bits of rex2
> >>>>
> >>>> It is not helpful to judge rex_used in rex2.
> >>>
> >>> So: REX2, like REX, is a prefix for legacy encodings. Therefore my
> >>> view is that it ought to be treated similarly to REX in the
> >>> disassembler (I'm fine to avoid the introduction of a myriad of
> >>> rex2... [no figure braces] prefixes in gas). Just like in the first
> >>> line of what you present above for the REX.B case, I'd expect the same
> >>> for REX2. E.g., taking the 2nd line from above
> >>>
> >>> 0:   d5 01 0f a8             {rex2.B3} push %gs
> >>>
> >>> An alternative, matching your intention of treating REX2 more like
> >>> EVEX, would be to (subsequently, not right away) do away with rex.B
> >>> and alike as well, and only print {rex} in that case as well. Such an
> >>> intention would then want mentioning in the description (and
> >>> eventually carrying out). The main difference between REX/REX2 and
> >>> VEX/EVEX, as I view it (and as would be speaking against this
> >>> alternative approach), is that in VEX/EVEX it is kind of normal that
> >>> certain bits are deliberately ignored (and might be set either way -
> >>> see gas'es command line options actually allowing to drive the values for
> >> some of such ignored fields).
> >>> REX/REX2, otoh, shouldn't normally specify unused bits, much like
> >>> other legacy prefixes aren't expected to be present for no reason (and
> >>> hence are explicitly printed when present).
> >>
> >> However, irrespective of what I said above, please feel free to go ahead with
> >> the simplified approach, to allow making progress. I'd like to have H.J.'s input
> >> on how to achieve overall consistency, and we can make further adjustments
> >> later on. But please make sure you actually mention this aspect in the
> >> description of the patch.
> >
> > We are going to use {rex2 0xXX}.
>
> So that's going to be {rex2} alone as equivalent to {rex} and {rex2 0xXX}
> as equivalent to rex (no braces)?
>
> Independent of that I'm of course curious how you're meaning to deal with
> - collisions with REX2 bits already set for other reasons, and
> - yet more specifically control over REX2.M.
>

This is only for disassembler to display the unused REX2 bits.
  

Patch

diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
index 714354d5116..3d917c34d15 100644
--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -234,6 +234,7 @@  enum i386_error
     operand_size_mismatch,
     operand_type_mismatch,
     register_type_mismatch,
+    register_type_of_address_mismatch,
     number_of_operands_mismatch,
     invalid_instruction_suffix,
     bad_imm4,
@@ -247,6 +248,7 @@  enum i386_error
     invalid_vector_register_set,
     invalid_tmm_register_set,
     invalid_dest_and_src_register_set,
+    invalid_pseudo_prefix,
     unsupported_vector_index_register,
     unsupported_broadcast,
     broadcast_needed,
@@ -354,6 +356,7 @@  struct _i386_insn
     modrm_byte rm;
     rex_byte rex;
     rex_byte vrex;
+    rex_byte rex2;
     sib_byte sib;
     vex_prefix vex;
 
@@ -406,6 +409,11 @@  struct _i386_insn
     /* Compressed disp8*N attribute.  */
     unsigned int memshift;
 
+    /* No CSPAZO flags update.*/
+    bool has_nf;
+
+    bool has_zero_upper;
+
     /* Prefer load or store in encoding.  */
     enum
       {
@@ -427,6 +435,9 @@  struct _i386_insn
     /* Prefer the REX byte in encoding.  */
     bool rex_encoding;
 
+    /* Prefer the REX2 prefix in encoding.  */
+    bool rex2_encoding;
+
     /* Disable instruction size optimization.  */
     bool no_optimize;
 
@@ -1164,6 +1175,7 @@  static const arch_entry cpu_arch[] =
   VECARCH (sm4, SM4, ANY_SM4, reset),
   SUBARCH (pbndkb, PBNDKB, PBNDKB, false),
   VECARCH (avx10.1, AVX10_1, ANY_AVX512F, set),
+  SUBARCH (apx_f, APX_F, APX_F, false),
 };
 
 #undef SUBARCH
@@ -1693,6 +1705,7 @@  is_cpu (const insn_template *t, enum i386_cpu cpu)
     case CpuHLE:      return t->cpu.bitfield.cpuhle;
     case CpuAVX512F:  return t->cpu.bitfield.cpuavx512f;
     case CpuAVX512VL: return t->cpu.bitfield.cpuavx512vl;
+    case CpuAPX_F:    return t->cpu.bitfield.cpuapx_f;
     case Cpu64:       return t->cpu.bitfield.cpu64;
     case CpuNo64:     return t->cpu.bitfield.cpuno64;
     default:
@@ -2375,6 +2388,9 @@  register_number (const reg_entry *r)
   if (r->reg_flags & RegRex)
     nr += 8;
 
+  if (r->reg_flags & RegRex2)
+    nr += 16;
+
   if (r->reg_flags & RegVRex)
     nr += 16;
 
@@ -3890,6 +3906,8 @@  build_vex_prefix (const insn_template *t)
 static INLINE bool
 is_evex_encoding (const insn_template *t)
 {
+   /* When modifying this function, you also need to modify the evex judgment
+      part of process_i386_opcode_modifier in i386-gen.c.  */
   return t->opcode_modifier.evex || t->opcode_modifier.disp8memshift
 	 || t->opcode_modifier.broadcast || t->opcode_modifier.masking
 	 || t->opcode_modifier.sae;
@@ -3901,6 +3919,12 @@  is_any_vex_encoding (const insn_template *t)
   return t->opcode_modifier.vex || is_evex_encoding (t);
 }
 
+static INLINE bool
+is_any_apx_rex2_encoding (void)
+{
+  return i.rex2 || i.rex2_encoding;
+}
+
 static unsigned int
 get_broadcast_bytes (const insn_template *t, bool diag)
 {
@@ -4158,6 +4182,19 @@  build_evex_prefix (void)
     i.vex.bytes[3] |= i.mask.reg->reg_num;
 }
 
+/* Build (2 bytes) rex2 prefix.
+   | D5h |
+   | m | R4 X4 B4 | W R X B |
+*/
+static void
+build_rex2_prefix (void)
+{
+  i.vex.length = 2;
+  i.vex.bytes[0] = 0xd5;
+  i.vex.bytes[1] = ((i.tm.opcode_space << 7)
+		    | (i.rex2 << 4) | i.rex);
+}
+
 static void
 process_immext (void)
 {
@@ -4423,12 +4460,16 @@  optimize_encoding (void)
 	  i.suffix = 0;
 	  /* Convert to byte registers.  */
 	  if (i.types[1].bitfield.word)
-	    j = 16;
-	  else if (i.types[1].bitfield.dword)
+	    /* There are 32 8-bit registers.  */
 	    j = 32;
+	  else if (i.types[1].bitfield.dword)
+	    /* 32 8-bit registers + 32 16-bit registers.  */
+	    j = 64;
 	  else
-	    j = 48;
-	  if (!(i.op[1].regs->reg_flags & RegRex) && base_regnum < 4)
+	    /* 32 8-bit registers + 32 16-bit registers
+	       + 32 32-bit registers.  */
+	    j = 96;
+	  if (!(i.op[1].regs->reg_flags & (RegRex | RegRex2)) && base_regnum < 4)
 	    j += 8;
 	  i.op[1].regs -= j;
 	}
@@ -5278,6 +5319,9 @@  md_assemble (char *line)
 	case register_type_mismatch:
 	  err_msg = _("register type mismatch");
 	  break;
+	case register_type_of_address_mismatch:
+	  err_msg = _("register type of address mismatch");
+	  break;
 	case number_of_operands_mismatch:
 	  err_msg = _("number of operands mismatch");
 	  break;
@@ -5340,6 +5384,9 @@  md_assemble (char *line)
 	case invalid_dest_and_src_register_set:
 	  err_msg = _("destination and source registers must be distinct");
 	  break;
+	case invalid_pseudo_prefix:
+	  err_msg = _("rex2 pseudo prefix cannot be used here");
+	  break;
 	case unsupported_vector_index_register:
 	  err_msg = _("unsupported vector index register");
 	  break;
@@ -5578,7 +5625,7 @@  md_assemble (char *line)
       as_warn (_("translating to `%sp'"), insn_name (&i.tm));
     }
 
-  if (is_any_vex_encoding (&i.tm))
+ if (is_any_vex_encoding (&i.tm))
     {
       if (!cpu_arch_flags.bitfield.cpui286)
 	{
@@ -5594,6 +5641,13 @@  md_assemble (char *line)
 	  return;
 	}
 
+      /* Check for explicit REX2 prefix.  */
+      if (i.rex2 || i.rex2_encoding)
+	{
+	  as_bad (_("REX2 prefix invalid with `%s'"), insn_name (&i.tm));
+	  return;
+	}
+
       if (i.tm.opcode_modifier.vex)
 	build_vex_prefix (t);
       else
@@ -5633,11 +5687,11 @@  md_assemble (char *line)
 	  && (i.op[1].regs->reg_flags & RegRex64) != 0)
       || (((i.types[0].bitfield.class == Reg && i.types[0].bitfield.byte)
 	   || (i.types[1].bitfield.class == Reg && i.types[1].bitfield.byte))
-	  && i.rex != 0))
+	  && (i.rex != 0 || i.rex2 != 0)))
     {
       int x;
-
-      i.rex |= REX_OPCODE;
+      if (!i.rex2)
+	i.rex |= REX_OPCODE;
       for (x = 0; x < 2; x++)
 	{
 	  /* Look for 8 bit operand that uses old registers.  */
@@ -5647,9 +5701,11 @@  md_assemble (char *line)
 	      gas_assert (!(i.op[x].regs->reg_flags & RegRex));
 	      /* In case it is "hi" register, give up.  */
 	      if (i.op[x].regs->reg_num > 3)
-		as_bad (_("can't encode register '%s%s' in an "
-			  "instruction requiring REX prefix."),
-			register_prefix, i.op[x].regs->reg_name);
+		{
+		  as_bad (_("can't encode register '%s%s' in an "
+			    "instruction requiring REX/REX2 prefix."),
+			  register_prefix, i.op[x].regs->reg_name);
+		}
 
 	      /* Otherwise it is equivalent to the extended register.
 		 Since the encoding doesn't change this is merely
@@ -5660,11 +5716,11 @@  md_assemble (char *line)
 	}
     }
 
-  if (i.rex == 0 && i.rex_encoding)
+  if ((i.rex == 0 && i.rex_encoding) || (i.rex2 == 0 && i.rex2_encoding))
     {
       /* Check if we can add a REX_OPCODE byte.  Look for 8 bit operand
 	 that uses legacy register.  If it is "hi" register, don't add
-	 the REX_OPCODE byte.  */
+	 rex and rex2 prefix.  */
       int x;
       for (x = 0; x < 2; x++)
 	if (i.types[x].bitfield.class == Reg
@@ -5674,6 +5730,7 @@  md_assemble (char *line)
 	  {
 	    gas_assert (!(i.op[x].regs->reg_flags & RegRex));
 	    i.rex_encoding = false;
+	    i.rex2_encoding = false;
 	    break;
 	  }
 
@@ -5681,7 +5738,13 @@  md_assemble (char *line)
 	i.rex = REX_OPCODE;
     }
 
-  if (i.rex != 0)
+  if (i.rex2 != 0 || i.rex2_encoding)
+    {
+      build_rex2_prefix ();
+      /* The individual REX.RXBW bits got consumed.  */
+      i.rex &= REX_OPCODE;
+    }
+  else if (i.rex != 0)
     add_prefix (REX_OPCODE | i.rex);
 
   insert_lfence_before ();
@@ -5852,6 +5915,10 @@  parse_insn (const char *line, char *mnemonic, bool prefix_only)
 		  /* {rex} */
 		  i.rex_encoding = true;
 		  break;
+		case Prefix_REX2:
+		  /* {rex2} */
+		  i.rex2_encoding = true;
+		  break;
 		case Prefix_NoOptimize:
 		  /* {nooptimize} */
 		  i.no_optimize = true;
@@ -6989,6 +7056,44 @@  VEX_check_encoding (const insn_template *t)
   return 0;
 }
 
+/* Check if Egprs operands are valid for the instruction.  */
+
+static int
+check_EgprOperands (const insn_template *t)
+{
+  if (t->opcode_modifier.noegpr)
+    {
+      for (unsigned int op = 0; op < i.operands; op++)
+	{
+	  if (i.types[op].bitfield.class != Reg
+	      /* Special case for (%dx) while doing input/output op */
+	      || i.input_output_operand)
+	    continue;
+
+	  if (i.op[op].regs->reg_flags & RegRex2)
+	    {
+	      i.error = register_type_mismatch;
+	      return 1;
+	    }
+	}
+
+      if ((i.index_reg && (i.index_reg->reg_flags & RegRex2))
+	  || (i.base_reg && (i.base_reg->reg_flags & RegRex2)))
+	{
+	  i.error = register_type_of_address_mismatch;
+	  return 1;
+	}
+
+      /* Check pseudo prefix {rex2} are valid.  */
+      if (i.rex2_encoding)
+	{
+	  i.error = invalid_pseudo_prefix;
+	  return 1;
+	}
+    }
+  return 0;
+}
+
 /* Helper function for the progress() macro in match_template().  */
 static INLINE enum i386_error progress (enum i386_error new,
 					enum i386_error last,
@@ -7125,7 +7230,7 @@  match_template (char mnem_suffix)
       /* Do not verify operands when there are none.  */
       if (!t->operands)
 	{
-	  if (VEX_check_encoding (t))
+	  if (VEX_check_encoding (t) || check_EgprOperands (t))
 	    {
 	      specific_error = progress (i.error);
 	      continue;
@@ -7461,6 +7566,13 @@  match_template (char mnem_suffix)
 	  continue;
 	}
 
+      /* Check if EGRPS operands(r16-r31) are valid.  */
+      if (check_EgprOperands (t))
+	{
+	  specific_error = progress (i.error);
+	  continue;
+	}
+
       /* Check whether to use the shorter VEX encoding for certain insns where
 	 the EVEX enconding comes first in the table.  This requires the respective
 	 AVX-* feature to be explicitly enabled.  */
@@ -8363,6 +8475,18 @@  static INLINE void set_rex_vrex (const reg_entry *r, unsigned int rex_bit,
 
   if (r->reg_flags & RegVRex)
     i.vrex |= rex_bit;
+
+  if (r->reg_flags & RegRex2)
+    i.rex2 |= rex_bit;
+}
+
+static INLINE void
+set_rex_rex2 (const reg_entry *r, unsigned int rex_bit)
+{
+  if ((r->reg_flags & RegRex) != 0)
+    i.rex |= rex_bit;
+  if ((r->reg_flags & RegRex2) != 0)
+    i.rex2 |= rex_bit;
 }
 
 static int
@@ -8846,8 +8970,7 @@  build_modrm_byte (void)
 		  i.rm.regmem = ESCAPE_TO_TWO_BYTE_ADDRESSING;
 		  i.types[op] = operand_type_and_not (i.types[op], anydisp);
 		  i.types[op].bitfield.disp32 = 1;
-		  if ((i.index_reg->reg_flags & RegRex) != 0)
-		    i.rex |= REX_X;
+		  set_rex_rex2 (i.index_reg, REX_X);
 		}
 	    }
 	  /* RIP addressing for 64bit mode.  */
@@ -8918,8 +9041,7 @@  build_modrm_byte (void)
 
 	      if (!i.tm.opcode_modifier.sib)
 		i.rm.regmem = i.base_reg->reg_num;
-	      if ((i.base_reg->reg_flags & RegRex) != 0)
-		i.rex |= REX_B;
+	      set_rex_rex2 (i.base_reg, REX_B);
 	      i.sib.base = i.base_reg->reg_num;
 	      /* x86-64 ignores REX prefix bit here to avoid decoder
 		 complications.  */
@@ -8957,8 +9079,7 @@  build_modrm_byte (void)
 		  else
 		    i.sib.index = i.index_reg->reg_num;
 		  i.rm.regmem = ESCAPE_TO_TWO_BYTE_ADDRESSING;
-		  if ((i.index_reg->reg_flags & RegRex) != 0)
-		    i.rex |= REX_X;
+		  set_rex_rex2 (i.index_reg, REX_X);
 		}
 
 	      if (i.disp_operands
@@ -10105,6 +10226,12 @@  output_insn (void)
 	  for (j = ARRAY_SIZE (i.prefix), q = i.prefix; j > 0; j--, q++)
 	    if (*q)
 	      frag_opcode_byte (*q);
+
+	  if (is_any_apx_rex2_encoding ())
+	    {
+	      frag_opcode_byte (i.vex.bytes[0]);
+	      frag_opcode_byte (i.vex.bytes[1]);
+	    }
 	}
       else
 	{
@@ -14131,6 +14258,13 @@  static bool check_register (const reg_entry *r)
 	i.vec_encoding = vex_encoding_error;
     }
 
+  if (r->reg_flags & RegRex2)
+    {
+      if (!cpu_arch_flags.bitfield.cpuapx_f
+	  || flag_code != CODE_64BIT)
+	return false;
+    }
+
   if (((r->reg_flags & (RegRex64 | RegRex)) || r->reg_type.bitfield.qword)
       && (!cpu_arch_flags.bitfield.cpu64
 	  || r->reg_type.bitfield.class != RegCR
diff --git a/gas/doc/c-i386.texi b/gas/doc/c-i386.texi
index b04e1b00b4b..5d79a332f53 100644
--- a/gas/doc/c-i386.texi
+++ b/gas/doc/c-i386.texi
@@ -216,6 +216,7 @@  accept various extension mnemonics.  For example,
 @code{avx10.1/512},
 @code{avx10.1/256},
 @code{avx10.1/128},
+@code{apx},
 @code{amx_int8},
 @code{amx_bf16},
 @code{amx_fp16},
@@ -1662,7 +1663,7 @@  supported on the CPU specified.  The choices for @var{cpu_type} are:
 @item @samp{.lwp} @tab @samp{.fma4} @tab @samp{.xop} @tab @samp{.cx16}
 @item @samp{.padlock} @tab @samp{.clzero} @tab @samp{.mwaitx} @tab @samp{.rdpru}
 @item @samp{.mcommit} @tab @samp{.sev_es} @tab @samp{.snp} @tab @samp{.invlpgb}
-@item @samp{.tlbsync}
+@item @samp{.tlbsync} @tab @samp{.apx}
 @end multitable
 
 Apart from the warning, there are only two other effects on
diff --git a/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval-intel.d b/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval-intel.d
index a2b09d2e74f..605548285f2 100644
--- a/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval-intel.d
+++ b/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval-intel.d
@@ -11,11 +11,11 @@  Disassembly of section .text:
 [ 	]*[a-f0-9]+:	37                   	\(bad\)
 
 0+1 <aad0>:
-[ 	]*[a-f0-9]+:	d5                   	\(bad\)
+[ 	]*[a-f0-9]+:	d5                   	rex2
 [ 	]*[a-f0-9]+:	0a                   	.byte 0xa
 
 0+3 <aad1>:
-[ 	]*[a-f0-9]+:	d5                   	\(bad\)
+[ 	]*[a-f0-9]+:	d5                   	rex2
 [ 	]*[a-f0-9]+:	02                   	.byte 0x2
 
 0+5 <aam0>:
diff --git a/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval.d b/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval.d
index 5a17b0b412e..c9d3f2fdbb6 100644
--- a/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval.d
+++ b/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval.d
@@ -11,11 +11,11 @@  Disassembly of section .text:
 [ 	]*[a-f0-9]+:	37                   	\(bad\)
 
 0+1 <aad0>:
-[ 	]*[a-f0-9]+:	d5                   	\(bad\)
+[ 	]*[a-f0-9]+:	d5                   	rex2
 [ 	]*[a-f0-9]+:	0a                   	.byte 0xa
 
 0+3 <aad1>:
-[ 	]*[a-f0-9]+:	d5                   	\(bad\)
+[ 	]*[a-f0-9]+:	d5                   	rex2
 [ 	]*[a-f0-9]+:	02                   	.byte 0x2
 
 0+5 <aam0>:
diff --git a/gas/testsuite/gas/i386/x86-64-apx-egpr-inval.l b/gas/testsuite/gas/i386/x86-64-apx-egpr-inval.l
new file mode 100644
index 00000000000..c69d01b099a
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-egpr-inval.l
@@ -0,0 +1,23 @@ 
+.*: Assembler messages:
+.*:4: Error: bad register name `%r17d'
+.*:7: Error: register type of address mismatch for `xsave'
+.*:8: Error: register type of address mismatch for `xsave64'
+.*:9: Error: register type of address mismatch for `xrstor'
+.*:10: Error: register type of address mismatch for `xrstor64'
+.*:11: Error: register type of address mismatch for `xsaves'
+.*:12: Error: register type of address mismatch for `xsaves64'
+.*:13: Error: register type of address mismatch for `xrstors'
+.*:14: Error: register type of address mismatch for `xrstors64'
+.*:15: Error: register type of address mismatch for `xsaveopt'
+.*:16: Error: register type of address mismatch for `xsaveopt64'
+.*:17: Error: register type of address mismatch for `xsavec'
+.*:18: Error: register type of address mismatch for `xsavec64'
+GAS LISTING .*
+#...
+[ 	]*1[ 	]+\# Check Illegal 64bit APX_F instructions
+[ 	]*2[ 	]+\.text
+[ 	]*3[ 	]+\.arch \.noapx_f
+[ 	]*4[ 	]+test    \$0x7, %r17d
+[ 	]*5[ 	]+\.arch \.apx_f
+[ 	]*6[ 	]+\?\?\?\? D510F7C1 		test    \$0x7, %r17d
+#pass
diff --git a/gas/testsuite/gas/i386/x86-64-apx-egpr-inval.s b/gas/testsuite/gas/i386/x86-64-apx-egpr-inval.s
new file mode 100644
index 00000000000..c4d2308a604
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-egpr-inval.s
@@ -0,0 +1,18 @@ 
+# Check Illegal 64bit APX_F instructions
+	.text
+	.arch .noapx_f
+	test    $0x7, %r17d
+	.arch .apx_f
+	test    $0x7, %r17d
+	xsave (%r16, %rbx)
+	xsave64 (%r16, %r31)
+	xrstor (%r16, %rbx)
+	xrstor64 (%r16, %rbx)
+	xsaves (%rbx, %r16)
+	xsaves64 (%r16, %rbx)
+	xrstors (%rbx, %r31)
+	xrstors64 (%r16, %rbx)
+	xsaveopt (%r16, %rbx)
+	xsaveopt64 (%r16, %r31)
+	xsavec (%r16, %rbx)
+	xsavec64 (%r16, %r31)
diff --git a/gas/testsuite/gas/i386/x86-64-apx-rex2.d b/gas/testsuite/gas/i386/x86-64-apx-rex2.d
new file mode 100644
index 00000000000..e3cd534da11
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-rex2.d
@@ -0,0 +1,83 @@ 
+#as:
+#objdump: -dw
+#name: x86-64 APX_F use gpr32 with rex2 prefix
+#source: x86-64-apx-rex2.s
+
+.*: +file format .*
+
+
+Disassembly of section .text:
+
+0+ <_start>:
+[	 ]*[a-f0-9]+:[	 ]*d5 11 f6 c0 07[	 ]+test   \$0x7,%r24b
+[	 ]*[a-f0-9]+:[	 ]*d5 11 f7 c0 07 00 00 00[	 ]+test   \$0x7,%r24d
+[	 ]*[a-f0-9]+:[	 ]*d5 19 f7 c0 07 00 00 00[	 ]+test   \$0x7,%r24
+[	 ]*[a-f0-9]+:[	 ]*66 d5 11 f7 c0 07 00[	 ]+test   \$0x7,%r24w
+[	 ]*[a-f0-9]+:[	 ]*44 0f af f8[	 ]+imul   %eax,%r15d
+[	 ]*[a-f0-9]+:[	 ]*d5 c0 af c0[	 ]+imul   %eax,%r16d
+[	 ]*[a-f0-9]+:[	 ]*d5 90 62 12[	 ]+punpckldq %mm2,\(%r18\)
+[	 ]*[a-f0-9]+:[	 ]*d5 40 8d 00[	 ]+lea    \(%rax\),%r16d
+[	 ]*[a-f0-9]+:[	 ]*d5 40 8d 08[	 ]+lea    \(%rax\),%r17d
+[	 ]*[a-f0-9]+:[	 ]*d5 40 8d 10[	 ]+lea    \(%rax\),%r18d
+[	 ]*[a-f0-9]+:[	 ]*d5 40 8d 18[	 ]+lea    \(%rax\),%r19d
+[	 ]*[a-f0-9]+:[	 ]*d5 40 8d 20[	 ]+lea    \(%rax\),%r20d
+[	 ]*[a-f0-9]+:[	 ]*d5 40 8d 28[	 ]+lea    \(%rax\),%r21d
+[	 ]*[a-f0-9]+:[	 ]*d5 40 8d 30[	 ]+lea    \(%rax\),%r22d
+[	 ]*[a-f0-9]+:[	 ]*d5 40 8d 38[	 ]+lea    \(%rax\),%r23d
+[	 ]*[a-f0-9]+:[	 ]*d5 44 8d 00[	 ]+lea    \(%rax\),%r24d
+[	 ]*[a-f0-9]+:[	 ]*d5 44 8d 08[	 ]+lea    \(%rax\),%r25d
+[	 ]*[a-f0-9]+:[	 ]*d5 44 8d 10[	 ]+lea    \(%rax\),%r26d
+[	 ]*[a-f0-9]+:[	 ]*d5 44 8d 18[	 ]+lea    \(%rax\),%r27d
+[	 ]*[a-f0-9]+:[	 ]*d5 44 8d 20[	 ]+lea    \(%rax\),%r28d
+[	 ]*[a-f0-9]+:[	 ]*d5 44 8d 28[	 ]+lea    \(%rax\),%r29d
+[	 ]*[a-f0-9]+:[	 ]*d5 44 8d 30[	 ]+lea    \(%rax\),%r30d
+[	 ]*[a-f0-9]+:[	 ]*d5 44 8d 38[	 ]+lea    \(%rax\),%r31d
+[	 ]*[a-f0-9]+:[	 ]*d5 20 8d 04 05 00 00 00 00[	 ]+lea    0x0\(,%r16,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 20 8d 04 0d 00 00 00 00[	 ]+lea    0x0\(,%r17,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 20 8d 04 15 00 00 00 00[	 ]+lea    0x0\(,%r18,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 20 8d 04 1d 00 00 00 00[	 ]+lea    0x0\(,%r19,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 20 8d 04 25 00 00 00 00[	 ]+lea    0x0\(,%r20,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 20 8d 04 2d 00 00 00 00[	 ]+lea    0x0\(,%r21,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 20 8d 04 35 00 00 00 00[	 ]+lea    0x0\(,%r22,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 20 8d 04 3d 00 00 00 00[	 ]+lea    0x0\(,%r23,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 22 8d 04 05 00 00 00 00[	 ]+lea    0x0\(,%r24,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 22 8d 04 0d 00 00 00 00[	 ]+lea    0x0\(,%r25,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 22 8d 04 15 00 00 00 00[	 ]+lea    0x0\(,%r26,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 22 8d 04 1d 00 00 00 00[	 ]+lea    0x0\(,%r27,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 22 8d 04 25 00 00 00 00[	 ]+lea    0x0\(,%r28,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 22 8d 04 2d 00 00 00 00[	 ]+lea    0x0\(,%r29,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 22 8d 04 35 00 00 00 00[	 ]+lea    0x0\(,%r30,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 22 8d 04 3d 00 00 00 00[	 ]+lea    0x0\(,%r31,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 10 8d 00[	 ]+lea    \(%r16\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 10 8d 01[	 ]+lea    \(%r17\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 10 8d 02[	 ]+lea    \(%r18\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 10 8d 03[	 ]+lea    \(%r19\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 10 8d 04 24       	lea    \(%r20\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 10 8d 45 00       	lea    0x0\(%r21\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 10 8d 06[	 ]+lea    \(%r22\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 10 8d 07[	 ]+lea    \(%r23\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 11 8d 00[	 ]+lea    \(%r24\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 11 8d 01[	 ]+lea    \(%r25\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 11 8d 02[	 ]+lea    \(%r26\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 11 8d 03[	 ]+lea    \(%r27\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 11 8d 04 24       	lea    \(%r28\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 11 8d 45 00       	lea    0x0\(%r29\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 11 8d 06          	lea    \(%r30\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 11 8d 07          	lea    \(%r31\),%eax
+[	 ]*[a-f0-9]+:[	 ]*4c 8d 38             	lea    \(%rax\),%r15
+[	 ]*[a-f0-9]+:[	 ]*d5 48 8d 00          	lea    \(%rax\),%r16
+[	 ]*[a-f0-9]+:[	 ]*49 8d 07             	lea    \(%r15\),%rax
+[	 ]*[a-f0-9]+:[	 ]*d5 18 8d 00          	lea    \(%r16\),%rax
+[	 ]*[a-f0-9]+:[	 ]*4a 8d 04 3d 00 00 00 00 	lea    0x0\(,%r15,1\),%rax
+[	 ]*[a-f0-9]+:[	 ]*d5 28 8d 04 05 00 00 00 00 	lea    0x0\(,%r16,1\),%rax
+[	 ]*[a-f0-9]+:[	 ]*d5 1c 03 00          	add    \(%r16\),%r8
+[	 ]*[a-f0-9]+:[	 ]*d5 1c 03 38          	add    \(%r16\),%r15
+[	 ]*[a-f0-9]+:[	 ]*d5 4a 8b 04 0d 00 00 00 00 	mov    0x0\(,%r9,1\),%r16
+[	 ]*[a-f0-9]+:[	 ]*d5 4a 8b 04 35 00 00 00 00 	mov    0x0\(,%r14,1\),%r16
+[	 ]*[a-f0-9]+:[	 ]*d5 4d 2b 3a          	sub    \(%r10\),%r31
+[	 ]*[a-f0-9]+:[	 ]*d5 4d 2b 7d 00       	sub    0x0\(%r13\),%r31
+[	 ]*[a-f0-9]+:[	 ]*d5 30 8d 44 20 01    	lea    0x1\(%r16,%r20,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 76 8d 7c 20 01    	lea    0x1\(%r16,%r28,1\),%r31d
+[	 ]*[a-f0-9]+:[	 ]*d5 12 8d 84 04 81 00 00 00 	lea    0x81\(%r20,%r8,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 57 8d bc 04 81 00 00 00 	lea    0x81\(%r28,%r8,1\),%r31d
+#pass
diff --git a/gas/testsuite/gas/i386/x86-64-apx-rex2.s b/gas/testsuite/gas/i386/x86-64-apx-rex2.s
new file mode 100644
index 00000000000..543f0f573d4
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-rex2.s
@@ -0,0 +1,86 @@ 
+# Check 64bit instructions with rex2 prefix encoding
+
+	.allow_index_reg
+	.text
+_start:
+         test	$0x7, %r24b
+         test	$0x7, %r24d
+         test	$0x7, %r24
+         test	$0x7, %r24w
+## REX2.M bit
+         imull	%eax, %r15d
+         imull	%eax, %r16d
+         punpckldq (%r18), %mm2
+## REX2.R4 bit
+         leal	(%rax), %r16d
+         leal	(%rax), %r17d
+         leal	(%rax), %r18d
+         leal	(%rax), %r19d
+         leal	(%rax), %r20d
+         leal	(%rax), %r21d
+         leal	(%rax), %r22d
+         leal	(%rax), %r23d
+         leal	(%rax), %r24d
+         leal	(%rax), %r25d
+         leal	(%rax), %r26d
+         leal	(%rax), %r27d
+         leal	(%rax), %r28d
+         leal	(%rax), %r29d
+         leal	(%rax), %r30d
+         leal	(%rax), %r31d
+## REX2.X4 bit
+         leal	(,%r16), %eax
+         leal	(,%r17), %eax
+         leal	(,%r18), %eax
+         leal	(,%r19), %eax
+         leal	(,%r20), %eax
+         leal	(,%r21), %eax
+         leal	(,%r22), %eax
+         leal	(,%r23), %eax
+         leal	(,%r24), %eax
+         leal	(,%r25), %eax
+         leal	(,%r26), %eax
+         leal	(,%r27), %eax
+         leal	(,%r28), %eax
+         leal	(,%r29), %eax
+         leal	(,%r30), %eax
+         leal	(,%r31), %eax
+## REX.B4 bit
+         leal	(%r16), %eax
+         leal	(%r17), %eax
+         leal	(%r18), %eax
+         leal	(%r19), %eax
+         leal	(%r20), %eax
+         leal	(%r21), %eax
+         leal	(%r22), %eax
+         leal	(%r23), %eax
+         leal	(%r24), %eax
+         leal	(%r25), %eax
+         leal	(%r26), %eax
+         leal	(%r27), %eax
+         leal	(%r28), %eax
+         leal	(%r29), %eax
+         leal	(%r30), %eax
+         leal	(%r31), %eax
+## REX.W bit
+         leaq	(%rax), %r15
+         leaq	(%rax), %r16
+         leaq	(%r15), %rax
+         leaq	(%r16), %rax
+         leaq	(,%r15), %rax
+         leaq	(,%r16), %rax
+## REX.R3 bit
+         add    (%r16), %r8
+         add    (%r16), %r15
+## REX.X3 bit
+         mov    (,%r9), %r16
+         mov    (,%r14), %r16
+## REX.B3 bit
+	 sub   (%r10), %r31
+	 sub   (%r13), %r31
+
+## SIB
+         leal	1(%r16, %r20), %eax
+         leal	1(%r16, %r28), %r31d
+         leal	129(%r20, %r8), %eax
+         leal	129(%r28, %r8), %r31d
diff --git a/gas/testsuite/gas/i386/x86-64-inval-pseudo.l b/gas/testsuite/gas/i386/x86-64-inval-pseudo.l
index 13ad0fb768f..256e1b9a370 100644
--- a/gas/testsuite/gas/i386/x86-64-inval-pseudo.l
+++ b/gas/testsuite/gas/i386/x86-64-inval-pseudo.l
@@ -1,10 +1,16 @@ 
 .*: Assembler messages:
 .*:2: Error: .*
 .*:3: Error: .*
+.*:6: Error: .*
+.*:7: Error: .*
 GAS LISTING .*
 
 
 [ 	]*1[ 	]+\.text
 [ 	]*2[ 	]+\{disp16\} movb \(%ebp\),%al
 [ 	]*3[ 	]+\{disp16\} movb \(%rbp\),%al
+[ 	]*4[ 	]+
+[ 	]*5[ 	]+.*
+[ 	]*6[ 	]+\{rex2\} xsave \(%r15, %rbx\)
+[ 	]*7[ 	]+\{rex2\} xsave64 \(%r15, %rbx\)
 #...
diff --git a/gas/testsuite/gas/i386/x86-64-inval-pseudo.s b/gas/testsuite/gas/i386/x86-64-inval-pseudo.s
index c10b14c2099..ae30476e500 100644
--- a/gas/testsuite/gas/i386/x86-64-inval-pseudo.s
+++ b/gas/testsuite/gas/i386/x86-64-inval-pseudo.s
@@ -1,4 +1,8 @@ 
 	.text
 	{disp16} movb (%ebp),%al
 	{disp16} movb (%rbp),%al
+
+	/* Instruction not support APX.  */
+	{rex2} xsave (%r15, %rbx)
+	{rex2} xsave64 (%r15, %rbx)
 	.p2align 4,0
diff --git a/gas/testsuite/gas/i386/x86-64-opcode-inval-intel.d b/gas/testsuite/gas/i386/x86-64-opcode-inval-intel.d
index 6ee5b2f95ce..03126541f24 100644
--- a/gas/testsuite/gas/i386/x86-64-opcode-inval-intel.d
+++ b/gas/testsuite/gas/i386/x86-64-opcode-inval-intel.d
@@ -11,11 +11,11 @@  Disassembly of section .text:
 [ 	]*[a-f0-9]+:	37                   	\(bad\)
 
 0+1 <aad0>:
-[ 	]*[a-f0-9]+:	d5                   	\(bad\)
+[ 	]*[a-f0-9]+:	d5                   	rex2
 [ 	]*[a-f0-9]+:	0a                   	.byte 0xa
 
 0+3 <aad1>:
-[ 	]*[a-f0-9]+:	d5                   	\(bad\)
+[ 	]*[a-f0-9]+:	d5                   	rex2
 [ 	]*[a-f0-9]+:	02                   	.byte 0x2
 
 0+5 <aam0>:
diff --git a/gas/testsuite/gas/i386/x86-64-opcode-inval.d b/gas/testsuite/gas/i386/x86-64-opcode-inval.d
index 12f02c1766c..0200f3dfd92 100644
--- a/gas/testsuite/gas/i386/x86-64-opcode-inval.d
+++ b/gas/testsuite/gas/i386/x86-64-opcode-inval.d
@@ -10,11 +10,11 @@  Disassembly of section .text:
 [ 	]*[a-f0-9]+:	37                   	\(bad\)
 
 0+1 <aad0>:
-[ 	]*[a-f0-9]+:	d5                   	\(bad\)
+[ 	]*[a-f0-9]+:	d5                   	rex2
 [ 	]*[a-f0-9]+:	0a                   	.byte 0xa
 
 0+3 <aad1>:
-[ 	]*[a-f0-9]+:	d5                   	\(bad\)
+[ 	]*[a-f0-9]+:	d5                   	rex2
 [ 	]*[a-f0-9]+:	02                   	.byte 0x2
 
 0+5 <aam0>:
diff --git a/gas/testsuite/gas/i386/x86-64-pseudos-bad.l b/gas/testsuite/gas/i386/x86-64-pseudos-bad.l
index 3f9f67fcf4b..63aeb739b7d 100644
--- a/gas/testsuite/gas/i386/x86-64-pseudos-bad.l
+++ b/gas/testsuite/gas/i386/x86-64-pseudos-bad.l
@@ -4,3 +4,45 @@ 
 .*:5: Error: .*`vmovaps'.*
 .*:6: Error: .*`vmovaps'.*
 .*:7: Error: .*`rorx'.*
+.*:8: Error: .*`xsave'.*
+.*:9: Error: .*`xsaves'.*
+.*:10: Error: .*`xsaves64'.*
+.*:11: Error: .*`xsavec'.*
+.*:12: Error: .*`xrstors'.*
+.*:13: Error: .*`xrstors64'.*
+.*:17: Error: .*`mov'.*
+.*:18: Error: .*`movabs'.*
+.*:19: Error: .*`cmps'.*
+.*:20: Error: .*`lods'.*
+.*:21: Error: .*`lods'.*
+.*:22: Error: .*`lods'.*
+.*:23: Error: .*`movs'.*
+.*:24: Error: .*`movs'.*
+.*:25: Error: .*`scas'.*
+.*:26: Error: .*`scas'.*
+.*:27: Error: .*`scas'.*
+.*:28: Error: .*`stos'.*
+.*:29: Error: .*`stos'.*
+.*:30: Error: .*`stos'.*
+.*:33: Error: .*`jo'.*
+.*:34: Error: .*`jno'.*
+.*:35: Error: .*`jb'.*
+.*:36: Error: .*`jae'.*
+.*:37: Error: .*`je'.*
+.*:38: Error: .*`jne'.*
+.*:39: Error: .*`jbe'.*
+.*:40: Error: .*`ja'.*
+.*:41: Error: .*`js'.*
+.*:42: Error: .*`jns'.*
+.*:43: Error: .*`jp'.*
+.*:44: Error: .*`jnp'.*
+.*:45: Error: .*`jl'.*
+.*:46: Error: .*`jge'.*
+.*:47: Error: .*`jle'.*
+.*:48: Error: .*`jg'.*
+.*:51: Error: .*`wrmsr'.*
+.*:52: Error: .*`rdtsc'.*
+.*:53: Error: .*`rdmsr'.*
+.*:54: Error: .*`sysenter'.*
+.*:55: Error: .*`sysexit'.*
+.*:56: Error: .*`rdpmc'.*
diff --git a/gas/testsuite/gas/i386/x86-64-pseudos-bad.s b/gas/testsuite/gas/i386/x86-64-pseudos-bad.s
index 3b923593a6a..91630eb589b 100644
--- a/gas/testsuite/gas/i386/x86-64-pseudos-bad.s
+++ b/gas/testsuite/gas/i386/x86-64-pseudos-bad.s
@@ -5,3 +5,52 @@  pseudos:
 	{rex} vmovaps %xmm7,%xmm2
 	{rex} vmovaps %xmm17,%xmm2
 	{rex} rorx $7,%eax,%ebx
+	{rex2} xsave (%rax)
+	{rex2} xsaves (%ecx)
+	{rex2} xsaves64 (%ecx)
+	{rex2} xsavec (%ecx)
+	{rex2} xrstors (%ecx)
+	{rex2} xrstors64 (%ecx)
+
+	#All opcodes in the row 0xa* prefixed REX2 are illegal.
+	#{rex2} test (0xa8) is a special case, it will remap to test (0xf6)
+	{rex2} mov    0x90909090,%al
+	{rex2} movabs 0x1,%al
+	{rex2} cmpsb  %es:(%edi),%ds:(%esi)
+	{rex2} lodsb
+	{rex2} lods   %ds:(%esi),%al
+	{rex2} lodsb   (%esi)
+	{rex2} movs
+	{rex2} movs   (%esi), (%edi)
+	{rex2} scasl
+	{rex2} scas   %es:(%edi),%eax
+	{rex2} scasb   (%edi)
+	{rex2} stosb
+	{rex2} stosb   (%edi)
+	{rex2} stos   %eax,%es:(%edi)
+
+	#All opcodes in the row 0x7* prefixed REX2 are illegal.
+	{rex2} jo     .+2-0x70
+	{rex2} jno    .+2-0x70
+	{rex2} jb     .+2-0x70
+	{rex2} jae    .+2-0x70
+	{rex2} je     .+2-0x70
+	{rex2} jne    .+2-0x70
+	{rex2} jbe    .+2-0x70
+	{rex2} ja     .+2-0x70
+	{rex2} js     .+2-0x70
+	{rex2} jns    .+2-0x70
+	{rex2} jp     .+2-0x70
+	{rex2} jnp    .+2-0x70
+	{rex2} jl     .+2-0x70
+	{rex2} jge    .+2-0x70
+	{rex2} jle    .+2-0x70
+	{rex2} jg     .+2-0x70
+
+	#All opcodes in the row 0xf3* prefixed REX2 are illegal.
+	{rex2} wrmsr
+	{rex2} rdtsc
+	{rex2} rdmsr
+	{rex2} sysenter
+	{rex2} sysexitl
+	{rex2} rdpmc
diff --git a/gas/testsuite/gas/i386/x86-64-pseudos.d b/gas/testsuite/gas/i386/x86-64-pseudos.d
index 0cc75ef2457..708c22b5899 100644
--- a/gas/testsuite/gas/i386/x86-64-pseudos.d
+++ b/gas/testsuite/gas/i386/x86-64-pseudos.d
@@ -404,6 +404,18 @@  Disassembly of section .text:
  +[a-f0-9]+:	41 0f 28 10          	movaps \(%r8\),%xmm2
  +[a-f0-9]+:	40 0f 38 01 01       	rex phaddw \(%rcx\),%mm0
  +[a-f0-9]+:	41 0f 38 01 00       	phaddw \(%r8\),%mm0
+ +[a-f0-9]+:	88 c4                	mov    %al,%ah
+ +[a-f0-9]+:	d5 00 d3 e0          	{rex2} shl %cl,%eax
+ +[a-f0-9]+:	d5 00 38 ca          	{rex2} cmp %cl,%dl
+ +[a-f0-9]+:	d5 00 b3 01          	{rex2} mov \$(0x)?1,%bl
+ +[a-f0-9]+:	d5 00 89 c3          	{rex2} mov %eax,%ebx
+ +[a-f0-9]+:	d5 01 89 c6          	{rex2} mov %eax,%r14d
+ +[a-f0-9]+:	d5 01 89 00          	{rex2} mov %eax,\(%r8\)
+ +[a-f0-9]+:	d5 80 28 d7          	{rex2} movaps %xmm7,%xmm2
+ +[a-f0-9]+:	d5 84 28 e7          	{rex2} movaps %xmm7,%xmm12
+ +[a-f0-9]+:	d5 80 28 11          	{rex2} movaps \(%rcx\),%xmm2
+ +[a-f0-9]+:	d5 81 28 10          	{rex2} movaps \(%r8\),%xmm2
+ +[a-f0-9]+:	d5 80 d5 f0          	{rex2} pmullw %mm0,%mm6
  +[a-f0-9]+:	8a 45 00             	mov    0x0\(%rbp\),%al
  +[a-f0-9]+:	8a 45 00             	mov    0x0\(%rbp\),%al
  +[a-f0-9]+:	8a 85 00 00 00 00    	mov    0x0\(%rbp\),%al
@@ -458,6 +470,15 @@  Disassembly of section .text:
  +[a-f0-9]+:	41 0f 28 10          	movaps \(%r8\),%xmm2
  +[a-f0-9]+:	40 0f 38 01 01       	rex phaddw \(%rcx\),%mm0
  +[a-f0-9]+:	41 0f 38 01 00       	phaddw \(%r8\),%mm0
+ +[a-f0-9]+:	88 c4                	mov    %al,%ah
+ +[a-f0-9]+:	d5 00 89 c3          	{rex2} mov %eax,%ebx
+ +[a-f0-9]+:	d5 01 89 c6          	{rex2} mov %eax,%r14d
+ +[a-f0-9]+:	d5 01 89 00          	{rex2} mov %eax,\(%r8\)
+ +[a-f0-9]+:	d5 80 28 d7          	{rex2} movaps %xmm7,%xmm2
+ +[a-f0-9]+:	d5 84 28 e7          	{rex2} movaps %xmm7,%xmm12
+ +[a-f0-9]+:	d5 80 28 11          	{rex2} movaps \(%rcx\),%xmm2
+ +[a-f0-9]+:	d5 81 28 10          	{rex2} movaps \(%r8\),%xmm2
+ +[a-f0-9]+:	d5 80 d5 f0          	{rex2} pmullw %mm0,%mm6
  +[a-f0-9]+:	8a 45 00             	mov    0x0\(%rbp\),%al
  +[a-f0-9]+:	8a 45 00             	mov    0x0\(%rbp\),%al
  +[a-f0-9]+:	8a 85 00 00 00 00    	mov    0x0\(%rbp\),%al
diff --git a/gas/testsuite/gas/i386/x86-64-pseudos.s b/gas/testsuite/gas/i386/x86-64-pseudos.s
index 08fac8381c6..29a0c3368fc 100644
--- a/gas/testsuite/gas/i386/x86-64-pseudos.s
+++ b/gas/testsuite/gas/i386/x86-64-pseudos.s
@@ -360,6 +360,19 @@  _start:
 	{rex} movaps (%r8),%xmm2
 	{rex} phaddw (%rcx),%mm0
 	{rex} phaddw (%r8),%mm0
+	{rex2} mov %al,%ah
+	{rex2} shl %cl, %eax
+	{rex2} cmp %cl, %dl
+	{rex2} mov $1, %bl
+	{rex2} movl %eax,%ebx
+	{rex2} movl %eax,%r14d
+	{rex2} movl %eax,(%r8)
+	{rex2} movaps %xmm7,%xmm2
+	{rex2} movaps %xmm7,%xmm12
+	{rex2} movaps (%rcx),%xmm2
+	{rex2} movaps (%r8),%xmm2
+	{rex2} pmullw %mm0,%mm6
+
 
 	movb (%rbp),%al
 	{disp8} movb (%rbp),%al
@@ -422,6 +435,15 @@  _start:
 	{rex} movaps xmm2,XMMWORD PTR [r8]
 	{rex} phaddw mm0,QWORD PTR [rcx]
 	{rex} phaddw mm0,QWORD PTR [r8]
+	{rex2} mov ah,al
+	{rex2} mov ebx,eax
+	{rex2} mov r14d,eax
+	{rex2} mov DWORD PTR [r8],eax
+	{rex2} movaps xmm2,xmm7
+	{rex2} movaps xmm12,xmm7
+	{rex2} movaps xmm2,XMMWORD PTR [rcx]
+	{rex2} movaps xmm2,XMMWORD PTR [r8]
+	{rex2} pmullw mm6,mm0
 
 	mov al, BYTE PTR [rbp]
 	{disp8} mov al, BYTE PTR [rbp]
diff --git a/gas/testsuite/gas/i386/x86-64.exp b/gas/testsuite/gas/i386/x86-64.exp
index 52711cdcf6f..a698a467c53 100644
--- a/gas/testsuite/gas/i386/x86-64.exp
+++ b/gas/testsuite/gas/i386/x86-64.exp
@@ -360,6 +360,8 @@  run_dump_test "x86-64-avx512f-rcigrne-intel"
 run_dump_test "x86-64-avx512f-rcigrne"
 run_dump_test "x86-64-avx512f-rcigru-intel"
 run_dump_test "x86-64-avx512f-rcigru"
+run_list_test "x86-64-apx-egpr-inval" "-al"
+run_dump_test "x86-64-apx-rex2"
 run_dump_test "x86-64-avx512f-rcigrz-intel"
 run_dump_test "x86-64-avx512f-rcigrz"
 run_dump_test "x86-64-clwb"
diff --git a/include/opcode/i386.h b/include/opcode/i386.h
index dec7652c1cc..a6af3d54da0 100644
--- a/include/opcode/i386.h
+++ b/include/opcode/i386.h
@@ -112,6 +112,8 @@ 
 /* x86-64 extension prefix.  */
 #define REX_OPCODE	0x40
 
+#define REX2_OPCODE	0xd5
+
 /* Non-zero if OPCODE is the rex prefix.  */
 #define REX_PREFIX_P(opcode) (((opcode) & 0xf0) == REX_OPCODE)
 
diff --git a/opcodes/i386-dis.c b/opcodes/i386-dis.c
index 87ecf0f5e23..22c450ac414 100644
--- a/opcodes/i386-dis.c
+++ b/opcodes/i386-dis.c
@@ -144,6 +144,11 @@  struct instr_info
   /* Bits of REX we've already used.  */
   uint8_t rex_used;
 
+  /* REX2 prefix for the current instruction use gpr32(r16-r31). */
+  unsigned char rex2;
+  /* Bits of REX2 we've already used.  */
+  unsigned char rex2_used;
+
   bool need_modrm;
   unsigned char need_vex;
   bool has_sib;
@@ -169,6 +174,7 @@  struct instr_info
   signed char last_data_prefix;
   signed char last_addr_prefix;
   signed char last_rex_prefix;
+  signed char last_rex2_prefix;
   signed char last_seg_prefix;
   signed char fwait_prefix;
   /* The active segment register prefix.  */
@@ -269,9 +275,17 @@  struct dis_private {
       ins->rex_used |= REX_OPCODE;			\
   }
 
+#define USED_REX2(value)				\
+  {							\
+    if ((ins->rex2 & value))				\
+      ins->rex2_used |= value;				\
+  }
+
 
 #define EVEX_b_used 1
 #define EVEX_len_used 2
+/* M0 in rex2 prefix represents map0 or map1.  */
+#define REX2_M 0x8
 
 /* Flags stored in PREFIXES.  */
 #define PREFIX_REPZ 1
@@ -286,6 +300,7 @@  struct dis_private {
 #define PREFIX_DATA 0x200
 #define PREFIX_ADDR 0x400
 #define PREFIX_FWAIT 0x800
+#define PREFIX_REX2 0x1000
 
 /* Make sure that bytes from INFO->PRIVATE_DATA->BUFFER (inclusive)
    to ADDR (exclusive) are valid.  Returns true for success, false
@@ -367,6 +382,7 @@  fetch_error (const instr_info *ins)
 #define PREFIX_IGNORED_DATA	(PREFIX_DATA << PREFIX_IGNORED_SHIFT)
 #define PREFIX_IGNORED_ADDR	(PREFIX_ADDR << PREFIX_IGNORED_SHIFT)
 #define PREFIX_IGNORED_LOCK	(PREFIX_LOCK << PREFIX_IGNORED_SHIFT)
+#define PREFIX_REX2_ILLEGAL	(PREFIX_REX2 << PREFIX_IGNORED_SHIFT)
 
 /* Opcode prefixes.  */
 #define PREFIX_OPCODE		(PREFIX_REPZ \
@@ -1872,23 +1888,23 @@  static const struct dis386 dis386[] = {
   { "outs{b|}",		{ indirDXr, Xb }, 0 },
   { X86_64_TABLE (X86_64_6F) },
   /* 70 */
-  { "joH",		{ Jb, BND, cond_jump_flag }, 0 },
-  { "jnoH",		{ Jb, BND, cond_jump_flag }, 0 },
-  { "jbH",		{ Jb, BND, cond_jump_flag }, 0 },
-  { "jaeH",		{ Jb, BND, cond_jump_flag }, 0 },
-  { "jeH",		{ Jb, BND, cond_jump_flag }, 0 },
-  { "jneH",		{ Jb, BND, cond_jump_flag }, 0 },
-  { "jbeH",		{ Jb, BND, cond_jump_flag }, 0 },
-  { "jaH",		{ Jb, BND, cond_jump_flag }, 0 },
+  { "joH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jnoH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jbH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jaeH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jeH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jneH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jbeH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jaH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
   /* 78 */
-  { "jsH",		{ Jb, BND, cond_jump_flag }, 0 },
-  { "jnsH",		{ Jb, BND, cond_jump_flag }, 0 },
-  { "jpH",		{ Jb, BND, cond_jump_flag }, 0 },
-  { "jnpH",		{ Jb, BND, cond_jump_flag }, 0 },
-  { "jlH",		{ Jb, BND, cond_jump_flag }, 0 },
-  { "jgeH",		{ Jb, BND, cond_jump_flag }, 0 },
-  { "jleH",		{ Jb, BND, cond_jump_flag }, 0 },
-  { "jgH",		{ Jb, BND, cond_jump_flag }, 0 },
+  { "jsH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jnsH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jpH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jnpH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jlH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jgeH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jleH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jgH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
   /* 80 */
   { REG_TABLE (REG_80) },
   { REG_TABLE (REG_81) },
@@ -1926,23 +1942,23 @@  static const struct dis386 dis386[] = {
   { "sahf",		{ XX }, 0 },
   { "lahf",		{ XX }, 0 },
   /* a0 */
-  { "mov%LB",		{ AL, Ob }, 0 },
-  { "mov%LS",		{ eAX, Ov }, 0 },
-  { "mov%LB",		{ Ob, AL }, 0 },
-  { "mov%LS",		{ Ov, eAX }, 0 },
-  { "movs{b|}",		{ Ybr, Xb }, 0 },
-  { "movs{R|}",		{ Yvr, Xv }, 0 },
-  { "cmps{b|}",		{ Xb, Yb }, 0 },
-  { "cmps{R|}",		{ Xv, Yv }, 0 },
+  { "mov%LB",		{ AL, Ob }, PREFIX_REX2_ILLEGAL },
+  { "mov%LS",		{ eAX, Ov }, PREFIX_REX2_ILLEGAL },
+  { "mov%LB",		{ Ob, AL }, PREFIX_REX2_ILLEGAL },
+  { "mov%LS",		{ Ov, eAX }, PREFIX_REX2_ILLEGAL },
+  { "movs{b|}",		{ Ybr, Xb }, PREFIX_REX2_ILLEGAL },
+  { "movs{R|}",		{ Yvr, Xv }, PREFIX_REX2_ILLEGAL },
+  { "cmps{b|}",		{ Xb, Yb }, PREFIX_REX2_ILLEGAL },
+  { "cmps{R|}",		{ Xv, Yv }, PREFIX_REX2_ILLEGAL },
   /* a8 */
-  { "testB",		{ AL, Ib }, 0 },
-  { "testS",		{ eAX, Iv }, 0 },
-  { "stosB",		{ Ybr, AL }, 0 },
-  { "stosS",		{ Yvr, eAX }, 0 },
-  { "lodsB",		{ ALr, Xb }, 0 },
-  { "lodsS",		{ eAXr, Xv }, 0 },
-  { "scasB",		{ AL, Yb }, 0 },
-  { "scasS",		{ eAX, Yv }, 0 },
+  { "testB",		{ AL, Ib }, PREFIX_REX2_ILLEGAL },
+  { "testS",		{ eAX, Iv }, PREFIX_REX2_ILLEGAL },
+  { "stosB",		{ Ybr, AL }, PREFIX_REX2_ILLEGAL },
+  { "stosS",		{ Yvr, eAX }, PREFIX_REX2_ILLEGAL },
+  { "lodsB",		{ ALr, Xb }, PREFIX_REX2_ILLEGAL },
+  { "lodsS",		{ eAXr, Xv }, PREFIX_REX2_ILLEGAL },
+  { "scasB",		{ AL, Yb }, PREFIX_REX2_ILLEGAL },
+  { "scasS",		{ eAX, Yv }, PREFIX_REX2_ILLEGAL },
   /* b0 */
   { "movB",		{ RMAL, Ib }, 0 },
   { "movB",		{ RMCL, Ib }, 0 },
@@ -2091,12 +2107,12 @@  static const struct dis386 dis386_twobyte[] = {
   { PREFIX_TABLE (PREFIX_0F2E) },
   { PREFIX_TABLE (PREFIX_0F2F) },
   /* 30 */
-  { "wrmsr",		{ XX }, 0 },
-  { "rdtsc",		{ XX }, 0 },
-  { "rdmsr",		{ XX }, 0 },
-  { "rdpmc",		{ XX }, 0 },
-  { "sysenter",		{ SEP }, 0 },
-  { "sysexit%LQ",	{ SEP }, 0 },
+  { "wrmsr",		{ XX }, PREFIX_REX2_ILLEGAL },
+  { "rdtsc",		{ XX }, PREFIX_REX2_ILLEGAL },
+  { "rdmsr",		{ XX }, PREFIX_REX2_ILLEGAL },
+  { "rdpmc",		{ XX }, PREFIX_REX2_ILLEGAL },
+  { "sysenter",		{ SEP }, PREFIX_REX2_ILLEGAL },
+  { "sysexit%LQ",	{ SEP }, PREFIX_REX2_ILLEGAL },
   { Bad_Opcode },
   { "getsec",		{ XX }, 0 },
   /* 38 */
@@ -2390,22 +2406,30 @@  static const char intel_index16[][6] = {
 
 static const char att_names64[][8] = {
   "%rax", "%rcx", "%rdx", "%rbx", "%rsp", "%rbp", "%rsi", "%rdi",
-  "%r8", "%r9", "%r10", "%r11", "%r12", "%r13", "%r14", "%r15"
+  "%r8", "%r9", "%r10", "%r11", "%r12", "%r13", "%r14", "%r15",
+  "%r16", "%r17", "%r18", "%r19", "%r20", "%r21", "%r22", "%r23",
+  "%r24", "%r25", "%r26", "%r27", "%r28", "%r29", "%r30", "%r31"
 };
 static const char att_names32[][8] = {
   "%eax", "%ecx", "%edx", "%ebx", "%esp", "%ebp", "%esi", "%edi",
-  "%r8d", "%r9d", "%r10d", "%r11d", "%r12d", "%r13d", "%r14d", "%r15d"
+  "%r8d", "%r9d", "%r10d", "%r11d", "%r12d", "%r13d", "%r14d", "%r15d",
+  "%r16d", "%r17d", "%r18d", "%r19d", "%r20d", "%r21d", "%r22d", "%r23d",
+  "%r24d", "%r25d", "%r26d", "%r27d", "%r28d", "%r29d", "%r30d", "%r31d"
 };
 static const char att_names16[][8] = {
   "%ax", "%cx", "%dx", "%bx", "%sp", "%bp", "%si", "%di",
-  "%r8w", "%r9w", "%r10w", "%r11w", "%r12w", "%r13w", "%r14w", "%r15w"
+  "%r8w", "%r9w", "%r10w", "%r11w", "%r12w", "%r13w", "%r14w", "%r15w",
+  "%r16w", "%r17w", "%r18w", "%r19w", "%r20w", "%r21w", "%r22w", "%r23w",
+  "%r24w", "%r25w", "%r26w", "%r27w", "%r28w", "%r29w", "%r30w", "%r31w"
 };
 static const char att_names8[][8] = {
   "%al", "%cl", "%dl", "%bl", "%ah", "%ch", "%dh", "%bh",
 };
 static const char att_names8rex[][8] = {
   "%al", "%cl", "%dl", "%bl", "%spl", "%bpl", "%sil", "%dil",
-  "%r8b", "%r9b", "%r10b", "%r11b", "%r12b", "%r13b", "%r14b", "%r15b"
+  "%r8b", "%r9b", "%r10b", "%r11b", "%r12b", "%r13b", "%r14b", "%r15b",
+  "%r16b", "%r17b", "%r18b", "%r19b", "%r20b", "%r21b", "%r22b", "%r23b",
+  "%r24b", "%r25b", "%r26b", "%r27b", "%r28b", "%r29b", "%r30b", "%r31b"
 };
 static const char att_names_seg[][4] = {
   "%es", "%cs", "%ss", "%ds", "%fs", "%gs", "%?", "%?",
@@ -2794,9 +2818,9 @@  static const struct dis386 reg_table[][8] = {
     { Bad_Opcode },
     { "cmpxchg8b", { { CMPXCHG8B_Fixup, q_mode } }, 0 },
     { Bad_Opcode },
-    { "xrstors", { FXSAVE }, 0 },
-    { "xsavec", { FXSAVE }, 0 },
-    { "xsaves", { FXSAVE }, 0 },
+    { "xrstors", { FXSAVE }, PREFIX_REX2_ILLEGAL },
+    { "xsavec", { FXSAVE }, PREFIX_REX2_ILLEGAL },
+    { "xsaves", { FXSAVE }, PREFIX_REX2_ILLEGAL },
     { MOD_TABLE (MOD_0FC7_REG_6) },
     { MOD_TABLE (MOD_0FC7_REG_7) },
   },
@@ -3364,7 +3388,7 @@  static const struct dis386 prefix_table[][4] = {
 
   /* PREFIX_0FAE_REG_4_MOD_0 */
   {
-    { "xsave",	{ FXSAVE }, 0 },
+    { "xsave",	{ FXSAVE }, PREFIX_REX2_ILLEGAL },
     { "ptwrite{%LQ|}", { Edq }, 0 },
   },
 
@@ -3382,7 +3406,7 @@  static const struct dis386 prefix_table[][4] = {
 
   /* PREFIX_0FAE_REG_6_MOD_0 */
   {
-    { "xsaveopt",	{ FXSAVE }, PREFIX_OPCODE },
+    { "xsaveopt",	{ FXSAVE }, PREFIX_OPCODE | PREFIX_REX2_ILLEGAL },
     { "clrssbsy",	{ Mq }, PREFIX_OPCODE },
     { "clwb",	{ Mb }, PREFIX_OPCODE },
   },
@@ -8125,7 +8149,7 @@  static const struct dis386 mod_table[][2] = {
   },
   {
     /* MOD_0FAE_REG_5 */
-    { "xrstor",		{ FXSAVE }, PREFIX_OPCODE },
+    { "xrstor",		{ FXSAVE }, PREFIX_OPCODE | PREFIX_REX2_ILLEGAL },
     { PREFIX_TABLE (PREFIX_0FAE_REG_5_MOD_3) },
   },
   {
@@ -8289,6 +8313,7 @@  ckprefix (instr_info *ins)
 {
   int i, length;
   uint8_t newrex;
+  unsigned char rex2_payload;
 
   i = 0;
   length = 0;
@@ -8323,6 +8348,24 @@  ckprefix (instr_info *ins)
 	    return ckp_okay;
 	  ins->last_rex_prefix = i;
 	  break;
+	/* REX2 must be the last prefix. */
+	case 0xd5:
+	  if (ins->address_mode == mode_64bit)
+	    {
+	      if (ins->last_rex_prefix >= 0)
+		return ckp_bogus;
+
+	      ins->codep++;
+	      if (!fetch_code (ins->info, ins->codep + 1))
+		return ckp_fetch_error;
+	      rex2_payload = *ins->codep;
+	      ins->rex2 = rex2_payload >> 4;
+	      ins->rex = (rex2_payload & 0xf) | REX_OPCODE;
+	      ins->codep++;
+	      ins->last_rex2_prefix = i;
+	      ins->all_prefixes[i] = REX2_OPCODE;
+	    }
+	  return ckp_okay;
 	case 0xf3:
 	  ins->prefixes |= PREFIX_REPZ;
 	  ins->last_repz_prefix = i;
@@ -8490,6 +8533,8 @@  prefix_name (enum address_mode mode, uint8_t pref, int sizeflag)
       return "bnd";
     case NOTRACK_PREFIX:
       return "notrack";
+    case REX2_OPCODE:
+      return "rex2";
     default:
       return NULL;
     }
@@ -9128,6 +9173,7 @@  print_insn (bfd_vma pc, disassemble_info *info, int intel_syntax)
     .last_data_prefix = -1,
     .last_addr_prefix = -1,
     .last_rex_prefix = -1,
+    .last_rex2_prefix = -1,
     .last_seg_prefix = -1,
     .fwait_prefix = -1,
   };
@@ -9292,13 +9338,17 @@  print_insn (bfd_vma pc, disassemble_info *info, int intel_syntax)
       goto out;
     }
 
-  if (*ins.codep == 0x0f)
+  /* M0 in rex2 prefix represents map0 or map1.  */
+  if (*ins.codep == 0x0f || (ins.rex2 & REX2_M))
     {
       unsigned char threebyte;
 
-      ins.codep++;
-      if (!fetch_code (info, ins.codep + 1))
-	goto fetch_error_out;
+      if (!ins.rex2)
+	{
+	  ins.codep++;
+	  if (!fetch_code (info, ins.codep + 1))
+	    goto fetch_error_out;
+	}
       threebyte = *ins.codep;
       dp = &dis386_twobyte[threebyte];
       ins.need_modrm = twobyte_has_modrm[threebyte];
@@ -9454,6 +9504,14 @@  print_insn (bfd_vma pc, disassemble_info *info, int intel_syntax)
       goto out;
     }
 
+  if ((dp->prefix_requirement & PREFIX_REX2_ILLEGAL)
+      && ins.last_rex2_prefix >= 0)
+    {
+      i386_dis_printf (info, dis_style_text, "(bad)");
+      ret = ins.end_codep - priv.the_buffer;
+      goto out;
+    }
+
   switch (dp->prefix_requirement)
     {
     case PREFIX_DATA:
@@ -9468,6 +9526,7 @@  print_insn (bfd_vma pc, disassemble_info *info, int intel_syntax)
       ins.used_prefixes |= PREFIX_DATA;
       /* Fall through.  */
     case PREFIX_OPCODE:
+    case PREFIX_OPCODE | PREFIX_REX2_ILLEGAL:
       /* If the mandatory PREFIX_REPZ/PREFIX_REPNZ/PREFIX_DATA prefix is
 	 unused, opcode is invalid.  Since the PREFIX_DATA prefix may be
 	 used by putop and MMX/SSE operand and may be overridden by the
@@ -9513,6 +9572,13 @@  print_insn (bfd_vma pc, disassemble_info *info, int intel_syntax)
       && !ins.need_vex && ins.last_rex_prefix >= 0)
     ins.all_prefixes[ins.last_rex_prefix] = 0;
 
+  /* Check if the REX2 prefix is used.  */
+  if (ins.last_rex2_prefix >= 0
+      && ((((ins.rex2 & 0x7) ^ (ins.rex2_used & 0x7)) == 0
+	   && (ins.rex2 & 0x7))
+	  || dp == &bad_opcode))
+    ins.all_prefixes[ins.last_rex2_prefix] = 0;
+
   /* Check if the SEG prefix is used.  */
   if ((ins.prefixes & (PREFIX_CS | PREFIX_SS | PREFIX_DS | PREFIX_ES
 		       | PREFIX_FS | PREFIX_GS)) != 0
@@ -9541,7 +9607,10 @@  print_insn (bfd_vma pc, disassemble_info *info, int intel_syntax)
 	if (name == NULL)
 	  abort ();
 	prefix_length += strlen (name) + 1;
-	i386_dis_printf (info, dis_style_mnemonic, "%s ", name);
+	if (ins.all_prefixes[i] == REX2_OPCODE)
+	  i386_dis_printf (info, dis_style_mnemonic, "{%s} ", name);
+	else
+	  i386_dis_printf (info, dis_style_mnemonic, "%s ", name);
       }
 
   /* Check maximum code length.  */
@@ -11086,8 +11155,11 @@  print_register (instr_info *ins, unsigned int reg, unsigned int rexmask,
     ins->illegal_masking = true;
 
   USED_REX (rexmask);
+  USED_REX2 (rexmask);
   if (ins->rex & rexmask)
     reg += 8;
+  if (ins->rex2 & rexmask)
+    reg += 16;
 
   switch (bytemode)
     {
@@ -11310,6 +11382,8 @@  OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
   int riprel = 0;
   int shift;
 
+  add += (ins->rex2 & REX_B) ? 16 : 0;
+
   if (ins->vex.evex)
     {
 
@@ -11414,6 +11488,7 @@  OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
     shift = 0;
 
   USED_REX (REX_B);
+  USED_REX2 (REX_B);
   if (ins->intel_syntax)
     intel_operand_size (ins, bytemode, sizeflag);
   append_seg (ins);
@@ -11444,8 +11519,11 @@  OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
 	{
 	  vindex = ins->sib.index;
 	  USED_REX (REX_X);
+	  USED_REX2 (REX_X);
 	  if (ins->rex & REX_X)
 	    vindex += 8;
+	  if (ins->rex2 & REX_X)
+	    vindex += 16;
 	  switch (bytemode)
 	    {
 	    case vex_vsib_d_w_dq_mode:
@@ -11866,7 +11944,7 @@  static bool
 OP_REG (instr_info *ins, int code, int sizeflag)
 {
   const char *s;
-  int add;
+  int add = 0;
 
   switch (code)
     {
@@ -11877,10 +11955,11 @@  OP_REG (instr_info *ins, int code, int sizeflag)
     }
 
   USED_REX (REX_B);
+  USED_REX2 (REX_B);
   if (ins->rex & REX_B)
     add = 8;
-  else
-    add = 0;
+  if (ins->rex2 & REX_B)
+    add += 16;
 
   switch (code)
     {
@@ -12590,8 +12669,11 @@  OP_EX (instr_info *ins, int bytemode, int sizeflag)
 
   reg = ins->modrm.rm;
   USED_REX (REX_B);
+  USED_REX2 (REX_B);
   if (ins->rex & REX_B)
     reg += 8;
+  if (ins->rex2 & REX_B)
+    reg += 16;
   if (ins->vex.evex)
     {
       USED_REX (REX_X);
diff --git a/opcodes/i386-gen.c b/opcodes/i386-gen.c
index cfc5a7a6172..589f9682699 100644
--- a/opcodes/i386-gen.c
+++ b/opcodes/i386-gen.c
@@ -259,6 +259,8 @@  static const dependency isa_dependencies[] =
     "SSE2" },
   { "WIDEKL",
     "KL" },
+  { "APX_F",
+    "XSAVE" },
 };
 
 /* This array is populated as process_i386_initializers() walks cpu_flags[].  */
@@ -380,6 +382,7 @@  static bitfield cpu_flags[] =
   BITFIELD (RAO_INT),
   BITFIELD (FRED),
   BITFIELD (LKGS),
+  BITFIELD (APX_F),
   BITFIELD (MWAITX),
   BITFIELD (CLZERO),
   BITFIELD (OSPKE),
@@ -469,6 +472,7 @@  static bitfield opcode_modifiers[] =
   BITFIELD (ATTSyntax),
   BITFIELD (IntelSyntax),
   BITFIELD (ISA64),
+  BITFIELD (NoEgpr),
 };
 
 #define CLASS(n) #n, n
@@ -1008,10 +1012,35 @@  get_element_size (char **opnd, int lineno)
   return elem_size;
 }
 
+static bool
+if_entry_needs_special_handle (const unsigned long long opcode, unsigned int space,
+			       const char *cpu_flags)
+{
+  /* Prefixing XSAVE* and XRSTOR* instructions with REX2 triggers #UD.  */
+  if (strcmp (cpu_flags, "XSAVES") >= 0
+      || strcmp (cpu_flags, "XSAVEC") >= 0
+      || strcmp (cpu_flags, "Xsave") >= 0
+      || strcmp (cpu_flags, "Xsaveopt") >= 0
+      || !strcmp (cpu_flags, "3dnow")
+      || !strcmp (cpu_flags, "3dnowA"))
+    return true;
+
+  /* All opcodes listed map0 0x4*, 0x7*, 0xa* and map0 0x3*, 0x8*
+     are reserved under REX2 and triggers #UD when prefixed with REX2 */
+  if ((space == 0 && (opcode >> 4 == 0x4
+		      || opcode >> 4 == 0x7
+		      || opcode >> 4 == 0xA))
+      || (space == SPACE_0F && (opcode >> 4 == 0x3
+				|| opcode >> 4 == 0x8)))
+    return true;
+
+  return false;
+}
+
 static void
 process_i386_opcode_modifier (FILE *table, char *mod, unsigned int space,
 			      unsigned int prefix, const char *extension_opcode,
-			      char **opnd, int lineno)
+			      char **opnd, int lineno, bool has_special_handle)
 {
   char *str, *next, *last;
   bitfield modifiers [ARRAY_SIZE (opcode_modifiers)];
@@ -1119,6 +1148,18 @@  process_i386_opcode_modifier (FILE *table, char *mod, unsigned int space,
 	fprintf (stderr,
 		 "%s: %d: W modifier without Word/Dword/Qword operand(s)\n",
 		 filename, lineno);
+
+      /* The part about judging EVEX encoding should be synchronized with
+	 is_evex_encoding.  */
+      if (modifiers[Vex].value
+	  || ((space > SPACE_0F || has_special_handle)
+	      && !modifiers[EVex].value
+	      && !modifiers[Disp8MemShift].value
+	      && !modifiers[Broadcast].value
+	      && !modifiers[Masking].value
+	      && !modifiers[SAE].value))
+	modifiers[NoEgpr].value = 1;
+
     }
 
   if (space >= ARRAY_SIZE (spaces) || !spaces[space])
@@ -1350,8 +1391,11 @@  output_i386_opcode (FILE *table, const char *name, char *str,
 	   ident, 2 * (int)length, opcode, end, i);
   free (ident);
 
+  /* Add some specilal handle for current entry.  */
+  bool  has_special_handle = if_entry_needs_special_handle (opcode, space, cpu_flags);
   process_i386_opcode_modifier (table, opcode_modifier, space, prefix,
-				extension_opcode, operand_types, lineno);
+				extension_opcode, operand_types, lineno,
+				has_special_handle);
 
   process_i386_cpu_flag (table, cpu_flags, NULL, ",", "    ", lineno, CpuMax);
 
diff --git a/opcodes/i386-opc.h b/opcodes/i386-opc.h
index 149ae0e950c..c8082971f81 100644
--- a/opcodes/i386-opc.h
+++ b/opcodes/i386-opc.h
@@ -317,6 +317,8 @@  enum i386_cpu
   CpuAVX512F,
   /* Intel AVX-512 VL Instructions support required.  */
   CpuAVX512VL,
+  /* Intel APX_F Instructions support required.  */
+  CpuAPX_F,
   /* Not supported in the 64bit mode  */
   CpuNo64,
 
@@ -352,6 +354,7 @@  enum i386_cpu
 		   cpuhle:1, \
 		   cpuavx512f:1, \
 		   cpuavx512vl:1, \
+		   cpuapx_f:1, \
       /* NOTE: This field needs to remain last. */ \
 		   cpuno64:1
 
@@ -742,6 +745,10 @@  enum
 #define INTEL64		2
 #define INTEL64ONLY	3
   ISA64,
+
+  /* egprs (r16-r31) on instruction illegal.  */
+  NoEgpr,
+
   /* The last bitfield in i386_opcode_modifier.  */
   Opcode_Modifier_Num
 };
@@ -789,6 +796,7 @@  typedef struct i386_opcode_modifier
   unsigned int attsyntax:1;
   unsigned int intelsyntax:1;
   unsigned int isa64:2;
+  unsigned int noegpr:1;
 } i386_opcode_modifier;
 
 /* Operand classes.  */
@@ -1001,7 +1009,8 @@  typedef struct insn_template
 #define Prefix_VEX3		6	/* {vex3} */
 #define Prefix_EVEX		7	/* {evex} */
 #define Prefix_REX		8	/* {rex} */
-#define Prefix_NoOptimize	9	/* {nooptimize} */
+#define Prefix_REX2		9	/* {rex2} */
+#define Prefix_NoOptimize	10	/* {nooptimize} */
 
   /* the bits in opcode_modifier are used to generate the final opcode from
      the base_opcode.  These bits also are used to detect alternate forms of
@@ -1028,6 +1037,7 @@  typedef struct
 #define RegRex	    0x1  /* Extended register.  */
 #define RegRex64    0x2  /* Extended 8 bit register.  */
 #define RegVRex	    0x4  /* Extended vector register.  */
+#define RegRex2	    0x8  /* Extended GPRs R16–R31 register.  */
   unsigned char reg_num;
 #define RegIP	((unsigned char ) ~0)
 /* EIZ and RIZ are fake index registers.  */
diff --git a/opcodes/i386-opc.tbl b/opcodes/i386-opc.tbl
index e60184ba154..17be21fdf0e 100644
--- a/opcodes/i386-opc.tbl
+++ b/opcodes/i386-opc.tbl
@@ -891,7 +891,7 @@  rex.wrxb, 0x4f, x64, NoSuf|IsPrefix, {}
 <pseudopfx:ident:cpu, disp8:Disp8:0, disp16:Disp16:0, disp32:Disp32:0, +
                       load:Load:0, store:Store:0, +
                       vex:VEX:0, vex2:VEX:0, vex3:VEX3:0, evex:EVEX:0, +
-                      rex:REX:x64, nooptimize:NoOptimize:0>
+                      rex:REX:x64, rex2:REX2:x64, nooptimize:NoOptimize:0>
 
 {<pseudopfx>}, PSEUDO_PREFIX/Prefix_<pseudopfx:ident>, <pseudopfx:cpu>, NoSuf|IsPrefix, {}
 
diff --git a/opcodes/i386-reg.tbl b/opcodes/i386-reg.tbl
index 2ac56e3fd0b..8fead35e320 100644
--- a/opcodes/i386-reg.tbl
+++ b/opcodes/i386-reg.tbl
@@ -43,6 +43,22 @@  r12b, Class=Reg|Byte, RegRex|RegRex64, 4, Dw2Inval, Dw2Inval
 r13b, Class=Reg|Byte, RegRex|RegRex64, 5, Dw2Inval, Dw2Inval
 r14b, Class=Reg|Byte, RegRex|RegRex64, 6, Dw2Inval, Dw2Inval
 r15b, Class=Reg|Byte, RegRex|RegRex64, 7, Dw2Inval, Dw2Inval
+r16b, Class=Reg|Byte, RegRex2|RegRex64, 0, Dw2Inval, Dw2Inval
+r17b, Class=Reg|Byte, RegRex2|RegRex64, 1, Dw2Inval, Dw2Inval
+r18b, Class=Reg|Byte, RegRex2|RegRex64, 2, Dw2Inval, Dw2Inval
+r19b, Class=Reg|Byte, RegRex2|RegRex64, 3, Dw2Inval, Dw2Inval
+r20b, Class=Reg|Byte, RegRex2|RegRex64, 4, Dw2Inval, Dw2Inval
+r21b, Class=Reg|Byte, RegRex2|RegRex64, 5, Dw2Inval, Dw2Inval
+r22b, Class=Reg|Byte, RegRex2|RegRex64, 6, Dw2Inval, Dw2Inval
+r23b, Class=Reg|Byte, RegRex2|RegRex64, 7, Dw2Inval, Dw2Inval
+r24b, Class=Reg|Byte, RegRex2|RegRex64|RegRex, 0, Dw2Inval, Dw2Inval
+r25b, Class=Reg|Byte, RegRex2|RegRex64|RegRex, 1, Dw2Inval, Dw2Inval
+r26b, Class=Reg|Byte, RegRex2|RegRex64|RegRex, 2, Dw2Inval, Dw2Inval
+r27b, Class=Reg|Byte, RegRex2|RegRex64|RegRex, 3, Dw2Inval, Dw2Inval
+r28b, Class=Reg|Byte, RegRex2|RegRex64|RegRex, 4, Dw2Inval, Dw2Inval
+r29b, Class=Reg|Byte, RegRex2|RegRex64|RegRex, 5, Dw2Inval, Dw2Inval
+r30b, Class=Reg|Byte, RegRex2|RegRex64|RegRex, 6, Dw2Inval, Dw2Inval
+r31b, Class=Reg|Byte, RegRex2|RegRex64|RegRex, 7, Dw2Inval, Dw2Inval
 // 16 bit regs
 ax, Class=Reg|Instance=Accum|Word, 0, 0, Dw2Inval, Dw2Inval
 cx, Class=Reg|Word, 0, 1, Dw2Inval, Dw2Inval
@@ -60,6 +76,22 @@  r12w, Class=Reg|Word, RegRex, 4, Dw2Inval, Dw2Inval
 r13w, Class=Reg|Word, RegRex, 5, Dw2Inval, Dw2Inval
 r14w, Class=Reg|Word, RegRex, 6, Dw2Inval, Dw2Inval
 r15w, Class=Reg|Word, RegRex, 7, Dw2Inval, Dw2Inval
+r16w, Class=Reg|Word, RegRex2, 0, Dw2Inval, Dw2Inval
+r17w, Class=Reg|Word, RegRex2, 1, Dw2Inval, Dw2Inval
+r18w, Class=Reg|Word, RegRex2, 2, Dw2Inval, Dw2Inval
+r19w, Class=Reg|Word, RegRex2, 3, Dw2Inval, Dw2Inval
+r20w, Class=Reg|Word, RegRex2, 4, Dw2Inval, Dw2Inval
+r21w, Class=Reg|Word, RegRex2, 5, Dw2Inval, Dw2Inval
+r22w, Class=Reg|Word, RegRex2, 6, Dw2Inval, Dw2Inval
+r23w, Class=Reg|Word, RegRex2, 7, Dw2Inval, Dw2Inval
+r24w, Class=Reg|Word, RegRex2|RegRex, 0, Dw2Inval, Dw2Inval
+r25w, Class=Reg|Word, RegRex2|RegRex, 1, Dw2Inval, Dw2Inval
+r26w, Class=Reg|Word, RegRex2|RegRex, 2, Dw2Inval, Dw2Inval
+r27w, Class=Reg|Word, RegRex2|RegRex, 3, Dw2Inval, Dw2Inval
+r28w, Class=Reg|Word, RegRex2|RegRex, 4, Dw2Inval, Dw2Inval
+r29w, Class=Reg|Word, RegRex2|RegRex, 5, Dw2Inval, Dw2Inval
+r30w, Class=Reg|Word, RegRex2|RegRex, 6, Dw2Inval, Dw2Inval
+r31w, Class=Reg|Word, RegRex2|RegRex, 7, Dw2Inval, Dw2Inval
 // 32 bit regs
 eax, Class=Reg|Instance=Accum|Dword|BaseIndex, 0, 0, 0, Dw2Inval
 ecx, Class=Reg|Instance=RegC|Dword|BaseIndex, 0, 1, 1, Dw2Inval
@@ -77,6 +109,22 @@  r12d, Class=Reg|Dword|BaseIndex, RegRex, 4, Dw2Inval, Dw2Inval
 r13d, Class=Reg|Dword|BaseIndex, RegRex, 5, Dw2Inval, Dw2Inval
 r14d, Class=Reg|Dword|BaseIndex, RegRex, 6, Dw2Inval, Dw2Inval
 r15d, Class=Reg|Dword|BaseIndex, RegRex, 7, Dw2Inval, Dw2Inval
+r16d, Class=Reg|Dword|BaseIndex, RegRex2, 0, Dw2Inval, Dw2Inval
+r17d, Class=Reg|Dword|BaseIndex, RegRex2, 1, Dw2Inval, Dw2Inval
+r18d, Class=Reg|Dword|BaseIndex, RegRex2, 2, Dw2Inval, Dw2Inval
+r19d, Class=Reg|Dword|BaseIndex, RegRex2, 3, Dw2Inval, Dw2Inval
+r20d, Class=Reg|Dword|BaseIndex, RegRex2, 4, Dw2Inval, Dw2Inval
+r21d, Class=Reg|Dword|BaseIndex, RegRex2, 5, Dw2Inval, Dw2Inval
+r22d, Class=Reg|Dword|BaseIndex, RegRex2, 6, Dw2Inval, Dw2Inval
+r23d, Class=Reg|Dword|BaseIndex, RegRex2, 7, Dw2Inval, Dw2Inval
+r24d, Class=Reg|Dword|BaseIndex, RegRex2|RegRex, 0, Dw2Inval, Dw2Inval
+r25d, Class=Reg|Dword|BaseIndex, RegRex2|RegRex, 1, Dw2Inval, Dw2Inval
+r26d, Class=Reg|Dword|BaseIndex, RegRex2|RegRex, 2, Dw2Inval, Dw2Inval
+r27d, Class=Reg|Dword|BaseIndex, RegRex2|RegRex, 3, Dw2Inval, Dw2Inval
+r28d, Class=Reg|Dword|BaseIndex, RegRex2|RegRex, 4, Dw2Inval, Dw2Inval
+r29d, Class=Reg|Dword|BaseIndex, RegRex2|RegRex, 5, Dw2Inval, Dw2Inval
+r30d, Class=Reg|Dword|BaseIndex, RegRex2|RegRex, 6, Dw2Inval, Dw2Inval
+r31d, Class=Reg|Dword|BaseIndex, RegRex2|RegRex, 7, Dw2Inval, Dw2Inval
 rax, Class=Reg|Instance=Accum|Qword|BaseIndex, 0, 0, Dw2Inval, 0
 rcx, Class=Reg|Instance=RegC|Qword|BaseIndex, 0, 1, Dw2Inval, 2
 rdx, Class=Reg|Instance=RegD|Qword|BaseIndex, 0, 2, Dw2Inval, 1
@@ -93,6 +141,22 @@  r12, Class=Reg|Qword|BaseIndex, RegRex, 4, Dw2Inval, 12
 r13, Class=Reg|Qword|BaseIndex, RegRex, 5, Dw2Inval, 13
 r14, Class=Reg|Qword|BaseIndex, RegRex, 6, Dw2Inval, 14
 r15, Class=Reg|Qword|BaseIndex, RegRex, 7, Dw2Inval, 15
+r16, Class=Reg|Qword|BaseIndex, RegRex2, 0, Dw2Inval, 130
+r17, Class=Reg|Qword|BaseIndex, RegRex2, 1, Dw2Inval, 131
+r18, Class=Reg|Qword|BaseIndex, RegRex2, 2, Dw2Inval, 132
+r19, Class=Reg|Qword|BaseIndex, RegRex2, 3, Dw2Inval, 133
+r20, Class=Reg|Qword|BaseIndex, RegRex2, 4, Dw2Inval, 134
+r21, Class=Reg|Qword|BaseIndex, RegRex2, 5, Dw2Inval, 135
+r22, Class=Reg|Qword|BaseIndex, RegRex2, 6, Dw2Inval, 136
+r23, Class=Reg|Qword|BaseIndex, RegRex2, 7, Dw2Inval, 137
+r24, Class=Reg|Qword|BaseIndex, RegRex2|RegRex, 0, Dw2Inval, 138
+r25, Class=Reg|Qword|BaseIndex, RegRex2|RegRex, 1, Dw2Inval, 139
+r26, Class=Reg|Qword|BaseIndex, RegRex2|RegRex, 2, Dw2Inval, 140
+r27, Class=Reg|Qword|BaseIndex, RegRex2|RegRex, 3, Dw2Inval, 141
+r28, Class=Reg|Qword|BaseIndex, RegRex2|RegRex, 4, Dw2Inval, 142
+r29, Class=Reg|Qword|BaseIndex, RegRex2|RegRex, 5, Dw2Inval, 143
+r30, Class=Reg|Qword|BaseIndex, RegRex2|RegRex, 6, Dw2Inval, 144
+r31, Class=Reg|Qword|BaseIndex, RegRex2|RegRex, 7, Dw2Inval, 145
 // Vector mask registers.
 k0, Class=RegMask, 0, 0, 93, 118
 k1, Class=RegMask, 0, 1, 94, 119