[v4,1/9] Support APX GPR32 with rex2 prefix

Message ID 20231219121218.974012-2-lili.cui@intel.com
State New
Headers
Series Support Intel APX EGPR |

Checks

Context Check Description
linaro-tcwg-bot/tcwg_binutils_build--master-arm fail Patch failed to apply
linaro-tcwg-bot/tcwg_binutils_build--master-aarch64 fail Patch failed to apply

Commit Message

Cui, Lili Dec. 19, 2023, 12:12 p.m. UTC
  APX uses the REX2 prefix to support EGPR for map0 and map1 of legacy
instructions. We added the NoEgpr flag in i386-gen.c for instructions
that do not support EGPR.

gas/ChangeLog:

2023-12-19  Lingling Kong <lingling.kong@intel.com>
	    H.J. Lu  <hongjiu.lu@intel.com>
	    Lili Cui <lili.cui@intel.com>
	    Lin Hu   <lin1.hu@intel.com>

	* config/tc-i386.c
	(enum i386_error): Add unsupported_EGPR_for_addressing
	and invalid_pseudo_prefix.
	(struct _i386_insn): Add rex2 and rex2_encoding for
	gpr32.
	(cpu_arch): Add apx_f.
	(is_cpu): Ditto.
	(register_number): Handle RegRex2 for gpr32.
	(is_apx_rex2_encoding): New func. Test rex2 prefix encoding.
	(build_rex2_prefix): New func. Build legacy insn in
	opcode 0/1 use gpr32 with rex2 prefix.
	(optimize_encoding): Handel add r16-r31 for registers.
	(md_assemble): Handle apx encoding.
	(parse_insn): Handle Prefix_REX2.
	(check_EgprOperands): New func. Check if Egprs operands
	are valid for the instruction
	(match_template):  Handle Egpr operands check.
	(set_rex_rex2):  New func. set i.rex and i.rex2.
	(build_modrm_byte): Ditto.
	(output_insn): Handle rex2 2-byte prefix output.
	(check_register): Handle check egpr illegal without
	target apx, 64-bit mode and with rex_prefix.
	* doc/c-i386.texi: Document .apx.
	* testsuite/gas/i386/ilp32/x86-64-opcode-inval-intel.d: D5 valid
	in 64-bit mode.
	* testsuite/gas/i386/ilp32/x86-64-opcode-inval.d: Ditto.
	* testsuite/gas/i386/x86-64-inval-pseudo.l: Add rex2 invalid testcase.
	* testsuite/gas/i386/x86-64-inval-pseudo.s: Ditto.
	* testsuite/gas/i386/x86-64-opcode-inval-intel.d: Ditto.
	* testsuite/gas/i386/x86-64-opcode-inval.d: Ditto.
	* testsuite/gas/i386/x86-64-opcode-inval.s: Ditto.
	* testsuite/gas/i386/x86-64-pseudos-bad.l: Add illegal rex2 test.
	* testsuite/gas/i386/x86-64-pseudos-bad.s: Ditto.
	* testsuite/gas/i386/x86-64-pseudos.d: Add rex2 test.
	* testsuite/gas/i386/x86-64-pseudos.s: Ditto.
	* testsuite/gas/i386/x86-64.exp: Run APX tests.
	* testsuite/gas/i386/x86-64-apx-egpr-inval.l: New test.
	* testsuite/gas/i386/x86-64-apx-egpr-inval.s: New test.
	* testsuite/gas/i386/x86-64-apx-rex2.d: New test.
	* testsuite/gas/i386/x86-64-apx-rex2.s: New test.

include/ChangeLog:

	* opcode/i386.h (REX2_OPCODE): Add REX2_OPCODE.

opcodes/ChangeLog:

	* i386-dis.c (struct instr_info): Add erex for gpr32.
	Add last_erex_prefix for rex2 prefix.
	(REX2_M): Extend for gpr32.
	(PREFIX_REX2): Ditto.
	(PREFIX_REX2_ILLEGAL): Ditto.
	(ckprefix): Ditto.
	(prefix_name): Ditto.
	(print_insn): Ditto.
	(print_register): Ditto.
	(OP_E_memory): Ditto.
	(OP_REG): Ditto.
	(OP_EX): Ditto.
	* i386-gen.c (rex2_disallowed): Some instructions are not allowed rex2 prefix.
	(process_i386_opcode_modifier): Set NoEgpr for VEX and some special instructions.
	(output_i386_opcode): Handle if_entry_needs_special_handle.
	* i386-init.h : Regenerated.
	* i386-mnem.h : Regenerated.
	* i386-opc.h (enum i386_cpu): Add CpuAPX_F.
	(NoEgpr): New.
	(Prefix_NoOptimize): Ditto.
	(Prefix_REX2): Ditto.
	(RegRex2): Ditto.
	* i386-opc.tbl: Add rex2 prefix.
	* i386-reg.tbl: Add egprs (r16-r31).
	* i386-tbl.h: Regenerated.
---
 gas/config/tc-i386.c                          | 187 ++++++++++--
 gas/doc/c-i386.texi                           |   7 +-
 .../i386/ilp32/x86-64-opcode-inval-intel.d    |  47 +---
 .../gas/i386/ilp32/x86-64-opcode-inval.d      |  47 +---
 .../gas/i386/x86-64-apx-egpr-inval.l          |  15 +
 .../gas/i386/x86-64-apx-egpr-inval.s          |  18 ++
 gas/testsuite/gas/i386/x86-64-apx-rex2.d      |  83 ++++++
 gas/testsuite/gas/i386/x86-64-apx-rex2.s      |  86 ++++++
 gas/testsuite/gas/i386/x86-64-inval-pseudo.l  |   6 +
 gas/testsuite/gas/i386/x86-64-inval-pseudo.s  |   4 +
 .../gas/i386/x86-64-opcode-inval-intel.d      |  26 +-
 gas/testsuite/gas/i386/x86-64-opcode-inval.d  |  26 +-
 gas/testsuite/gas/i386/x86-64-opcode-inval.s  |   4 -
 gas/testsuite/gas/i386/x86-64-pseudos-bad.l   |  75 ++++-
 gas/testsuite/gas/i386/x86-64-pseudos-bad.s   |  74 +++++
 gas/testsuite/gas/i386/x86-64-pseudos.d       |  21 ++
 gas/testsuite/gas/i386/x86-64-pseudos.s       |  21 ++
 gas/testsuite/gas/i386/x86-64.exp             |   2 +
 include/opcode/i386.h                         |   2 +
 opcodes/i386-dis.c                            | 265 ++++++++++++------
 opcodes/i386-gen.c                            |  50 +++-
 opcodes/i386-opc.h                            |  13 +-
 opcodes/i386-opc.tbl                          |  27 +-
 opcodes/i386-reg.tbl                          |  64 +++++
 24 files changed, 907 insertions(+), 263 deletions(-)
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-egpr-inval.l
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-egpr-inval.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-rex2.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-rex2.s
  

Comments

Jan Beulich Dec. 22, 2023, 1:08 p.m. UTC | #1
On 19.12.2023 13:12, Cui, Lili wrote:
> @@ -5283,6 +5321,9 @@ md_assemble (char *line)
>  	case unsupported_syntax:
>  	  err_msg = _("unsupported syntax");
>  	  break;
> +	case unsupported_EGPR_for_addressing:
> +	  err_msg = _("extended GPR cannot be used as base/index");
> +	  break;

While this one's now suitable for the as_bad() below the switch, ...

> @@ -5336,6 +5377,9 @@ md_assemble (char *line)
>  	case invalid_dest_and_src_register_set:
>  	  err_msg = _("destination and source registers must be distinct");
>  	  break;
> +	case invalid_pseudo_prefix:
> +	  err_msg = _("rex2 pseudo prefix cannot be used here");
> +	  break;

... this one still doesn't really fit the "... for `<insn>'" there. At
least the "here" needs dropping.

> @@ -5630,11 +5681,12 @@ md_assemble (char *line)
>  	  && (i.op[1].regs->reg_flags & RegRex64) != 0)
>        || (((i.types[0].bitfield.class == Reg && i.types[0].bitfield.byte)
>  	   || (i.types[1].bitfield.class == Reg && i.types[1].bitfield.byte))
> -	  && i.rex != 0))
> +	  && (i.rex != 0 || i.rex2 != 0)))
>      {
>        int x;
>  
> -      i.rex |= REX_OPCODE;
> +      if (!is_apx_rex2_encoding () && !is_any_vex_encoding(&i.tm))
> +	i.rex |= REX_OPCODE;
>        for (x = 0; x < 2; x++)
>  	{
>  	  /* Look for 8 bit operand that uses old registers.  */
> @@ -5645,7 +5697,7 @@ md_assemble (char *line)
>  	      /* In case it is "hi" register, give up.  */
>  	      if (i.op[x].regs->reg_num > 3)
>  		as_bad (_("can't encode register '%s%s' in an "
> -			  "instruction requiring REX prefix."),
> +			  "instruction requiring REX/REX2 prefix."),
>  			register_prefix, i.op[x].regs->reg_name);
>  
>  	      /* Otherwise it is equivalent to the extended register.
> @@ -5657,11 +5709,11 @@ md_assemble (char *line)
>  	}
>      }
>  
> -  if (i.rex == 0 && i.rex_encoding)
> +  if (i.rex == 0 && i.rex2 == 0 && (i.rex_encoding || i.rex2_encoding))
>      {
>        /* Check if we can add a REX_OPCODE byte.  Look for 8 bit operand
>  	 that uses legacy register.  If it is "hi" register, don't add
> -	 the REX_OPCODE byte.  */
> +	 rex and rex2 prefix.  */
>        int x;
>        for (x = 0; x < 2; x++)
>  	if (i.types[x].bitfield.class == Reg
> @@ -5671,6 +5723,7 @@ md_assemble (char *line)
>  	  {
>  	    gas_assert (!(i.op[x].regs->reg_flags & RegRex));
>  	    i.rex_encoding = false;
> +	    i.rex2_encoding = false;
>  	    break;
>  	  }
>  
> @@ -5678,7 +5731,13 @@ md_assemble (char *line)
>  	i.rex = REX_OPCODE;
>      }
>  
> -  if (i.rex != 0)
> +  if (is_apx_rex2_encoding ())
> +    {
> +      build_rex2_prefix ();
> +      /* The individual REX.RXBW bits got consumed.  */
> +      i.rex &= REX_OPCODE;
> +    }
> +  else if (i.rex != 0)
>      add_prefix (REX_OPCODE | i.rex);
>  
>    insert_lfence_before (last_insn);

All of this will need re-basing over "x86: properly respect rex/{rex}",
with the result (I hope) that .insn will then also be covered REX2-wise.

> @@ -5752,6 +5811,20 @@ parse_insn (const char *line, char *mnemonic, bool prefix_only)
>  	    goto too_long;
>  	  *mnem_p = '\0';
>  
> +	  /* Point l at the closing brace if there's no other separator.  */
> +	  if (*l != END_OF_INSN && !is_space_char (*l)
> +	      && *l != PREFIX_SEPARATOR)
> +	    --l;
> +	}
> +      /* Skip the immediate 0x** of {rex2 0x00} prefix.  */
> +      else if (*mnemonic == '{'&& is_space_char (*l))

Nit: Missing blank.

> +	{
> +	  while ( *l != '}')

Nit: Stray blank.

> +	    ++l;

What if there's no '}' on the line?

> +	  *mnem_p++ = *l++;
> +	  if (mnem_p >= mnemonic + MAX_MNEM_SIZE)
> +	    goto too_long;
> +	  *mnem_p = '\0';

You skip everything, not just 0xNN. You also skip stuff after other pseudo
prefixes, if I'm not mistaken. That's both too lax. However, skipping
isn't an option here anyway. Either we actually respect what the user has
written, or we error out. Skipping therefore is an option only if the
provided expression (not just plain number!) evaluates to 0. I don't
understand anyway why this code was added: When I asked about the specific
plans, H.J. clearly said the form with a constant would be a disassembler-
only thing for now.

> @@ -7005,6 +7082,43 @@ VEX_check_encoding (const insn_template *t)
>    return 0;
>  }
>  
> +/* Check if Egprs operands are valid for the instruction.  */
> +
> +static int
> +check_EgprOperands (const insn_template *t)

Hmm, I thought I had asked before to make functions with boolean return
values have a return type of bool, and then use "true" for success. An
alternative would be to return the error indicator, rather than putting
it in i.error here.

Then again I realize this is in line with VEX_check_encoding() and
check_VecOperands() (which I think would better be changed, but anyway).

> @@ -7149,6 +7263,14 @@ match_template (char mnem_suffix)
>  	      continue;
>  	    }
>  
> +	  /* Check if pseudo prefix {rex2} is valid.  */
> +	  if (t->opcode_modifier.noegpr && i.rex2_encoding)
> +	    {
> +	      i.error = invalid_pseudo_prefix;

What is this needed for? I.e. why can't you pass the value ...

> +	      specific_error = progress (i.error);

... in here directly (as is done elsewhere as well)?

> --- /dev/null
> +++ b/gas/testsuite/gas/i386/x86-64-apx-rex2.s
> @@ -0,0 +1,86 @@
> +# Check 64bit instructions with rex2 prefix encoding
> +
> +	.allow_index_reg
> +	.text
> +_start:
> +         test	$0x7, %r24b
> +         test	$0x7, %r24d
> +         test	$0x7, %r24
> +         test	$0x7, %r24w
> +## REX2.M bit
> +         imull	%eax, %r15d
> +         imull	%eax, %r16d
> +         punpckldq (%r18), %mm2
> +## REX2.R4 bit
> +         leal	(%rax), %r16d
> +         leal	(%rax), %r17d
> +         leal	(%rax), %r18d
> +         leal	(%rax), %r19d
> +         leal	(%rax), %r20d
> +         leal	(%rax), %r21d
> +         leal	(%rax), %r22d
> +         leal	(%rax), %r23d
> +         leal	(%rax), %r24d
> +         leal	(%rax), %r25d
> +         leal	(%rax), %r26d
> +         leal	(%rax), %r27d
> +         leal	(%rax), %r28d
> +         leal	(%rax), %r29d
> +         leal	(%rax), %r30d
> +         leal	(%rax), %r31d
> +## REX2.X4 bit
> +         leal	(,%r16), %eax
> +         leal	(,%r17), %eax
> +         leal	(,%r18), %eax
> +         leal	(,%r19), %eax
> +         leal	(,%r20), %eax
> +         leal	(,%r21), %eax
> +         leal	(,%r22), %eax
> +         leal	(,%r23), %eax
> +         leal	(,%r24), %eax
> +         leal	(,%r25), %eax
> +         leal	(,%r26), %eax
> +         leal	(,%r27), %eax
> +         leal	(,%r28), %eax
> +         leal	(,%r29), %eax
> +         leal	(,%r30), %eax
> +         leal	(,%r31), %eax
> +## REX2.B4 bit
> +         leal	(%r16), %eax
> +         leal	(%r17), %eax
> +         leal	(%r18), %eax
> +         leal	(%r19), %eax
> +         leal	(%r20), %eax
> +         leal	(%r21), %eax
> +         leal	(%r22), %eax
> +         leal	(%r23), %eax
> +         leal	(%r24), %eax
> +         leal	(%r25), %eax
> +         leal	(%r26), %eax
> +         leal	(%r27), %eax
> +         leal	(%r28), %eax
> +         leal	(%r29), %eax
> +         leal	(%r30), %eax
> +         leal	(%r31), %eax
> +## REX2.W bit
> +         leaq	(%rax), %r15
> +         leaq	(%rax), %r16
> +         leaq	(%r15), %rax
> +         leaq	(%r16), %rax
> +         leaq	(,%r15), %rax
> +         leaq	(,%r16), %rax
> +## REX2.R3 bit
> +         add    (%r16), %r8
> +         add    (%r16), %r15
> +## REX2.X3 bit
> +         mov    (,%r9), %r16
> +         mov    (,%r14), %r16
> +## REX2.B3 bit
> +	 sub   (%r10), %r31
> +	 sub   (%r13), %r31

Nit: There's still an indentation anomaly here.

> --- a/gas/testsuite/gas/i386/x86-64-inval-pseudo.s
> +++ b/gas/testsuite/gas/i386/x86-64-inval-pseudo.s
> @@ -1,4 +1,8 @@
>  	.text
>  	{disp16} movb (%ebp),%al
>  	{disp16} movb (%rbp),%al
> +
> +	/* Instruction not support APX.  */
> +	{rex2} xsave (%r15, %rbx)
> +	{rex2} xsave64 (%r15, %rbx)
>  	.p2align 4,0

Aren't these dealt with (in a more complete fashion) in x86-64-pseudos-bad.s?

> --- a/gas/testsuite/gas/i386/x86-64-pseudos-bad.s
> +++ b/gas/testsuite/gas/i386/x86-64-pseudos-bad.s
> @@ -5,3 +5,77 @@ pseudos:
>  	{rex} vmovaps %xmm7,%xmm2
>  	{rex} vmovaps %xmm17,%xmm2
>  	{rex} rorx $7,%eax,%ebx
> +	{rex2} vmovaps %xmm7,%xmm2
> +	{rex2} xsave (%rax)
> +	{rex2} xsaves (%ecx)
> +	{rex2} xsaves64 (%ecx)
> +	{rex2} xsavec (%ecx)
> +	{rex2} xrstors (%ecx)
> +	{rex2} xrstors64 (%ecx)

Here.

> --- a/opcodes/i386-dis.c
> +++ b/opcodes/i386-dis.c
> @@ -144,6 +144,12 @@ struct instr_info
>    /* Bits of REX we've already used.  */
>    uint8_t rex_used;
>  
> +  /* Record W R4 X4 B4 bits for rex2.  */
> +  unsigned char rex2;
> +  /* Bits of REX2 we've already used.  */
> +  unsigned char rex2_used;

When you say REX2, one ought to be permitted to imply you mean the
prefix, not the struct field. That's ambiguous here, though - bit
positions used match those in rex2, not those in the REX2 payload.
IOW either you use lower case to make more obvious that you mean the
other struct field, or you say e.g. "REX2 prefix bits we've already
used." Albeit that would still be imprecise, as other REX2 prefix
bits' use is recorded in rex_used.

> @@ -265,8 +272,13 @@ struct dis_private {
>    {							\
>      if (value)						\
>        {							\
> -	if ((ins->rex & value))				\
> +	if (ins->rex & value)				\
>  	  ins->rex_used |= (value) | REX_OPCODE;	\

Like is done here, ...

> +	if (ins->rex2 & value)				\
> +	  {						\
> +	    ins->rex2_used |= value;			\

other uses of "value" also want parenthesizing, unless not used with
any kind of operator (e.g. in the if() above).

> @@ -276,6 +288,9 @@ struct dis_private {
>  #define EVEX_b_used 1
>  #define EVEX_len_used 2
>  
> +/* M0 in rex2 prefix represents map0 or map1.  */
> +#define REX2_M 0x8

Extending an earlier comment: This really should go next to REX_W and
friends. In principle the assembler could want to use this constant as
well, hence why it would better go the opcode/i386.h anyway.

> @@ -4196,19 +4221,19 @@ static const struct dis386 x86_64_table[][2] = {
>  
>    /* X86_64_E8 */
>    {
> -    { "callP",		{ Jv, BND }, 0 },
> -    { "call@",		{ Jv, BND }, 0 }
> +    { "callP",		{ Jv, BND }, PREFIX_REX2_ILLEGAL },

This, ...

> +    { "call@",		{ Jv, BND }, PREFIX_REX2_ILLEGAL }
>    },
>  
>    /* X86_64_E9 */
>    {
> -    { "jmpP",		{ Jv, BND }, 0 },
> -    { "jmp@",		{ Jv, BND }, 0 }
> +    { "jmpP",		{ Jv, BND }, PREFIX_REX2_ILLEGAL },

... this, and ...

> +    { "jmp@",		{ Jv, BND }, PREFIX_REX2_ILLEGAL }
>    },
>  
>    /* X86_64_EA */
>    {
> -    { "{l|}jmp{P|}", { Ap }, 0 },
> +    { "{l|}jmp{P|}", { Ap }, PREFIX_REX2_ILLEGAL },
>    },

... this change isn't really needed, is it? The marker is only needed
on 64-bit insns (i.e. respective slots 2 of x86_64_table[] entries).

Jan
  
Cui, Lili Dec. 25, 2023, 6:14 a.m. UTC | #2
> On 19.12.2023 13:12, Cui, Lili wrote:
> > @@ -5283,6 +5321,9 @@ md_assemble (char *line)
> >  	case unsupported_syntax:
> >  	  err_msg = _("unsupported syntax");
> >  	  break;
> > +	case unsupported_EGPR_for_addressing:
> > +	  err_msg = _("extended GPR cannot be used as base/index");
> > +	  break;
> 
> While this one's now suitable for the as_bad() below the switch, ...
> 
> > @@ -5336,6 +5377,9 @@ md_assemble (char *line)
> >  	case invalid_dest_and_src_register_set:
> >  	  err_msg = _("destination and source registers must be distinct");
> >  	  break;
> > +	case invalid_pseudo_prefix:
> > +	  err_msg = _("rex2 pseudo prefix cannot be used here");
> > +	  break;
> 
> ... this one still doesn't really fit the "... for `<insn>'" there. At least the "here"
> needs dropping.
> 
Changed to :

"rex2 pseudo prefix cannot be used"

> > @@ -5630,11 +5681,12 @@ md_assemble (char *line)
> >  	  && (i.op[1].regs->reg_flags & RegRex64) != 0)
> >        || (((i.types[0].bitfield.class == Reg && i.types[0].bitfield.byte)
> >  	   || (i.types[1].bitfield.class == Reg && i.types[1].bitfield.byte))
> > -	  && i.rex != 0))
> > +	  && (i.rex != 0 || i.rex2 != 0)))
> >      {
> >        int x;
> >
> > -      i.rex |= REX_OPCODE;
> > +      if (!is_apx_rex2_encoding () && !is_any_vex_encoding(&i.tm))
> > +	i.rex |= REX_OPCODE;
> >        for (x = 0; x < 2; x++)
> >  	{
> >  	  /* Look for 8 bit operand that uses old registers.  */ @@ -5645,7
> > +5697,7 @@ md_assemble (char *line)
> >  	      /* In case it is "hi" register, give up.  */
> >  	      if (i.op[x].regs->reg_num > 3)
> >  		as_bad (_("can't encode register '%s%s' in an "
> > -			  "instruction requiring REX prefix."),
> > +			  "instruction requiring REX/REX2 prefix."),
> >  			register_prefix, i.op[x].regs->reg_name);
> >
> >  	      /* Otherwise it is equivalent to the extended register.
> > @@ -5657,11 +5709,11 @@ md_assemble (char *line)
> >  	}
> >      }
> >
> > -  if (i.rex == 0 && i.rex_encoding)
> > +  if (i.rex == 0 && i.rex2 == 0 && (i.rex_encoding ||
> > + i.rex2_encoding))
> >      {
> >        /* Check if we can add a REX_OPCODE byte.  Look for 8 bit operand
> >  	 that uses legacy register.  If it is "hi" register, don't add
> > -	 the REX_OPCODE byte.  */
> > +	 rex and rex2 prefix.  */
> >        int x;
> >        for (x = 0; x < 2; x++)
> >  	if (i.types[x].bitfield.class == Reg @@ -5671,6 +5723,7 @@
> > md_assemble (char *line)
> >  	  {
> >  	    gas_assert (!(i.op[x].regs->reg_flags & RegRex));
> >  	    i.rex_encoding = false;
> > +	    i.rex2_encoding = false;
> >  	    break;
> >  	  }
> >
> > @@ -5678,7 +5731,13 @@ md_assemble (char *line)
> >  	i.rex = REX_OPCODE;
> >      }
> >
> > -  if (i.rex != 0)
> > +  if (is_apx_rex2_encoding ())
> > +    {
> > +      build_rex2_prefix ();
> > +      /* The individual REX.RXBW bits got consumed.  */
> > +      i.rex &= REX_OPCODE;
> > +    }
> > +  else if (i.rex != 0)
> >      add_prefix (REX_OPCODE | i.rex);
> >
> >    insert_lfence_before (last_insn);
> 
> All of this will need re-basing over "x86: properly respect rex/{rex}", with the
> result (I hope) that .insn will then also be covered REX2-wise.
> 
Done.

> > @@ -5752,6 +5811,20 @@ parse_insn (const char *line, char *mnemonic,
> bool prefix_only)
> >  	    goto too_long;
> >  	  *mnem_p = '\0';
> >
> > +	  /* Point l at the closing brace if there's no other separator.  */
> > +	  if (*l != END_OF_INSN && !is_space_char (*l)
> > +	      && *l != PREFIX_SEPARATOR)
> > +	    --l;
> > +	}
> > +      /* Skip the immediate 0x** of {rex2 0x00} prefix.  */
> > +      else if (*mnemonic == '{'&& is_space_char (*l))
> 
> Nit: Missing blank.
> 
> > +	{
> > +	  while ( *l != '}')
> 
> Nit: Stray blank.
> 
> > +	    ++l;
> 
> What if there's no '}' on the line?
> 
> > +	  *mnem_p++ = *l++;
> > +	  if (mnem_p >= mnemonic + MAX_MNEM_SIZE)
> > +	    goto too_long;
> > +	  *mnem_p = '\0';
> 
> You skip everything, not just 0xNN. You also skip stuff after other pseudo
> prefixes, if I'm not mistaken. That's both too lax. However, skipping isn't an
> option here anyway. Either we actually respect what the user has written, or
> we error out. Skipping therefore is an option only if the provided expression
> (not just plain number!) evaluates to 0. I don't understand anyway why this
> code was added: When I asked about the specific plans, H.J. clearly said the
> form with a constant would be a disassembler- only thing for now.
> 

It should be only supported in disassembler, I will drop it in assembler.

> > @@ -7005,6 +7082,43 @@ VEX_check_encoding (const insn_template *t)
> >    return 0;
> >  }
> >
> > +/* Check if Egprs operands are valid for the instruction.  */
> > +
> > +static int
> > +check_EgprOperands (const insn_template *t)
> 
> Hmm, I thought I had asked before to make functions with boolean return
> values have a return type of bool, and then use "true" for success. An
> alternative would be to return the error indicator, rather than putting it in
> i.error here.
> 
> Then again I realize this is in line with VEX_check_encoding() and
> check_VecOperands() (which I think would better be changed, but anyway).
> 

Changed it to bool. For the rest, it's a bit strange to only change check_EgprOperands. Can this place be left unchanged? Or should I submit a new patch and change the old one first?

> > @@ -7149,6 +7263,14 @@ match_template (char mnem_suffix)
> >  	      continue;
> >  	    }
> >
> > +	  /* Check if pseudo prefix {rex2} is valid.  */
> > +	  if (t->opcode_modifier.noegpr && i.rex2_encoding)
> > +	    {
> > +	      i.error = invalid_pseudo_prefix;
> 
> What is this needed for? I.e. why can't you pass the value ...
> 
> > +	      specific_error = progress (i.error);
> 
> ... in here directly (as is done elsewhere as well)?
> 

Done.

> > --- /dev/null
> > +++ b/gas/testsuite/gas/i386/x86-64-apx-rex2.s
> > @@ -0,0 +1,86 @@
> > +## REX2.B3 bit
> > +	 sub   (%r10), %r31
> > +	 sub   (%r13), %r31
> 
> Nit: There's still an indentation anomaly here.
> 
Done.

> > --- a/gas/testsuite/gas/i386/x86-64-inval-pseudo.s
> > +++ b/gas/testsuite/gas/i386/x86-64-inval-pseudo.s
> > @@ -1,4 +1,8 @@
> >  	.text
> >  	{disp16} movb (%ebp),%al
> >  	{disp16} movb (%rbp),%al
> > +
> > +	/* Instruction not support APX.  */
> > +	{rex2} xsave (%r15, %rbx)
> > +	{rex2} xsave64 (%r15, %rbx)
> >  	.p2align 4,0
> 
> Aren't these dealt with (in a more complete fashion) in x86-64-pseudos-
> bad.s?
> 
Removed.

> > --- a/gas/testsuite/gas/i386/x86-64-pseudos-bad.s
> > +++ b/gas/testsuite/gas/i386/x86-64-pseudos-bad.s
> > @@ -5,3 +5,77 @@ pseudos:
> >  	{rex} vmovaps %xmm7,%xmm2
> >  	{rex} vmovaps %xmm17,%xmm2
> >  	{rex} rorx $7,%eax,%ebx
> > +	{rex2} vmovaps %xmm7,%xmm2
> > +	{rex2} xsave (%rax)
> > +	{rex2} xsaves (%ecx)
> > +	{rex2} xsaves64 (%ecx)
> > +	{rex2} xsavec (%ecx)
> > +	{rex2} xrstors (%ecx)
> > +	{rex2} xrstors64 (%ecx)
> 
> Here.
> 
> > --- a/opcodes/i386-dis.c
> > +++ b/opcodes/i386-dis.c
> > @@ -144,6 +144,12 @@ struct instr_info
> >    /* Bits of REX we've already used.  */
> >    uint8_t rex_used;
> >
> > +  /* Record W R4 X4 B4 bits for rex2.  */  unsigned char rex2;
> > +  /* Bits of REX2 we've already used.  */  unsigned char rex2_used;
> 
> When you say REX2, one ought to be permitted to imply you mean the prefix,
> not the struct field. That's ambiguous here, though - bit positions used match
> those in rex2, not those in the REX2 payload.
> IOW either you use lower case to make more obvious that you mean the other
> struct field, or you say e.g. "REX2 prefix bits we've already used." Albeit that
> would still be imprecise, as other REX2 prefix bits' use is recorded in rex_used.
> 

Changed to 

  /* Bits of rex2 we've already used.  */
  unsigned char rex2_used;

> > @@ -265,8 +272,13 @@ struct dis_private {
> >    {							\
> >      if (value)						\
> >        {							\
> > -	if ((ins->rex & value))				\
> > +	if (ins->rex & value)				\
> >  	  ins->rex_used |= (value) | REX_OPCODE;	\
> 
> Like is done here, ...
> 
> > +	if (ins->rex2 & value)				\
> > +	  {						\
> > +	    ins->rex2_used |= value;			\
> 
> other uses of "value" also want parenthesizing, unless not used with any kind
> of operator (e.g. in the if() above).
> 
Done.

> > @@ -276,6 +288,9 @@ struct dis_private {  #define EVEX_b_used 1
> > #define EVEX_len_used 2
> >
> > +/* M0 in rex2 prefix represents map0 or map1.  */ #define REX2_M 0x8
> 
> Extending an earlier comment: This really should go next to REX_W and
> friends. In principle the assembler could want to use this constant as well,
> hence why it would better go the opcode/i386.h anyway.
> 
Done.

> > @@ -4196,19 +4221,19 @@ static const struct dis386 x86_64_table[][2] =
> > {
> >
> >    /* X86_64_E8 */
> >    {
> > -    { "callP",		{ Jv, BND }, 0 },
> > -    { "call@",		{ Jv, BND }, 0 }
> > +    { "callP",		{ Jv, BND }, PREFIX_REX2_ILLEGAL },
> 
> This, ...
> 
> > +    { "call@",		{ Jv, BND }, PREFIX_REX2_ILLEGAL }
> >    },
> >
> >    /* X86_64_E9 */
> >    {
> > -    { "jmpP",		{ Jv, BND }, 0 },
> > -    { "jmp@",		{ Jv, BND }, 0 }
> > +    { "jmpP",		{ Jv, BND }, PREFIX_REX2_ILLEGAL },
> 
> ... this, and ...
> 
> > +    { "jmp@",		{ Jv, BND }, PREFIX_REX2_ILLEGAL }
> >    },
> >
> >    /* X86_64_EA */
> >    {
> > -    { "{l|}jmp{P|}", { Ap }, 0 },
> > +    { "{l|}jmp{P|}", { Ap }, PREFIX_REX2_ILLEGAL },
> >    },
> 
> ... this change isn't really needed, is it? The marker is only needed on 64-bit
> insns (i.e. respective slots 2 of x86_64_table[] entries).
> 
Done.

Thanks,
Lili.
  
Jan Beulich Jan. 4, 2024, 8:57 a.m. UTC | #3
On 25.12.2023 07:14, Cui, Lili wrote:
>> On 19.12.2023 13:12, Cui, Lili wrote:
>>> @@ -7005,6 +7082,43 @@ VEX_check_encoding (const insn_template *t)
>>>    return 0;
>>>  }
>>>
>>> +/* Check if Egprs operands are valid for the instruction.  */
>>> +
>>> +static int
>>> +check_EgprOperands (const insn_template *t)
>>
>> Hmm, I thought I had asked before to make functions with boolean return
>> values have a return type of bool, and then use "true" for success. An
>> alternative would be to return the error indicator, rather than putting it in
>> i.error here.
>>
>> Then again I realize this is in line with VEX_check_encoding() and
>> check_VecOperands() (which I think would better be changed, but anyway).
>>
> 
> Changed it to bool. For the rest, it's a bit strange to only change check_EgprOperands. Can this place be left unchanged? Or should I submit a new patch and change the old one first?

Leaving alone is okay; I'll see about cleaning that up at some later point.

Jan
  

Patch

diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
index aa26f5ce034..051cdef2a3a 100644
--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -239,6 +239,7 @@  enum i386_error
     bad_imm4,
     unsupported_with_intel_mnemonic,
     unsupported_syntax,
+    unsupported_EGPR_for_addressing,
     unsupported,
     unsupported_on_arch,
     unsupported_64bit,
@@ -249,6 +250,7 @@  enum i386_error
     invalid_vector_register_set,
     invalid_tmm_register_set,
     invalid_dest_and_src_register_set,
+    invalid_pseudo_prefix,
     unsupported_vector_index_register,
     unsupported_broadcast,
     broadcast_needed,
@@ -356,6 +358,7 @@  struct _i386_insn
     modrm_byte rm;
     rex_byte rex;
     rex_byte vrex;
+    rex_byte rex2;
     sib_byte sib;
     vex_prefix vex;
 
@@ -429,6 +432,9 @@  struct _i386_insn
     /* Prefer the REX byte in encoding.  */
     bool rex_encoding;
 
+    /* Prefer the REX2 prefix in encoding.  */
+    bool rex2_encoding;
+
     /* Disable instruction size optimization.  */
     bool no_optimize;
 
@@ -1146,6 +1152,7 @@  static const arch_entry cpu_arch[] =
   SUBARCH (pbndkb, PBNDKB, PBNDKB, false),
   VECARCH (avx10.1, AVX10_1, ANY_AVX512F, set),
   SUBARCH (user_msr, USER_MSR, USER_MSR, false),
+  SUBARCH (apx_f, APX_F, APX_F, false),
 };
 
 #undef SUBARCH
@@ -1661,6 +1668,7 @@  _is_cpu (const i386_cpu_attr *a, enum i386_cpu cpu)
     case CpuHLE:      return a->bitfield.cpuhle;
     case CpuAVX512F:  return a->bitfield.cpuavx512f;
     case CpuAVX512VL: return a->bitfield.cpuavx512vl;
+    case CpuAPX_F:    return a->bitfield.cpuapx_f;
     case Cpu64:       return a->bitfield.cpu64;
     case CpuNo64:     return a->bitfield.cpuno64;
     default:
@@ -2332,7 +2340,7 @@  register_number (const reg_entry *r)
   if (r->reg_flags & RegRex)
     nr += 8;
 
-  if (r->reg_flags & RegVRex)
+  if (r->reg_flags & (RegVRex | RegRex2))
     nr += 16;
 
   return nr;
@@ -3868,6 +3876,12 @@  is_any_vex_encoding (const insn_template *t)
   return t->opcode_modifier.vex || t->opcode_modifier.evex;
 }
 
+static INLINE bool
+is_apx_rex2_encoding (void)
+{
+  return i.rex2 || i.rex2_encoding;
+}
+
 static unsigned int
 get_broadcast_bytes (const insn_template *t, bool diag)
 {
@@ -4123,6 +4137,22 @@  build_evex_prefix (void)
     i.vex.bytes[3] |= i.mask.reg->reg_num;
 }
 
+/* Build (2 bytes) rex2 prefix.
+   | D5h |
+   | m | R4 X4 B4 | W R X B |
+
+   Rex2 reuses i.vex as they both encode i.tm.opcode_space in their prefixes.
+ */
+static void
+build_rex2_prefix (void)
+{
+  i.vex.length = 2;
+  i.vex.bytes[0] = 0xd5;
+  /* For the W R X B bits, the variables of rex prefix will be reused.  */
+  i.vex.bytes[1] = ((i.tm.opcode_space << 7)
+		    | (i.rex2 << 4) | i.rex);
+}
+
 static void
 process_immext (void)
 {
@@ -4386,14 +4416,22 @@  optimize_encoding (void)
 	  i.types[1].bitfield.byte = 1;
 	  /* Ignore the suffix.  */
 	  i.suffix = 0;
-	  /* Convert to byte registers.  */
+	  /* Convert to byte registers. 8-bit registers are special,
+	     RegRex64 and non-RegRex64 each have 8 registers.  */
 	  if (i.types[1].bitfield.word)
-	    j = 16;
-	  else if (i.types[1].bitfield.dword)
+	    /* 32 (or 40) 8-bit registers.  */
 	    j = 32;
+	  else if (i.types[1].bitfield.dword)
+	    /* 32 (or 40) 8-bit registers + 32 16-bit registers.  */
+	    j = 64;
 	  else
-	    j = 48;
-	  if (!(i.op[1].regs->reg_flags & RegRex) && base_regnum < 4)
+	    /* 32 (or 40) 8-bit registers + 32 16-bit registers
+	       + 32 32-bit registers.  */
+	    j = 96;
+
+	  /* In 64-bit mode, the following byte registers cannot be accessed
+	     if using the Rex and Rex2 prefix: AH, BH, CH, DH */
+	  if (!(i.op[1].regs->reg_flags & (RegRex | RegRex2)) && base_regnum < 4)
 	    j += 8;
 	  i.op[1].regs -= j;
 	}
@@ -5283,6 +5321,9 @@  md_assemble (char *line)
 	case unsupported_syntax:
 	  err_msg = _("unsupported syntax");
 	  break;
+	case unsupported_EGPR_for_addressing:
+	  err_msg = _("extended GPR cannot be used as base/index");
+	  break;
 	case unsupported:
 	  as_bad (_("unsupported instruction `%s'"),
 		  pass1_mnem ? pass1_mnem : insn_name (current_templates.start));
@@ -5336,6 +5377,9 @@  md_assemble (char *line)
 	case invalid_dest_and_src_register_set:
 	  err_msg = _("destination and source registers must be distinct");
 	  break;
+	case invalid_pseudo_prefix:
+	  err_msg = _("rex2 pseudo prefix cannot be used here");
+	  break;
 	case unsupported_vector_index_register:
 	  err_msg = _("unsupported vector index register");
 	  break;
@@ -5591,6 +5635,13 @@  md_assemble (char *line)
 	  return;
 	}
 
+      /* Check for explicit REX2 prefix.  */
+      if (i.rex2_encoding)
+	{
+	  as_bad (_("{rex2} prefix invalid with `%s'"), insn_name (&i.tm));
+	  return;
+	}
+
       if (i.tm.opcode_modifier.vex)
 	build_vex_prefix (t);
       else
@@ -5630,11 +5681,12 @@  md_assemble (char *line)
 	  && (i.op[1].regs->reg_flags & RegRex64) != 0)
       || (((i.types[0].bitfield.class == Reg && i.types[0].bitfield.byte)
 	   || (i.types[1].bitfield.class == Reg && i.types[1].bitfield.byte))
-	  && i.rex != 0))
+	  && (i.rex != 0 || i.rex2 != 0)))
     {
       int x;
 
-      i.rex |= REX_OPCODE;
+      if (!is_apx_rex2_encoding () && !is_any_vex_encoding(&i.tm))
+	i.rex |= REX_OPCODE;
       for (x = 0; x < 2; x++)
 	{
 	  /* Look for 8 bit operand that uses old registers.  */
@@ -5645,7 +5697,7 @@  md_assemble (char *line)
 	      /* In case it is "hi" register, give up.  */
 	      if (i.op[x].regs->reg_num > 3)
 		as_bad (_("can't encode register '%s%s' in an "
-			  "instruction requiring REX prefix."),
+			  "instruction requiring REX/REX2 prefix."),
 			register_prefix, i.op[x].regs->reg_name);
 
 	      /* Otherwise it is equivalent to the extended register.
@@ -5657,11 +5709,11 @@  md_assemble (char *line)
 	}
     }
 
-  if (i.rex == 0 && i.rex_encoding)
+  if (i.rex == 0 && i.rex2 == 0 && (i.rex_encoding || i.rex2_encoding))
     {
       /* Check if we can add a REX_OPCODE byte.  Look for 8 bit operand
 	 that uses legacy register.  If it is "hi" register, don't add
-	 the REX_OPCODE byte.  */
+	 rex and rex2 prefix.  */
       int x;
       for (x = 0; x < 2; x++)
 	if (i.types[x].bitfield.class == Reg
@@ -5671,6 +5723,7 @@  md_assemble (char *line)
 	  {
 	    gas_assert (!(i.op[x].regs->reg_flags & RegRex));
 	    i.rex_encoding = false;
+	    i.rex2_encoding = false;
 	    break;
 	  }
 
@@ -5678,7 +5731,13 @@  md_assemble (char *line)
 	i.rex = REX_OPCODE;
     }
 
-  if (i.rex != 0)
+  if (is_apx_rex2_encoding ())
+    {
+      build_rex2_prefix ();
+      /* The individual REX.RXBW bits got consumed.  */
+      i.rex &= REX_OPCODE;
+    }
+  else if (i.rex != 0)
     add_prefix (REX_OPCODE | i.rex);
 
   insert_lfence_before (last_insn);
@@ -5752,6 +5811,20 @@  parse_insn (const char *line, char *mnemonic, bool prefix_only)
 	    goto too_long;
 	  *mnem_p = '\0';
 
+	  /* Point l at the closing brace if there's no other separator.  */
+	  if (*l != END_OF_INSN && !is_space_char (*l)
+	      && *l != PREFIX_SEPARATOR)
+	    --l;
+	}
+      /* Skip the immediate 0x** of {rex2 0x00} prefix.  */
+      else if (*mnemonic == '{'&& is_space_char (*l))
+	{
+	  while ( *l != '}')
+	    ++l;
+	  *mnem_p++ = *l++;
+	  if (mnem_p >= mnemonic + MAX_MNEM_SIZE)
+	    goto too_long;
+	  *mnem_p = '\0';
 	  /* Point l at the closing brace if there's no other separator.  */
 	  if (*l != END_OF_INSN && !is_space_char (*l)
 	      && *l != PREFIX_SEPARATOR)
@@ -5856,6 +5929,10 @@  parse_insn (const char *line, char *mnemonic, bool prefix_only)
 		  /* {rex} */
 		  i.rex_encoding = true;
 		  break;
+		case Prefix_REX2:
+		  /* {rex2} */
+		  i.rex2_encoding = true;
+		  break;
 		case Prefix_NoOptimize:
 		  /* {nooptimize} */
 		  i.no_optimize = true;
@@ -7005,6 +7082,43 @@  VEX_check_encoding (const insn_template *t)
   return 0;
 }
 
+/* Check if Egprs operands are valid for the instruction.  */
+
+static int
+check_EgprOperands (const insn_template *t)
+{
+  if (!t->opcode_modifier.noegpr)
+    return 0;
+
+  for (unsigned int op = 0; op < i.operands; op++)
+    {
+      if (i.types[op].bitfield.class != Reg)
+	continue;
+
+      if (i.op[op].regs->reg_flags & RegRex2)
+	{
+	  i.error = register_type_mismatch;
+	  return 1;
+	}
+    }
+
+  if ((i.index_reg && (i.index_reg->reg_flags & RegRex2))
+      || (i.base_reg && (i.base_reg->reg_flags & RegRex2)))
+    {
+      i.error = unsupported_EGPR_for_addressing;
+      return 1;
+    }
+
+  /* Check if pseudo prefix {rex2} is valid.  */
+  if (i.rex2_encoding)
+    {
+      i.error = invalid_pseudo_prefix;
+      return 1;
+    }
+
+  return 0;
+}
+
 /* Helper function for the progress() macro in match_template().  */
 static INLINE enum i386_error progress (enum i386_error new,
 					enum i386_error last,
@@ -7149,6 +7263,14 @@  match_template (char mnem_suffix)
 	      continue;
 	    }
 
+	  /* Check if pseudo prefix {rex2} is valid.  */
+	  if (t->opcode_modifier.noegpr && i.rex2_encoding)
+	    {
+	      i.error = invalid_pseudo_prefix;
+	      specific_error = progress (i.error);
+	      continue;
+	    }
+
 	  /* We've found a match; break out of loop.  */
 	  break;
 	}
@@ -7479,6 +7601,13 @@  match_template (char mnem_suffix)
 	  continue;
 	}
 
+      /* Check if EGPR operands(r16-r31) are valid.  */
+      if (check_EgprOperands (t))
+	{
+	  specific_error = progress (i.error);
+	  continue;
+	}
+
       /* Check whether to use the shorter VEX encoding for certain insns where
 	 the EVEX encoding comes first in the table.  This requires the respective
 	 AVX-* feature to be explicitly enabled.
@@ -8377,6 +8506,18 @@  static INLINE void set_rex_vrex (const reg_entry *r, unsigned int rex_bit,
 
   if (r->reg_flags & RegVRex)
     i.vrex |= rex_bit;
+
+  if (r->reg_flags & RegRex2)
+    i.rex2 |= rex_bit;
+}
+
+static INLINE void
+set_rex_rex2 (const reg_entry *r, unsigned int rex_bit)
+{
+  if ((r->reg_flags & RegRex) != 0)
+    i.rex |= rex_bit;
+  if ((r->reg_flags & RegRex2) != 0)
+    i.rex2 |= rex_bit;
 }
 
 static int
@@ -8860,8 +9001,7 @@  build_modrm_byte (void)
 		  i.rm.regmem = ESCAPE_TO_TWO_BYTE_ADDRESSING;
 		  i.types[op] = operand_type_and_not (i.types[op], anydisp);
 		  i.types[op].bitfield.disp32 = 1;
-		  if ((i.index_reg->reg_flags & RegRex) != 0)
-		    i.rex |= REX_X;
+		  set_rex_rex2 (i.index_reg, REX_X);
 		}
 	    }
 	  /* RIP addressing for 64bit mode.  */
@@ -8932,8 +9072,7 @@  build_modrm_byte (void)
 
 	      if (!i.tm.opcode_modifier.sib)
 		i.rm.regmem = i.base_reg->reg_num;
-	      if ((i.base_reg->reg_flags & RegRex) != 0)
-		i.rex |= REX_B;
+	      set_rex_rex2 (i.base_reg, REX_B);
 	      i.sib.base = i.base_reg->reg_num;
 	      /* x86-64 ignores REX prefix bit here to avoid decoder
 		 complications.  */
@@ -8971,8 +9110,7 @@  build_modrm_byte (void)
 		  else
 		    i.sib.index = i.index_reg->reg_num;
 		  i.rm.regmem = ESCAPE_TO_TWO_BYTE_ADDRESSING;
-		  if ((i.index_reg->reg_flags & RegRex) != 0)
-		    i.rex |= REX_X;
+		  set_rex_rex2 (i.index_reg, REX_X);
 		}
 
 	      if (i.disp_operands
@@ -10116,6 +10254,12 @@  output_insn (const struct last_insn *last_insn)
 	  for (j = ARRAY_SIZE (i.prefix), q = i.prefix; j > 0; j--, q++)
 	    if (*q)
 	      frag_opcode_byte (*q);
+
+	  if (is_apx_rex2_encoding ())
+	    {
+	      frag_opcode_byte (i.vex.bytes[0]);
+	      frag_opcode_byte (i.vex.bytes[1]);
+	    }
 	}
       else
 	{
@@ -14144,6 +14288,13 @@  static bool check_register (const reg_entry *r)
 	i.vec_encoding = vex_encoding_error;
     }
 
+  if (r->reg_flags & RegRex2)
+    {
+      if (!cpu_arch_flags.bitfield.cpuapx_f
+	  || flag_code != CODE_64BIT)
+	return false;
+    }
+
   if (((r->reg_flags & (RegRex64 | RegRex)) || r->reg_type.bitfield.qword)
       && (!cpu_arch_flags.bitfield.cpu64
 	  || r->reg_type.bitfield.class != RegCR
diff --git a/gas/doc/c-i386.texi b/gas/doc/c-i386.texi
index 03ee980bef7..21f48c93300 100644
--- a/gas/doc/c-i386.texi
+++ b/gas/doc/c-i386.texi
@@ -217,6 +217,7 @@  accept various extension mnemonics.  For example,
 @code{avx10.1/256},
 @code{avx10.1/128},
 @code{user_msr},
+@code{apx_f},
 @code{amx_int8},
 @code{amx_bf16},
 @code{amx_fp16},
@@ -983,6 +984,10 @@  Different encoding options can be specified via pseudo prefixes:
 instructions (x86-64 only).  Note that this differs from the @samp{rex}
 prefix which generates REX prefix unconditionally.
 
+@item
+@samp{@{rex2@}} -- prefer REX2 prefix for integer and legacy vector
+instructions (APX_F only).
+
 @item
 @samp{@{nooptimize@}} -- disable instruction size optimization.
 @end itemize
@@ -1663,7 +1668,7 @@  supported on the CPU specified.  The choices for @var{cpu_type} are:
 @item @samp{.lwp} @tab @samp{.fma4} @tab @samp{.xop} @tab @samp{.cx16}
 @item @samp{.padlock} @tab @samp{.clzero} @tab @samp{.mwaitx} @tab @samp{.rdpru}
 @item @samp{.mcommit} @tab @samp{.sev_es} @tab @samp{.snp} @tab @samp{.invlpgb}
-@item @samp{.tlbsync}
+@item @samp{.tlbsync} @tab @samp{.apx_f}
 @end multitable
 
 Apart from the warning, there are only two other effects on
diff --git a/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval-intel.d b/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval-intel.d
index a2b09d2e74f..56834371133 100644
--- a/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval-intel.d
+++ b/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval-intel.d
@@ -2,49 +2,4 @@ 
 #as: --32
 #objdump: -dw -Mx86-64 -Mintel
 #name: x86-64 (ILP32) illegal opcodes (Intel mode)
-
-.*: +file format .*
-
-Disassembly of section .text:
-
-0+ <aaa>:
-[ 	]*[a-f0-9]+:	37                   	\(bad\)
-
-0+1 <aad0>:
-[ 	]*[a-f0-9]+:	d5                   	\(bad\)
-[ 	]*[a-f0-9]+:	0a                   	.byte 0xa
-
-0+3 <aad1>:
-[ 	]*[a-f0-9]+:	d5                   	\(bad\)
-[ 	]*[a-f0-9]+:	02                   	.byte 0x2
-
-0+5 <aam0>:
-[ 	]*[a-f0-9]+:	d4                   	\(bad\)
-[ 	]*[a-f0-9]+:	0a                   	.byte 0xa
-
-0+7 <aam1>:
-[ 	]*[a-f0-9]+:	d4                   	\(bad\)
-[ 	]*[a-f0-9]+:	02                   	.byte 0x2
-
-0+9 <aas>:
-[ 	]*[a-f0-9]+:	3f                   	\(bad\)
-
-0+a <bound>:
-[ 	]*[a-f0-9]+:	62                   	.byte 0x62
-[ 	]*[a-f0-9]+:	10                   	.byte 0x10
-
-0+c <daa>:
-[ 	]*[a-f0-9]+:	27                   	\(bad\)
-
-0+d <das>:
-[ 	]*[a-f0-9]+:	2f                   	\(bad\)
-
-0+e <into>:
-[ 	]*[a-f0-9]+:	ce                   	\(bad\)
-
-0+f <pusha>:
-[ 	]*[a-f0-9]+:	60                   	\(bad\)
-
-0+10 <popa>:
-[ 	]*[a-f0-9]+:	61                   	\(bad\)
-#pass
+#dump: ../x86-64-opcode-inval-intel.d
diff --git a/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval.d b/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval.d
index 5a17b0b412e..b5233a5cf93 100644
--- a/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval.d
+++ b/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval.d
@@ -2,49 +2,4 @@ 
 #as: --32
 #objdump: -dw -Mx86-64
 #name: x86-64 (ILP32) illegal opcodes
-
-.*: +file format .*
-
-Disassembly of section .text:
-
-0+ <aaa>:
-[ 	]*[a-f0-9]+:	37                   	\(bad\)
-
-0+1 <aad0>:
-[ 	]*[a-f0-9]+:	d5                   	\(bad\)
-[ 	]*[a-f0-9]+:	0a                   	.byte 0xa
-
-0+3 <aad1>:
-[ 	]*[a-f0-9]+:	d5                   	\(bad\)
-[ 	]*[a-f0-9]+:	02                   	.byte 0x2
-
-0+5 <aam0>:
-[ 	]*[a-f0-9]+:	d4                   	\(bad\)
-[ 	]*[a-f0-9]+:	0a                   	.byte 0xa
-
-0+7 <aam1>:
-[ 	]*[a-f0-9]+:	d4                   	\(bad\)
-[ 	]*[a-f0-9]+:	02                   	.byte 0x2
-
-0+9 <aas>:
-[ 	]*[a-f0-9]+:	3f                   	\(bad\)
-
-0+a <bound>:
-[ 	]*[a-f0-9]+:	62                   	.byte 0x62
-[ 	]*[a-f0-9]+:	10                   	.byte 0x10
-
-0+c <daa>:
-[ 	]*[a-f0-9]+:	27                   	\(bad\)
-
-0+d <das>:
-[ 	]*[a-f0-9]+:	2f                   	\(bad\)
-
-0+e <into>:
-[ 	]*[a-f0-9]+:	ce                   	\(bad\)
-
-0+f <pusha>:
-[ 	]*[a-f0-9]+:	60                   	\(bad\)
-
-0+10 <popa>:
-[ 	]*[a-f0-9]+:	61                   	\(bad\)
-#pass
+#dump: ../x86-64-opcode-inval.d
diff --git a/gas/testsuite/gas/i386/x86-64-apx-egpr-inval.l b/gas/testsuite/gas/i386/x86-64-apx-egpr-inval.l
new file mode 100644
index 00000000000..bb5c602a2e2
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-egpr-inval.l
@@ -0,0 +1,15 @@ 
+.*: Assembler messages:
+.*:4: Error: bad register name `%r17d'
+.*:7: Error: extended GPR cannot be used as base/index for `xsave'
+.*:8: Error: extended GPR cannot be used as base/index for `xsave64'
+.*:9: Error: extended GPR cannot be used as base/index for `xrstor'
+.*:10: Error: extended GPR cannot be used as base/index for `xrstor64'
+.*:11: Error: extended GPR cannot be used as base/index for `xsaves'
+.*:12: Error: extended GPR cannot be used as base/index for `xsaves64'
+.*:13: Error: extended GPR cannot be used as base/index for `xrstors'
+.*:14: Error: extended GPR cannot be used as base/index for `xrstors64'
+.*:15: Error: extended GPR cannot be used as base/index for `xsaveopt'
+.*:16: Error: extended GPR cannot be used as base/index for `xsaveopt64'
+.*:17: Error: extended GPR cannot be used as base/index for `xsavec'
+.*:18: Error: extended GPR cannot be used as base/index for `xsavec64'
+#pass
diff --git a/gas/testsuite/gas/i386/x86-64-apx-egpr-inval.s b/gas/testsuite/gas/i386/x86-64-apx-egpr-inval.s
new file mode 100644
index 00000000000..bfb6b3fd03b
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-egpr-inval.s
@@ -0,0 +1,18 @@ 
+# Check illegal 64bit APX_F instructions
+	.text
+	.arch .noapx_f
+	test    $0x7, %r17d
+	.arch .apx_f
+	test    $0x7, %r17d
+	xsave (%r16, %rbx)
+	xsave64 (%r16, %r31)
+	xrstor (%r16, %rbx)
+	xrstor64 (%r16, %rbx)
+	xsaves (%rbx, %r16)
+	xsaves64 (%r16, %rbx)
+	xrstors (%rbx, %r31)
+	xrstors64 (%r16, %rbx)
+	xsaveopt (%r16, %rbx)
+	xsaveopt64 (%r16, %r31)
+	xsavec (%r16, %rbx)
+	xsavec64 (%r16, %r31)
diff --git a/gas/testsuite/gas/i386/x86-64-apx-rex2.d b/gas/testsuite/gas/i386/x86-64-apx-rex2.d
new file mode 100644
index 00000000000..e3cd534da11
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-rex2.d
@@ -0,0 +1,83 @@ 
+#as:
+#objdump: -dw
+#name: x86-64 APX_F use gpr32 with rex2 prefix
+#source: x86-64-apx-rex2.s
+
+.*: +file format .*
+
+
+Disassembly of section .text:
+
+0+ <_start>:
+[	 ]*[a-f0-9]+:[	 ]*d5 11 f6 c0 07[	 ]+test   \$0x7,%r24b
+[	 ]*[a-f0-9]+:[	 ]*d5 11 f7 c0 07 00 00 00[	 ]+test   \$0x7,%r24d
+[	 ]*[a-f0-9]+:[	 ]*d5 19 f7 c0 07 00 00 00[	 ]+test   \$0x7,%r24
+[	 ]*[a-f0-9]+:[	 ]*66 d5 11 f7 c0 07 00[	 ]+test   \$0x7,%r24w
+[	 ]*[a-f0-9]+:[	 ]*44 0f af f8[	 ]+imul   %eax,%r15d
+[	 ]*[a-f0-9]+:[	 ]*d5 c0 af c0[	 ]+imul   %eax,%r16d
+[	 ]*[a-f0-9]+:[	 ]*d5 90 62 12[	 ]+punpckldq %mm2,\(%r18\)
+[	 ]*[a-f0-9]+:[	 ]*d5 40 8d 00[	 ]+lea    \(%rax\),%r16d
+[	 ]*[a-f0-9]+:[	 ]*d5 40 8d 08[	 ]+lea    \(%rax\),%r17d
+[	 ]*[a-f0-9]+:[	 ]*d5 40 8d 10[	 ]+lea    \(%rax\),%r18d
+[	 ]*[a-f0-9]+:[	 ]*d5 40 8d 18[	 ]+lea    \(%rax\),%r19d
+[	 ]*[a-f0-9]+:[	 ]*d5 40 8d 20[	 ]+lea    \(%rax\),%r20d
+[	 ]*[a-f0-9]+:[	 ]*d5 40 8d 28[	 ]+lea    \(%rax\),%r21d
+[	 ]*[a-f0-9]+:[	 ]*d5 40 8d 30[	 ]+lea    \(%rax\),%r22d
+[	 ]*[a-f0-9]+:[	 ]*d5 40 8d 38[	 ]+lea    \(%rax\),%r23d
+[	 ]*[a-f0-9]+:[	 ]*d5 44 8d 00[	 ]+lea    \(%rax\),%r24d
+[	 ]*[a-f0-9]+:[	 ]*d5 44 8d 08[	 ]+lea    \(%rax\),%r25d
+[	 ]*[a-f0-9]+:[	 ]*d5 44 8d 10[	 ]+lea    \(%rax\),%r26d
+[	 ]*[a-f0-9]+:[	 ]*d5 44 8d 18[	 ]+lea    \(%rax\),%r27d
+[	 ]*[a-f0-9]+:[	 ]*d5 44 8d 20[	 ]+lea    \(%rax\),%r28d
+[	 ]*[a-f0-9]+:[	 ]*d5 44 8d 28[	 ]+lea    \(%rax\),%r29d
+[	 ]*[a-f0-9]+:[	 ]*d5 44 8d 30[	 ]+lea    \(%rax\),%r30d
+[	 ]*[a-f0-9]+:[	 ]*d5 44 8d 38[	 ]+lea    \(%rax\),%r31d
+[	 ]*[a-f0-9]+:[	 ]*d5 20 8d 04 05 00 00 00 00[	 ]+lea    0x0\(,%r16,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 20 8d 04 0d 00 00 00 00[	 ]+lea    0x0\(,%r17,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 20 8d 04 15 00 00 00 00[	 ]+lea    0x0\(,%r18,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 20 8d 04 1d 00 00 00 00[	 ]+lea    0x0\(,%r19,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 20 8d 04 25 00 00 00 00[	 ]+lea    0x0\(,%r20,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 20 8d 04 2d 00 00 00 00[	 ]+lea    0x0\(,%r21,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 20 8d 04 35 00 00 00 00[	 ]+lea    0x0\(,%r22,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 20 8d 04 3d 00 00 00 00[	 ]+lea    0x0\(,%r23,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 22 8d 04 05 00 00 00 00[	 ]+lea    0x0\(,%r24,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 22 8d 04 0d 00 00 00 00[	 ]+lea    0x0\(,%r25,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 22 8d 04 15 00 00 00 00[	 ]+lea    0x0\(,%r26,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 22 8d 04 1d 00 00 00 00[	 ]+lea    0x0\(,%r27,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 22 8d 04 25 00 00 00 00[	 ]+lea    0x0\(,%r28,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 22 8d 04 2d 00 00 00 00[	 ]+lea    0x0\(,%r29,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 22 8d 04 35 00 00 00 00[	 ]+lea    0x0\(,%r30,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 22 8d 04 3d 00 00 00 00[	 ]+lea    0x0\(,%r31,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 10 8d 00[	 ]+lea    \(%r16\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 10 8d 01[	 ]+lea    \(%r17\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 10 8d 02[	 ]+lea    \(%r18\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 10 8d 03[	 ]+lea    \(%r19\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 10 8d 04 24       	lea    \(%r20\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 10 8d 45 00       	lea    0x0\(%r21\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 10 8d 06[	 ]+lea    \(%r22\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 10 8d 07[	 ]+lea    \(%r23\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 11 8d 00[	 ]+lea    \(%r24\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 11 8d 01[	 ]+lea    \(%r25\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 11 8d 02[	 ]+lea    \(%r26\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 11 8d 03[	 ]+lea    \(%r27\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 11 8d 04 24       	lea    \(%r28\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 11 8d 45 00       	lea    0x0\(%r29\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 11 8d 06          	lea    \(%r30\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 11 8d 07          	lea    \(%r31\),%eax
+[	 ]*[a-f0-9]+:[	 ]*4c 8d 38             	lea    \(%rax\),%r15
+[	 ]*[a-f0-9]+:[	 ]*d5 48 8d 00          	lea    \(%rax\),%r16
+[	 ]*[a-f0-9]+:[	 ]*49 8d 07             	lea    \(%r15\),%rax
+[	 ]*[a-f0-9]+:[	 ]*d5 18 8d 00          	lea    \(%r16\),%rax
+[	 ]*[a-f0-9]+:[	 ]*4a 8d 04 3d 00 00 00 00 	lea    0x0\(,%r15,1\),%rax
+[	 ]*[a-f0-9]+:[	 ]*d5 28 8d 04 05 00 00 00 00 	lea    0x0\(,%r16,1\),%rax
+[	 ]*[a-f0-9]+:[	 ]*d5 1c 03 00          	add    \(%r16\),%r8
+[	 ]*[a-f0-9]+:[	 ]*d5 1c 03 38          	add    \(%r16\),%r15
+[	 ]*[a-f0-9]+:[	 ]*d5 4a 8b 04 0d 00 00 00 00 	mov    0x0\(,%r9,1\),%r16
+[	 ]*[a-f0-9]+:[	 ]*d5 4a 8b 04 35 00 00 00 00 	mov    0x0\(,%r14,1\),%r16
+[	 ]*[a-f0-9]+:[	 ]*d5 4d 2b 3a          	sub    \(%r10\),%r31
+[	 ]*[a-f0-9]+:[	 ]*d5 4d 2b 7d 00       	sub    0x0\(%r13\),%r31
+[	 ]*[a-f0-9]+:[	 ]*d5 30 8d 44 20 01    	lea    0x1\(%r16,%r20,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 76 8d 7c 20 01    	lea    0x1\(%r16,%r28,1\),%r31d
+[	 ]*[a-f0-9]+:[	 ]*d5 12 8d 84 04 81 00 00 00 	lea    0x81\(%r20,%r8,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 57 8d bc 04 81 00 00 00 	lea    0x81\(%r28,%r8,1\),%r31d
+#pass
diff --git a/gas/testsuite/gas/i386/x86-64-apx-rex2.s b/gas/testsuite/gas/i386/x86-64-apx-rex2.s
new file mode 100644
index 00000000000..7adf55ff82c
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-rex2.s
@@ -0,0 +1,86 @@ 
+# Check 64bit instructions with rex2 prefix encoding
+
+	.allow_index_reg
+	.text
+_start:
+         test	$0x7, %r24b
+         test	$0x7, %r24d
+         test	$0x7, %r24
+         test	$0x7, %r24w
+## REX2.M bit
+         imull	%eax, %r15d
+         imull	%eax, %r16d
+         punpckldq (%r18), %mm2
+## REX2.R4 bit
+         leal	(%rax), %r16d
+         leal	(%rax), %r17d
+         leal	(%rax), %r18d
+         leal	(%rax), %r19d
+         leal	(%rax), %r20d
+         leal	(%rax), %r21d
+         leal	(%rax), %r22d
+         leal	(%rax), %r23d
+         leal	(%rax), %r24d
+         leal	(%rax), %r25d
+         leal	(%rax), %r26d
+         leal	(%rax), %r27d
+         leal	(%rax), %r28d
+         leal	(%rax), %r29d
+         leal	(%rax), %r30d
+         leal	(%rax), %r31d
+## REX2.X4 bit
+         leal	(,%r16), %eax
+         leal	(,%r17), %eax
+         leal	(,%r18), %eax
+         leal	(,%r19), %eax
+         leal	(,%r20), %eax
+         leal	(,%r21), %eax
+         leal	(,%r22), %eax
+         leal	(,%r23), %eax
+         leal	(,%r24), %eax
+         leal	(,%r25), %eax
+         leal	(,%r26), %eax
+         leal	(,%r27), %eax
+         leal	(,%r28), %eax
+         leal	(,%r29), %eax
+         leal	(,%r30), %eax
+         leal	(,%r31), %eax
+## REX2.B4 bit
+         leal	(%r16), %eax
+         leal	(%r17), %eax
+         leal	(%r18), %eax
+         leal	(%r19), %eax
+         leal	(%r20), %eax
+         leal	(%r21), %eax
+         leal	(%r22), %eax
+         leal	(%r23), %eax
+         leal	(%r24), %eax
+         leal	(%r25), %eax
+         leal	(%r26), %eax
+         leal	(%r27), %eax
+         leal	(%r28), %eax
+         leal	(%r29), %eax
+         leal	(%r30), %eax
+         leal	(%r31), %eax
+## REX2.W bit
+         leaq	(%rax), %r15
+         leaq	(%rax), %r16
+         leaq	(%r15), %rax
+         leaq	(%r16), %rax
+         leaq	(,%r15), %rax
+         leaq	(,%r16), %rax
+## REX2.R3 bit
+         add    (%r16), %r8
+         add    (%r16), %r15
+## REX2.X3 bit
+         mov    (,%r9), %r16
+         mov    (,%r14), %r16
+## REX2.B3 bit
+	 sub   (%r10), %r31
+	 sub   (%r13), %r31
+
+## SIB
+         leal	1(%r16, %r20), %eax
+         leal	1(%r16, %r28), %r31d
+         leal	129(%r20, %r8), %eax
+         leal	129(%r28, %r8), %r31d
diff --git a/gas/testsuite/gas/i386/x86-64-inval-pseudo.l b/gas/testsuite/gas/i386/x86-64-inval-pseudo.l
index 13ad0fb768f..256e1b9a370 100644
--- a/gas/testsuite/gas/i386/x86-64-inval-pseudo.l
+++ b/gas/testsuite/gas/i386/x86-64-inval-pseudo.l
@@ -1,10 +1,16 @@ 
 .*: Assembler messages:
 .*:2: Error: .*
 .*:3: Error: .*
+.*:6: Error: .*
+.*:7: Error: .*
 GAS LISTING .*
 
 
 [ 	]*1[ 	]+\.text
 [ 	]*2[ 	]+\{disp16\} movb \(%ebp\),%al
 [ 	]*3[ 	]+\{disp16\} movb \(%rbp\),%al
+[ 	]*4[ 	]+
+[ 	]*5[ 	]+.*
+[ 	]*6[ 	]+\{rex2\} xsave \(%r15, %rbx\)
+[ 	]*7[ 	]+\{rex2\} xsave64 \(%r15, %rbx\)
 #...
diff --git a/gas/testsuite/gas/i386/x86-64-inval-pseudo.s b/gas/testsuite/gas/i386/x86-64-inval-pseudo.s
index c10b14c2099..ae30476e500 100644
--- a/gas/testsuite/gas/i386/x86-64-inval-pseudo.s
+++ b/gas/testsuite/gas/i386/x86-64-inval-pseudo.s
@@ -1,4 +1,8 @@ 
 	.text
 	{disp16} movb (%ebp),%al
 	{disp16} movb (%rbp),%al
+
+	/* Instruction not support APX.  */
+	{rex2} xsave (%r15, %rbx)
+	{rex2} xsave64 (%r15, %rbx)
 	.p2align 4,0
diff --git a/gas/testsuite/gas/i386/x86-64-opcode-inval-intel.d b/gas/testsuite/gas/i386/x86-64-opcode-inval-intel.d
index 6ee5b2f95ce..66c4d2cddc0 100644
--- a/gas/testsuite/gas/i386/x86-64-opcode-inval-intel.d
+++ b/gas/testsuite/gas/i386/x86-64-opcode-inval-intel.d
@@ -10,41 +10,33 @@  Disassembly of section .text:
 0+ <aaa>:
 [ 	]*[a-f0-9]+:	37                   	\(bad\)
 
-0+1 <aad0>:
-[ 	]*[a-f0-9]+:	d5                   	\(bad\)
-[ 	]*[a-f0-9]+:	0a                   	.byte 0xa
-
-0+3 <aad1>:
-[ 	]*[a-f0-9]+:	d5                   	\(bad\)
-[ 	]*[a-f0-9]+:	02                   	.byte 0x2
-
-0+5 <aam0>:
+0+1 <aam0>:
 [ 	]*[a-f0-9]+:	d4                   	\(bad\)
 [ 	]*[a-f0-9]+:	0a                   	.byte 0xa
 
-0+7 <aam1>:
+0+3 <aam1>:
 [ 	]*[a-f0-9]+:	d4                   	\(bad\)
 [ 	]*[a-f0-9]+:	02                   	.byte 0x2
 
-0+9 <aas>:
+0+5 <aas>:
 [ 	]*[a-f0-9]+:	3f                   	\(bad\)
 
-0+a <bound>:
+0+6 <bound>:
 [ 	]*[a-f0-9]+:	62                   	.byte 0x62
 [ 	]*[a-f0-9]+:	10                   	.byte 0x10
 
-0+c <daa>:
+0+8 <daa>:
 [ 	]*[a-f0-9]+:	27                   	\(bad\)
 
-0+d <das>:
+0+9 <das>:
 [ 	]*[a-f0-9]+:	2f                   	\(bad\)
 
-0+e <into>:
+0+a <into>:
 [ 	]*[a-f0-9]+:	ce                   	\(bad\)
 
-0+f <pusha>:
+0+b <pusha>:
 [ 	]*[a-f0-9]+:	60                   	\(bad\)
 
-0+10 <popa>:
+0+c <popa>:
 [ 	]*[a-f0-9]+:	61                   	\(bad\)
 #pass
diff --git a/gas/testsuite/gas/i386/x86-64-opcode-inval.d b/gas/testsuite/gas/i386/x86-64-opcode-inval.d
index 12f02c1766c..fbb850b56da 100644
--- a/gas/testsuite/gas/i386/x86-64-opcode-inval.d
+++ b/gas/testsuite/gas/i386/x86-64-opcode-inval.d
@@ -9,41 +9,33 @@  Disassembly of section .text:
 0+ <aaa>:
 [ 	]*[a-f0-9]+:	37                   	\(bad\)
 
-0+1 <aad0>:
-[ 	]*[a-f0-9]+:	d5                   	\(bad\)
-[ 	]*[a-f0-9]+:	0a                   	.byte 0xa
-
-0+3 <aad1>:
-[ 	]*[a-f0-9]+:	d5                   	\(bad\)
-[ 	]*[a-f0-9]+:	02                   	.byte 0x2
-
-0+5 <aam0>:
+0+1 <aam0>:
 [ 	]*[a-f0-9]+:	d4                   	\(bad\)
 [ 	]*[a-f0-9]+:	0a                   	.byte 0xa
 
-0+7 <aam1>:
+0+3 <aam1>:
 [ 	]*[a-f0-9]+:	d4                   	\(bad\)
 [ 	]*[a-f0-9]+:	02                   	.byte 0x2
 
-0+9 <aas>:
+0+5 <aas>:
 [ 	]*[a-f0-9]+:	3f                   	\(bad\)
 
-0+a <bound>:
+0+6 <bound>:
 [ 	]*[a-f0-9]+:	62                   	.byte 0x62
 [ 	]*[a-f0-9]+:	10                   	.byte 0x10
 
-0+c <daa>:
+0+8 <daa>:
 [ 	]*[a-f0-9]+:	27                   	\(bad\)
 
-0+d <das>:
+0+9 <das>:
 [ 	]*[a-f0-9]+:	2f                   	\(bad\)
 
-0+e <into>:
+0+a <into>:
 [ 	]*[a-f0-9]+:	ce                   	\(bad\)
 
-0+f <pusha>:
+0+b <pusha>:
 [ 	]*[a-f0-9]+:	60                   	\(bad\)
 
-0+10 <popa>:
+0+c <popa>:
 [ 	]*[a-f0-9]+:	61                   	\(bad\)
 #pass
diff --git a/gas/testsuite/gas/i386/x86-64-opcode-inval.s b/gas/testsuite/gas/i386/x86-64-opcode-inval.s
index 6cbfe7705a8..fbcda3df773 100644
--- a/gas/testsuite/gas/i386/x86-64-opcode-inval.s
+++ b/gas/testsuite/gas/i386/x86-64-opcode-inval.s
@@ -2,10 +2,6 @@ 
 # All the followings are illegal opcodes for x86-64.
 aaa:
 	aaa
-aad0:
-	aad
-aad1:
-	aad $2
 aam0:
 	aam
 aam1:
diff --git a/gas/testsuite/gas/i386/x86-64-pseudos-bad.l b/gas/testsuite/gas/i386/x86-64-pseudos-bad.l
index 3f9f67fcf4b..a72f847085d 100644
--- a/gas/testsuite/gas/i386/x86-64-pseudos-bad.l
+++ b/gas/testsuite/gas/i386/x86-64-pseudos-bad.l
@@ -1,6 +1,71 @@ 
 .*: Assembler messages:
-.*:3: Error: .*`vmovaps'.*
-.*:4: Error: .*`vmovaps'.*
-.*:5: Error: .*`vmovaps'.*
-.*:6: Error: .*`vmovaps'.*
-.*:7: Error: .*`rorx'.*
+.*:[0-9]+: Error: .*`vmovaps'.*
+.*:[0-9]+: Error: .*`vmovaps'.*
+.*:[0-9]+: Error: .*`vmovaps'.*
+.*:[0-9]+: Error: .*`vmovaps'.*
+.*:[0-9]+: Error: .*`rorx'.*
+.*:[0-9]+: Error: .*`vmovaps'.*
+.*:[0-9]+: Error: .*`xsave'.*
+.*:[0-9]+: Error: .*`xsaves'.*
+.*:[0-9]+: Error: .*`xsaves64'.*
+.*:[0-9]+: Error: .*`xsavec'.*
+.*:[0-9]+: Error: .*`xrstors'.*
+.*:[0-9]+: Error: .*`xrstors64'.*
+.*:[0-9]+: Error: .*`mov'.*
+.*:[0-9]+: Error: .*`movabs'.*
+.*:[0-9]+: Error: .*`cmps'.*
+.*:[0-9]+: Error: .*`lods'.*
+.*:[0-9]+: Error: .*`lods'.*
+.*:[0-9]+: Error: .*`lods'.*
+.*:[0-9]+: Error: .*`movs'.*
+.*:[0-9]+: Error: .*`movs'.*
+.*:[0-9]+: Error: .*`scas'.*
+.*:[0-9]+: Error: .*`scas'.*
+.*:[0-9]+: Error: .*`scas'.*
+.*:[0-9]+: Error: .*`stos'.*
+.*:[0-9]+: Error: .*`stos'.*
+.*:[0-9]+: Error: .*`stos'.*
+.*:[0-9]+: Error: .*`jo'.*
+.*:[0-9]+: Error: .*`jno'.*
+.*:[0-9]+: Error: .*`jb'.*
+.*:[0-9]+: Error: .*`jae'.*
+.*:[0-9]+: Error: .*`je'.*
+.*:[0-9]+: Error: .*`jne'.*
+.*:[0-9]+: Error: .*`jbe'.*
+.*:[0-9]+: Error: .*`ja'.*
+.*:[0-9]+: Error: .*`js'.*
+.*:[0-9]+: Error: .*`jns'.*
+.*:[0-9]+: Error: .*`jp'.*
+.*:[0-9]+: Error: .*`jnp'.*
+.*:[0-9]+: Error: .*`jl'.*
+.*:[0-9]+: Error: .*`jge'.*
+.*:[0-9]+: Error: .*`jle'.*
+.*:[0-9]+: Error: .*`jg'.*
+.*:[0-9]+: Error: .*`jo'.*
+.*:[0-9]+: Error: .*`jno'.*
+.*:[0-9]+: Error: .*`jb'.*
+.*:[0-9]+: Error: .*`jae'.*
+.*:[0-9]+: Error: .*`je'.*
+.*:[0-9]+: Error: .*`jne'.*
+.*:[0-9]+: Error: .*`jbe'.*
+.*:[0-9]+: Error: .*`ja'.*
+.*:[0-9]+: Error: .*`js'.*
+.*:[0-9]+: Error: .*`jns'.*
+.*:[0-9]+: Error: .*`jp'.*
+.*:[0-9]+: Error: .*`jnp'.*
+.*:[0-9]+: Error: .*`jl'.*
+.*:[0-9]+: Error: .*`jge'.*
+.*:[0-9]+: Error: .*`jle'.*
+.*:[0-9]+: Error: .*`jg'.*
+.*:[0-9]+: Error: .*`in'.*
+.*:[0-9]+: Error: .*`in'.*
+.*:[0-9]+: Error: .*`out'.*
+.*:[0-9]+: Error: .*`out'.*
+.*:[0-9]+: Error: .*`jmp'.*
+.*:[0-9]+: Error: .*`loop'.*
+.*:[0-9]+: Error: .*`wrmsr'.*
+.*:[0-9]+: Error: .*`rdtsc'.*
+.*:[0-9]+: Error: .*`rdmsr'.*
+.*:[0-9]+: Error: .*`sysenter'.*
+.*:[0-9]+: Error: .*`sysexit'.*
+.*:[0-9]+: Error: .*`rdpmc'.*
diff --git a/gas/testsuite/gas/i386/x86-64-pseudos-bad.s b/gas/testsuite/gas/i386/x86-64-pseudos-bad.s
index 3b923593a6a..54c17a9eab7 100644
--- a/gas/testsuite/gas/i386/x86-64-pseudos-bad.s
+++ b/gas/testsuite/gas/i386/x86-64-pseudos-bad.s
@@ -5,3 +5,77 @@  pseudos:
 	{rex} vmovaps %xmm7,%xmm2
 	{rex} vmovaps %xmm17,%xmm2
 	{rex} rorx $7,%eax,%ebx
+	{rex2} vmovaps %xmm7,%xmm2
+	{rex2} xsave (%rax)
+	{rex2} xsaves (%ecx)
+	{rex2} xsaves64 (%ecx)
+	{rex2} xsavec (%ecx)
+	{rex2} xrstors (%ecx)
+	{rex2} xrstors64 (%ecx)
+
+	#All opcodes in the row 0xA* (map0) prefixed REX2 are illegal.
+	#{rex2} test (0xa8) is a special case, it will remap to test (0xf6)
+	{rex2} mov    0x90909090,%al
+	{rex2} movabs 0x1,%al
+	{rex2} cmpsb  %es:(%edi),%ds:(%esi)
+	{rex2} lodsb
+	{rex2} lods   %ds:(%esi),%al
+	{rex2} lodsb   (%esi)
+	{rex2} movs
+	{rex2} movs   (%esi), (%edi)
+	{rex2} scasl
+	{rex2} scas   %es:(%edi),%eax
+	{rex2} scasb   (%edi)
+	{rex2} stosb
+	{rex2} stosb   (%edi)
+	{rex2} stos   %eax,%es:(%edi)
+
+	#All opcodes in the row 0x7* (map0) and 0x8* (map1) prefixed REX2 are illegal.
+	{rex2} jo     .+2-0x70
+	{rex2} jno    .+2-0x70
+	{rex2} jb     .+2-0x70
+	{rex2} jae    .+2-0x70
+	{rex2} je     .+2-0x70
+	{rex2} jne    .+2-0x70
+	{rex2} jbe    .+2-0x70
+	{rex2} ja     .+2-0x70
+	{rex2} js     .+2-0x70
+	{rex2} jns    .+2-0x70
+	{rex2} jp     .+2-0x70
+	{rex2} jnp    .+2-0x70
+	{rex2} jl     .+2-0x70
+	{rex2} jge    .+2-0x70
+	{rex2} jle    .+2-0x70
+	{rex2} jg     .+2-0x70
+	{rex2} jo     .+6+0x90909090
+	{rex2} jno    .+6+0x90909090
+	{rex2} jb     .+6+0x90909090
+	{rex2} jae    .+6+0x90909090
+	{rex2} je     .+6+0x90909090
+	{rex2} jne    .+6+0x90909090
+	{rex2} jbe    .+6+0x90909090
+	{rex2} ja     .+6+0x90909090
+	{rex2} js     .+6+0x90909090
+	{rex2} jns    .+6+0x90909090
+	{rex2} jp     .+6+0x90909090
+	{rex2} jnp    .+6+0x90909090
+	{rex2} jl     .+6+0x90909090
+	{rex2} jge    .+6+0x90909090
+	{rex2} jle    .+6+0x90909090
+	{rex2} jg     .+6+0x90909090
+
+	#All opcodes in the row 0xE* (map0) prefixed REX2 are illegal.
+	{rex2} in $0x90,%al
+	{rex2} in $0x90
+	{rex2} out $0x90,%al
+	{rex2} out $0x90
+	{rex2} jmp  *%eax
+	{rex2} loop foo
+
+	#All opcodes in the row 0x3* (map1) prefixed REX2 are illegal.
+	{rex2} wrmsr
+	{rex2} rdtsc
+	{rex2} rdmsr
+	{rex2} sysenter
+	{rex2} sysexitl
+	{rex2} rdpmc
diff --git a/gas/testsuite/gas/i386/x86-64-pseudos.d b/gas/testsuite/gas/i386/x86-64-pseudos.d
index 0cc75ef2457..9c45851cc73 100644
--- a/gas/testsuite/gas/i386/x86-64-pseudos.d
+++ b/gas/testsuite/gas/i386/x86-64-pseudos.d
@@ -404,6 +404,18 @@  Disassembly of section .text:
  +[a-f0-9]+:	41 0f 28 10          	movaps \(%r8\),%xmm2
  +[a-f0-9]+:	40 0f 38 01 01       	rex phaddw \(%rcx\),%mm0
  +[a-f0-9]+:	41 0f 38 01 00       	phaddw \(%r8\),%mm0
+ +[a-f0-9]+:	88 c4                	mov    %al,%ah
+ +[a-f0-9]+:	d5 00 d3 e0          	{rex2 0x0} shl %cl,%eax
+ +[a-f0-9]+:	d5 00 38 ca          	{rex2 0x0} cmp %cl,%dl
+ +[a-f0-9]+:	d5 00 b3 01          	{rex2 0x0} mov \$(0x)?1,%bl
+ +[a-f0-9]+:	d5 00 89 c3          	{rex2 0x0} mov %eax,%ebx
+ +[a-f0-9]+:	d5 01 89 c6          	{rex2 0x1} mov %eax,%r14d
+ +[a-f0-9]+:	d5 01 89 00          	{rex2 0x1} mov %eax,\(%r8\)
+ +[a-f0-9]+:	d5 80 28 d7          	{rex2 0x80} movaps %xmm7,%xmm2
+ +[a-f0-9]+:	d5 84 28 e7          	{rex2 0x84} movaps %xmm7,%xmm12
+ +[a-f0-9]+:	d5 80 28 11          	{rex2 0x80} movaps \(%rcx\),%xmm2
+ +[a-f0-9]+:	d5 81 28 10          	{rex2 0x81} movaps \(%r8\),%xmm2
+ +[a-f0-9]+:	d5 80 d5 f0          	{rex2 0x80} pmullw %mm0,%mm6
  +[a-f0-9]+:	8a 45 00             	mov    0x0\(%rbp\),%al
  +[a-f0-9]+:	8a 45 00             	mov    0x0\(%rbp\),%al
  +[a-f0-9]+:	8a 85 00 00 00 00    	mov    0x0\(%rbp\),%al
@@ -458,6 +470,15 @@  Disassembly of section .text:
  +[a-f0-9]+:	41 0f 28 10          	movaps \(%r8\),%xmm2
  +[a-f0-9]+:	40 0f 38 01 01       	rex phaddw \(%rcx\),%mm0
  +[a-f0-9]+:	41 0f 38 01 00       	phaddw \(%r8\),%mm0
+ +[a-f0-9]+:	88 c4                	mov    %al,%ah
+ +[a-f0-9]+:	d5 00 89 c3          	{rex2 0x0} mov %eax,%ebx
+ +[a-f0-9]+:	d5 01 89 c6          	{rex2 0x1} mov %eax,%r14d
+ +[a-f0-9]+:	d5 01 89 00          	{rex2 0x1} mov %eax,\(%r8\)
+ +[a-f0-9]+:	d5 80 28 d7          	{rex2 0x80} movaps %xmm7,%xmm2
+ +[a-f0-9]+:	d5 84 28 e7          	{rex2 0x84} movaps %xmm7,%xmm12
+ +[a-f0-9]+:	d5 80 28 11          	{rex2 0x80} movaps \(%rcx\),%xmm2
+ +[a-f0-9]+:	d5 81 28 10          	{rex2 0x81} movaps \(%r8\),%xmm2
+ +[a-f0-9]+:	d5 80 d5 f0          	{rex2 0x80} pmullw %mm0,%mm6
  +[a-f0-9]+:	8a 45 00             	mov    0x0\(%rbp\),%al
  +[a-f0-9]+:	8a 45 00             	mov    0x0\(%rbp\),%al
  +[a-f0-9]+:	8a 85 00 00 00 00    	mov    0x0\(%rbp\),%al
diff --git a/gas/testsuite/gas/i386/x86-64-pseudos.s b/gas/testsuite/gas/i386/x86-64-pseudos.s
index 08fac8381c6..a4582cfa6f1 100644
--- a/gas/testsuite/gas/i386/x86-64-pseudos.s
+++ b/gas/testsuite/gas/i386/x86-64-pseudos.s
@@ -360,6 +360,18 @@  _start:
 	{rex} movaps (%r8),%xmm2
 	{rex} phaddw (%rcx),%mm0
 	{rex} phaddw (%r8),%mm0
+	{rex2} mov %al,%ah
+	{rex2} shl %cl, %eax
+	{rex2} cmp %cl, %dl
+	{rex2} mov $1, %bl
+	{rex2} movl %eax,%ebx
+	{rex2} movl %eax,%r14d
+	{rex2} movl %eax,(%r8)
+	{rex2} movaps %xmm7,%xmm2
+	{rex2} movaps %xmm7,%xmm12
+	{rex2} movaps (%rcx),%xmm2
+	{rex2} movaps (%r8),%xmm2
+	{rex2 0x80} pmullw %mm0,%mm6
 
 	movb (%rbp),%al
 	{disp8} movb (%rbp),%al
@@ -422,6 +434,15 @@  _start:
 	{rex} movaps xmm2,XMMWORD PTR [r8]
 	{rex} phaddw mm0,QWORD PTR [rcx]
 	{rex} phaddw mm0,QWORD PTR [r8]
+	{rex2} mov ah,al
+	{rex2} mov ebx,eax
+	{rex2} mov r14d,eax
+	{rex2} mov DWORD PTR [r8],eax
+	{rex2} movaps xmm2,xmm7
+	{rex2} movaps xmm12,xmm7
+	{rex2} movaps xmm2,XMMWORD PTR [rcx]
+	{rex2} movaps xmm2,XMMWORD PTR [r8]
+	{rex2} pmullw mm6,mm0
 
 	mov al, BYTE PTR [rbp]
 	{disp8} mov al, BYTE PTR [rbp]
diff --git a/gas/testsuite/gas/i386/x86-64.exp b/gas/testsuite/gas/i386/x86-64.exp
index a7f5547017f..2be0df0e981 100644
--- a/gas/testsuite/gas/i386/x86-64.exp
+++ b/gas/testsuite/gas/i386/x86-64.exp
@@ -363,6 +363,8 @@  run_dump_test "x86-64-avx512f-rcigrne-intel"
 run_dump_test "x86-64-avx512f-rcigrne"
 run_dump_test "x86-64-avx512f-rcigru-intel"
 run_dump_test "x86-64-avx512f-rcigru"
+run_list_test "x86-64-apx-egpr-inval"
+run_dump_test "x86-64-apx-rex2"
 run_dump_test "x86-64-avx512f-rcigrz-intel"
 run_dump_test "x86-64-avx512f-rcigrz"
 run_dump_test "x86-64-clwb"
diff --git a/include/opcode/i386.h b/include/opcode/i386.h
index dec7652c1cc..a6af3d54da0 100644
--- a/include/opcode/i386.h
+++ b/include/opcode/i386.h
@@ -112,6 +112,8 @@ 
 /* x86-64 extension prefix.  */
 #define REX_OPCODE	0x40
 
+#define REX2_OPCODE	0xd5
+
 /* Non-zero if OPCODE is the rex prefix.  */
 #define REX_PREFIX_P(opcode) (((opcode) & 0xf0) == REX_OPCODE)
 
diff --git a/opcodes/i386-dis.c b/opcodes/i386-dis.c
index e78a2a9350e..a9fd17621da 100644
--- a/opcodes/i386-dis.c
+++ b/opcodes/i386-dis.c
@@ -144,6 +144,12 @@  struct instr_info
   /* Bits of REX we've already used.  */
   uint8_t rex_used;
 
+  /* Record W R4 X4 B4 bits for rex2.  */
+  unsigned char rex2;
+  /* Bits of REX2 we've already used.  */
+  unsigned char rex2_used;
+  unsigned char rex2_payload;
+
   bool need_modrm;
   unsigned char need_vex;
   bool has_sib;
@@ -169,6 +175,7 @@  struct instr_info
   signed char last_data_prefix;
   signed char last_addr_prefix;
   signed char last_rex_prefix;
+  signed char last_rex2_prefix;
   signed char last_seg_prefix;
   signed char fwait_prefix;
   /* The active segment register prefix.  */
@@ -265,8 +272,13 @@  struct dis_private {
   {							\
     if (value)						\
       {							\
-	if ((ins->rex & value))				\
+	if (ins->rex & value)				\
 	  ins->rex_used |= (value) | REX_OPCODE;	\
+	if (ins->rex2 & value)				\
+	  {						\
+	    ins->rex2_used |= value;			\
+	    ins->rex_used |= REX_OPCODE;		\
+	  }						\
       }							\
     else						\
       ins->rex_used |= REX_OPCODE;			\
@@ -276,6 +288,9 @@  struct dis_private {
 #define EVEX_b_used 1
 #define EVEX_len_used 2
 
+/* M0 in rex2 prefix represents map0 or map1.  */
+#define REX2_M 0x8
+
 /* Flags stored in PREFIXES.  */
 #define PREFIX_REPZ 1
 #define PREFIX_REPNZ 2
@@ -289,6 +304,7 @@  struct dis_private {
 #define PREFIX_DATA 0x200
 #define PREFIX_ADDR 0x400
 #define PREFIX_FWAIT 0x800
+#define PREFIX_REX2 0x1000
 
 /* Make sure that bytes from INFO->PRIVATE_DATA->BUFFER (inclusive)
    to ADDR (exclusive) are valid.  Returns true for success, false
@@ -370,6 +386,7 @@  fetch_error (const instr_info *ins)
 #define PREFIX_IGNORED_DATA	(PREFIX_DATA << PREFIX_IGNORED_SHIFT)
 #define PREFIX_IGNORED_ADDR	(PREFIX_ADDR << PREFIX_IGNORED_SHIFT)
 #define PREFIX_IGNORED_LOCK	(PREFIX_LOCK << PREFIX_IGNORED_SHIFT)
+#define PREFIX_REX2_ILLEGAL	(PREFIX_REX2 << PREFIX_IGNORED_SHIFT)
 
 /* Opcode prefixes.  */
 #define PREFIX_OPCODE		(PREFIX_REPZ \
@@ -1888,23 +1905,23 @@  static const struct dis386 dis386[] = {
   { "outs{b|}",		{ indirDXr, Xb }, 0 },
   { X86_64_TABLE (X86_64_6F) },
   /* 70 */
-  { "joH",		{ Jb, BND, cond_jump_flag }, 0 },
-  { "jnoH",		{ Jb, BND, cond_jump_flag }, 0 },
-  { "jbH",		{ Jb, BND, cond_jump_flag }, 0 },
-  { "jaeH",		{ Jb, BND, cond_jump_flag }, 0 },
-  { "jeH",		{ Jb, BND, cond_jump_flag }, 0 },
-  { "jneH",		{ Jb, BND, cond_jump_flag }, 0 },
-  { "jbeH",		{ Jb, BND, cond_jump_flag }, 0 },
-  { "jaH",		{ Jb, BND, cond_jump_flag }, 0 },
+  { "joH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jnoH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jbH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jaeH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jeH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jneH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jbeH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jaH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
   /* 78 */
-  { "jsH",		{ Jb, BND, cond_jump_flag }, 0 },
-  { "jnsH",		{ Jb, BND, cond_jump_flag }, 0 },
-  { "jpH",		{ Jb, BND, cond_jump_flag }, 0 },
-  { "jnpH",		{ Jb, BND, cond_jump_flag }, 0 },
-  { "jlH",		{ Jb, BND, cond_jump_flag }, 0 },
-  { "jgeH",		{ Jb, BND, cond_jump_flag }, 0 },
-  { "jleH",		{ Jb, BND, cond_jump_flag }, 0 },
-  { "jgH",		{ Jb, BND, cond_jump_flag }, 0 },
+  { "jsH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jnsH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jpH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jnpH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jlH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jgeH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jleH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jgH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
   /* 80 */
   { REG_TABLE (REG_80) },
   { REG_TABLE (REG_81) },
@@ -1942,23 +1959,23 @@  static const struct dis386 dis386[] = {
   { "sahf",		{ XX }, 0 },
   { "lahf",		{ XX }, 0 },
   /* a0 */
-  { "mov%LB",		{ AL, Ob }, 0 },
-  { "mov%LS",		{ eAX, Ov }, 0 },
-  { "mov%LB",		{ Ob, AL }, 0 },
-  { "mov%LS",		{ Ov, eAX }, 0 },
-  { "movs{b|}",		{ Ybr, Xb }, 0 },
-  { "movs{R|}",		{ Yvr, Xv }, 0 },
-  { "cmps{b|}",		{ Xb, Yb }, 0 },
-  { "cmps{R|}",		{ Xv, Yv }, 0 },
+  { "mov%LB",		{ AL, Ob }, PREFIX_REX2_ILLEGAL },
+  { "mov%LS",		{ eAX, Ov }, PREFIX_REX2_ILLEGAL },
+  { "mov%LB",		{ Ob, AL }, PREFIX_REX2_ILLEGAL },
+  { "mov%LS",		{ Ov, eAX }, PREFIX_REX2_ILLEGAL },
+  { "movs{b|}",		{ Ybr, Xb }, PREFIX_REX2_ILLEGAL },
+  { "movs{R|}",		{ Yvr, Xv }, PREFIX_REX2_ILLEGAL },
+  { "cmps{b|}",		{ Xb, Yb }, PREFIX_REX2_ILLEGAL },
+  { "cmps{R|}",		{ Xv, Yv }, PREFIX_REX2_ILLEGAL },
   /* a8 */
-  { "testB",		{ AL, Ib }, 0 },
-  { "testS",		{ eAX, Iv }, 0 },
-  { "stosB",		{ Ybr, AL }, 0 },
-  { "stosS",		{ Yvr, eAX }, 0 },
-  { "lodsB",		{ ALr, Xb }, 0 },
-  { "lodsS",		{ eAXr, Xv }, 0 },
-  { "scasB",		{ AL, Yb }, 0 },
-  { "scasS",		{ eAX, Yv }, 0 },
+  { "testB",		{ AL, Ib }, PREFIX_REX2_ILLEGAL },
+  { "testS",		{ eAX, Iv }, PREFIX_REX2_ILLEGAL },
+  { "stosB",		{ Ybr, AL }, PREFIX_REX2_ILLEGAL },
+  { "stosS",		{ Yvr, eAX }, PREFIX_REX2_ILLEGAL },
+  { "lodsB",		{ ALr, Xb }, PREFIX_REX2_ILLEGAL },
+  { "lodsS",		{ eAXr, Xv }, PREFIX_REX2_ILLEGAL },
+  { "scasB",		{ AL, Yb }, PREFIX_REX2_ILLEGAL },
+  { "scasS",		{ eAX, Yv }, PREFIX_REX2_ILLEGAL },
   /* b0 */
   { "movB",		{ RMAL, Ib }, 0 },
   { "movB",		{ RMCL, Ib }, 0 },
@@ -2014,23 +2031,23 @@  static const struct dis386 dis386[] = {
   { FLOAT },
   { FLOAT },
   /* e0 */
-  { "loopneFH",		{ Jb, XX, loop_jcxz_flag }, 0 },
-  { "loopeFH",		{ Jb, XX, loop_jcxz_flag }, 0 },
-  { "loopFH",		{ Jb, XX, loop_jcxz_flag }, 0 },
-  { "jEcxzH",		{ Jb, XX, loop_jcxz_flag }, 0 },
-  { "inB",		{ AL, Ib }, 0 },
-  { "inG",		{ zAX, Ib }, 0 },
-  { "outB",		{ Ib, AL }, 0 },
-  { "outG",		{ Ib, zAX }, 0 },
+  { "loopneFH",		{ Jb, XX, loop_jcxz_flag }, PREFIX_REX2_ILLEGAL },
+  { "loopeFH",		{ Jb, XX, loop_jcxz_flag }, PREFIX_REX2_ILLEGAL },
+  { "loopFH",		{ Jb, XX, loop_jcxz_flag }, PREFIX_REX2_ILLEGAL },
+  { "jEcxzH",		{ Jb, XX, loop_jcxz_flag }, PREFIX_REX2_ILLEGAL },
+  { "inB",		{ AL, Ib }, PREFIX_REX2_ILLEGAL },
+  { "inG",		{ zAX, Ib }, PREFIX_REX2_ILLEGAL },
+  { "outB",		{ Ib, AL }, PREFIX_REX2_ILLEGAL },
+  { "outG",		{ Ib, zAX }, PREFIX_REX2_ILLEGAL },
   /* e8 */
   { X86_64_TABLE (X86_64_E8) },
   { X86_64_TABLE (X86_64_E9) },
   { X86_64_TABLE (X86_64_EA) },
-  { "jmp",		{ Jb, BND }, 0 },
-  { "inB",		{ AL, indirDX }, 0 },
-  { "inG",		{ zAX, indirDX }, 0 },
-  { "outB",		{ indirDX, AL }, 0 },
-  { "outG",		{ indirDX, zAX }, 0 },
+  { "jmp",		{ Jb, BND }, PREFIX_REX2_ILLEGAL },
+  { "inB",		{ AL, indirDX }, PREFIX_REX2_ILLEGAL },
+  { "inG",		{ zAX, indirDX }, PREFIX_REX2_ILLEGAL },
+  { "outB",		{ indirDX, AL }, PREFIX_REX2_ILLEGAL },
+  { "outG",		{ indirDX, zAX }, PREFIX_REX2_ILLEGAL },
   /* f0 */
   { Bad_Opcode },	/* lock prefix */
   { "int1",		{ XX }, 0 },
@@ -2107,12 +2124,12 @@  static const struct dis386 dis386_twobyte[] = {
   { PREFIX_TABLE (PREFIX_0F2E) },
   { PREFIX_TABLE (PREFIX_0F2F) },
   /* 30 */
-  { "wrmsr",		{ XX }, 0 },
-  { "rdtsc",		{ XX }, 0 },
-  { "rdmsr",		{ XX }, 0 },
-  { "rdpmc",		{ XX }, 0 },
-  { "sysenter",		{ SEP }, 0 },
-  { "sysexit%LQ",	{ SEP }, 0 },
+  { "wrmsr",		{ XX }, PREFIX_REX2_ILLEGAL },
+  { "rdtsc",		{ XX }, PREFIX_REX2_ILLEGAL },
+  { "rdmsr",		{ XX }, PREFIX_REX2_ILLEGAL },
+  { "rdpmc",		{ XX }, PREFIX_REX2_ILLEGAL },
+  { "sysenter",		{ SEP }, PREFIX_REX2_ILLEGAL },
+  { "sysexit%LQ",	{ SEP }, PREFIX_REX2_ILLEGAL },
   { Bad_Opcode },
   { "getsec",		{ XX }, 0 },
   /* 38 */
@@ -2197,23 +2214,23 @@  static const struct dis386 dis386_twobyte[] = {
   { PREFIX_TABLE (PREFIX_0F7E) },
   { PREFIX_TABLE (PREFIX_0F7F) },
   /* 80 */
-  { "joH",		{ Jv, BND, cond_jump_flag }, 0 },
-  { "jnoH",		{ Jv, BND, cond_jump_flag }, 0 },
-  { "jbH",		{ Jv, BND, cond_jump_flag }, 0 },
-  { "jaeH",		{ Jv, BND, cond_jump_flag }, 0 },
-  { "jeH",		{ Jv, BND, cond_jump_flag }, 0 },
-  { "jneH",		{ Jv, BND, cond_jump_flag }, 0 },
-  { "jbeH",		{ Jv, BND, cond_jump_flag }, 0 },
-  { "jaH",		{ Jv, BND, cond_jump_flag }, 0 },
+  { "joH",		{ Jv, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jnoH",		{ Jv, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jbH",		{ Jv, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jaeH",		{ Jv, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jeH",		{ Jv, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jneH",		{ Jv, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jbeH",		{ Jv, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jaH",		{ Jv, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
   /* 88 */
-  { "jsH",		{ Jv, BND, cond_jump_flag }, 0 },
-  { "jnsH",		{ Jv, BND, cond_jump_flag }, 0 },
-  { "jpH",		{ Jv, BND, cond_jump_flag }, 0 },
-  { "jnpH",		{ Jv, BND, cond_jump_flag }, 0 },
-  { "jlH",		{ Jv, BND, cond_jump_flag }, 0 },
-  { "jgeH",		{ Jv, BND, cond_jump_flag }, 0 },
-  { "jleH",		{ Jv, BND, cond_jump_flag }, 0 },
-  { "jgH",		{ Jv, BND, cond_jump_flag }, 0 },
+  { "jsH",		{ Jv, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jnsH",		{ Jv, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jpH",		{ Jv, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jnpH",		{ Jv, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jlH",		{ Jv, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jgeH",		{ Jv, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jleH",		{ Jv, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jgH",		{ Jv, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
   /* 90 */
   { "seto",		{ Eb }, 0 },
   { "setno",		{ Eb }, 0 },
@@ -2406,22 +2423,30 @@  static const char intel_index16[][6] = {
 
 static const char att_names64[][8] = {
   "%rax", "%rcx", "%rdx", "%rbx", "%rsp", "%rbp", "%rsi", "%rdi",
-  "%r8", "%r9", "%r10", "%r11", "%r12", "%r13", "%r14", "%r15"
+  "%r8", "%r9", "%r10", "%r11", "%r12", "%r13", "%r14", "%r15",
+  "%r16", "%r17", "%r18", "%r19", "%r20", "%r21", "%r22", "%r23",
+  "%r24", "%r25", "%r26", "%r27", "%r28", "%r29", "%r30", "%r31",
 };
 static const char att_names32[][8] = {
   "%eax", "%ecx", "%edx", "%ebx", "%esp", "%ebp", "%esi", "%edi",
-  "%r8d", "%r9d", "%r10d", "%r11d", "%r12d", "%r13d", "%r14d", "%r15d"
+  "%r8d", "%r9d", "%r10d", "%r11d", "%r12d", "%r13d", "%r14d", "%r15d",
+  "%r16d", "%r17d", "%r18d", "%r19d", "%r20d", "%r21d", "%r22d", "%r23d",
+  "%r24d", "%r25d", "%r26d", "%r27d", "%r28d", "%r29d", "%r30d", "%r31d",
 };
 static const char att_names16[][8] = {
   "%ax", "%cx", "%dx", "%bx", "%sp", "%bp", "%si", "%di",
-  "%r8w", "%r9w", "%r10w", "%r11w", "%r12w", "%r13w", "%r14w", "%r15w"
+  "%r8w", "%r9w", "%r10w", "%r11w", "%r12w", "%r13w", "%r14w", "%r15w",
+  "%r16w", "%r17w", "%r18w", "%r19w", "%r20w", "%r21w", "%r22w", "%r23w",
+  "%r24w", "%r25w", "%r26w", "%r27w", "%r28w", "%r29w", "%r30w", "%r31w",
 };
 static const char att_names8[][8] = {
   "%al", "%cl", "%dl", "%bl", "%ah", "%ch", "%dh", "%bh",
 };
 static const char att_names8rex[][8] = {
   "%al", "%cl", "%dl", "%bl", "%spl", "%bpl", "%sil", "%dil",
-  "%r8b", "%r9b", "%r10b", "%r11b", "%r12b", "%r13b", "%r14b", "%r15b"
+  "%r8b", "%r9b", "%r10b", "%r11b", "%r12b", "%r13b", "%r14b", "%r15b",
+  "%r16b", "%r17b", "%r18b", "%r19b", "%r20b", "%r21b", "%r22b", "%r23b",
+  "%r24b", "%r25b", "%r26b", "%r27b", "%r28b", "%r29b", "%r30b", "%r31b",
 };
 static const char att_names_seg[][4] = {
   "%es", "%cs", "%ss", "%ds", "%fs", "%gs", "%?", "%?",
@@ -2810,9 +2835,9 @@  static const struct dis386 reg_table[][8] = {
     { Bad_Opcode },
     { "cmpxchg8b", { { CMPXCHG8B_Fixup, q_mode } }, 0 },
     { Bad_Opcode },
-    { "xrstors", { FXSAVE }, 0 },
-    { "xsavec", { FXSAVE }, 0 },
-    { "xsaves", { FXSAVE }, 0 },
+    { "xrstors", { FXSAVE }, PREFIX_REX2_ILLEGAL },
+    { "xsavec", { FXSAVE }, PREFIX_REX2_ILLEGAL },
+    { "xsaves", { FXSAVE }, PREFIX_REX2_ILLEGAL },
     { MOD_TABLE (MOD_0FC7_REG_6) },
     { MOD_TABLE (MOD_0FC7_REG_7) },
   },
@@ -3384,7 +3409,7 @@  static const struct dis386 prefix_table[][4] = {
 
   /* PREFIX_0FAE_REG_4_MOD_0 */
   {
-    { "xsave",	{ FXSAVE }, 0 },
+    { "xsave",	{ FXSAVE }, PREFIX_REX2_ILLEGAL },
     { "ptwrite{%LQ|}", { Edq }, 0 },
   },
 
@@ -3402,7 +3427,7 @@  static const struct dis386 prefix_table[][4] = {
 
   /* PREFIX_0FAE_REG_6_MOD_0 */
   {
-    { "xsaveopt",	{ FXSAVE }, PREFIX_OPCODE },
+    { "xsaveopt",	{ FXSAVE }, PREFIX_OPCODE | PREFIX_REX2_ILLEGAL },
     { "clrssbsy",	{ Mq }, PREFIX_OPCODE },
     { "clwb",	{ Mb }, PREFIX_OPCODE },
   },
@@ -4196,19 +4221,19 @@  static const struct dis386 x86_64_table[][2] = {
 
   /* X86_64_E8 */
   {
-    { "callP",		{ Jv, BND }, 0 },
-    { "call@",		{ Jv, BND }, 0 }
+    { "callP",		{ Jv, BND }, PREFIX_REX2_ILLEGAL },
+    { "call@",		{ Jv, BND }, PREFIX_REX2_ILLEGAL }
   },
 
   /* X86_64_E9 */
   {
-    { "jmpP",		{ Jv, BND }, 0 },
-    { "jmp@",		{ Jv, BND }, 0 }
+    { "jmpP",		{ Jv, BND }, PREFIX_REX2_ILLEGAL },
+    { "jmp@",		{ Jv, BND }, PREFIX_REX2_ILLEGAL }
   },
 
   /* X86_64_EA */
   {
-    { "{l|}jmp{P|}", { Ap }, 0 },
+    { "{l|}jmp{P|}", { Ap }, PREFIX_REX2_ILLEGAL },
   },
 
   /* X86_64_0F00_REG_6 */
@@ -8184,7 +8209,7 @@  static const struct dis386 mod_table[][2] = {
   },
   {
     /* MOD_0FAE_REG_5 */
-    { "xrstor",		{ FXSAVE }, PREFIX_OPCODE },
+    { "xrstor",		{ FXSAVE }, PREFIX_OPCODE | PREFIX_REX2_ILLEGAL },
     { PREFIX_TABLE (PREFIX_0FAE_REG_5_MOD_3) },
   },
   {
@@ -8387,6 +8412,24 @@  ckprefix (instr_info *ins)
 	    return ckp_okay;
 	  ins->last_rex_prefix = i;
 	  break;
+	/* REX2 must be the last prefix. */
+	case REX2_OPCODE:
+	  if (ins->address_mode == mode_64bit)
+	    {
+	      if (ins->last_rex_prefix >= 0)
+		return ckp_bogus;
+
+	      ins->codep++;
+	      if (!fetch_code (ins->info, ins->codep + 1))
+		return ckp_fetch_error;
+	      ins->rex2_payload = *ins->codep;
+	      ins->rex2 = ins->rex2_payload >> 4;
+	      ins->rex = (ins->rex2_payload & 0xf) | REX_OPCODE;
+	      ins->codep++;
+	      ins->last_rex2_prefix = i;
+	      ins->all_prefixes[i] = REX2_OPCODE;
+	    }
+	  return ckp_okay;
 	case 0xf3:
 	  ins->prefixes |= PREFIX_REPZ;
 	  ins->last_repz_prefix = i;
@@ -8554,6 +8597,8 @@  prefix_name (enum address_mode mode, uint8_t pref, int sizeflag)
       return "bnd";
     case NOTRACK_PREFIX:
       return "notrack";
+    case REX2_OPCODE:
+      return "rex2";
     default:
       return NULL;
     }
@@ -9202,6 +9247,7 @@  print_insn (bfd_vma pc, disassemble_info *info, int intel_syntax)
     .last_data_prefix = -1,
     .last_addr_prefix = -1,
     .last_rex_prefix = -1,
+    .last_rex2_prefix = -1,
     .last_seg_prefix = -1,
     .fwait_prefix = -1,
   };
@@ -9367,13 +9413,18 @@  print_insn (bfd_vma pc, disassemble_info *info, int intel_syntax)
       goto out;
     }
 
-  if (*ins.codep == 0x0f)
+  /* REX2.M in rex2 prefix represents map0 or map1.  */
+  if (ins.last_rex2_prefix < 0 ? *ins.codep == 0x0f : (ins.rex2 & REX2_M))
     {
       unsigned char threebyte;
 
-      ins.codep++;
-      if (!fetch_code (info, ins.codep + 1))
-	goto fetch_error_out;
+      if (!ins.rex2)
+	{
+	  ins.codep++;
+	  if (!fetch_code (info, ins.codep + 1))
+	    goto fetch_error_out;
+	}
+
       threebyte = *ins.codep;
       dp = &dis386_twobyte[threebyte];
       ins.need_modrm = twobyte_has_modrm[threebyte];
@@ -9529,7 +9580,15 @@  print_insn (bfd_vma pc, disassemble_info *info, int intel_syntax)
       goto out;
     }
 
-  switch (dp->prefix_requirement)
+  if ((dp->prefix_requirement & PREFIX_REX2_ILLEGAL)
+      && ins.last_rex2_prefix >= 0)
+    {
+      i386_dis_printf (info, dis_style_text, "(bad)");
+      ret = ins.end_codep - priv.the_buffer;
+      goto out;
+    }
+
+  switch (dp->prefix_requirement & ~PREFIX_REX2_ILLEGAL)
     {
     case PREFIX_DATA:
       /* If only the data prefix is marked as mandatory, its absence renders
@@ -9588,6 +9647,13 @@  print_insn (bfd_vma pc, disassemble_info *info, int intel_syntax)
       && !ins.need_vex && ins.last_rex_prefix >= 0)
     ins.all_prefixes[ins.last_rex_prefix] = 0;
 
+  /* Check if the REX2 prefix is used.  */
+  if (ins.last_rex2_prefix >= 0
+      && ((ins.rex2 & 0x7) ^ (ins.rex2_used & 0x7)) == 0
+      && (ins.rex ^ ins.rex_used) == 0
+      && (ins.rex2 & 0x7))
+    ins.all_prefixes[ins.last_rex2_prefix] = 0;
+
   /* Check if the SEG prefix is used.  */
   if ((ins.prefixes & (PREFIX_CS | PREFIX_SS | PREFIX_DS | PREFIX_ES
 		       | PREFIX_FS | PREFIX_GS)) != 0
@@ -9616,7 +9682,11 @@  print_insn (bfd_vma pc, disassemble_info *info, int intel_syntax)
 	if (name == NULL)
 	  abort ();
 	prefix_length += strlen (name) + 1;
-	i386_dis_printf (info, dis_style_mnemonic, "%s ", name);
+	if (ins.all_prefixes[i] == REX2_OPCODE)
+	  i386_dis_printf (info, dis_style_mnemonic, "{%s 0x%x} ", name,
+			   (unsigned int) ins.rex2_payload);
+	else
+	  i386_dis_printf (info, dis_style_mnemonic, "%s ", name);
       }
 
   /* Check maximum code length.  */
@@ -11163,6 +11233,8 @@  print_register (instr_info *ins, unsigned int reg, unsigned int rexmask,
   USED_REX (rexmask);
   if (ins->rex & rexmask)
     reg += 8;
+  if (ins->rex2 & rexmask)
+    reg += 16;
 
   switch (bytemode)
     {
@@ -11170,7 +11242,7 @@  print_register (instr_info *ins, unsigned int reg, unsigned int rexmask,
     case b_swap_mode:
       if (reg & 4)
 	USED_REX (0);
-      if (ins->rex)
+      if (ins->rex || ins->rex2)
 	names = att_names8rex;
       else
 	names = att_names8;
@@ -11386,6 +11458,8 @@  OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
   int riprel = 0;
   int shift;
 
+  add += (ins->rex2 & REX_B) ? 16 : 0;
+
   if (ins->vex.evex)
     {
 
@@ -11559,6 +11633,9 @@  OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
 		}
 	      break;
 	    default:
+	      if (ins->rex2 & REX_X)
+		vindex += 16;
+
 	      if (vindex != 4)
 		indexes = ins->address_mode == mode_64bit && !addr32flag
 			  ? att_names64 : att_names32;
@@ -11946,7 +12023,7 @@  static bool
 OP_REG (instr_info *ins, int code, int sizeflag)
 {
   const char *s;
-  int add;
+  int add = 0;
 
   switch (code)
     {
@@ -11959,8 +12036,8 @@  OP_REG (instr_info *ins, int code, int sizeflag)
   USED_REX (REX_B);
   if (ins->rex & REX_B)
     add = 8;
-  else
-    add = 0;
+  if (ins->rex2 & REX_B)
+    add += 16;
 
   switch (code)
     {
@@ -12674,6 +12751,8 @@  OP_EX (instr_info *ins, int bytemode, int sizeflag)
   USED_REX (REX_B);
   if (ins->rex & REX_B)
     reg += 8;
+  if (ins->rex2 & REX_B)
+    reg += 16;
   if (ins->vex.evex)
     {
       USED_REX (REX_X);
diff --git a/opcodes/i386-gen.c b/opcodes/i386-gen.c
index c0223d16964..fb5a4f81288 100644
--- a/opcodes/i386-gen.c
+++ b/opcodes/i386-gen.c
@@ -275,6 +275,8 @@  static const dependency isa_dependencies[] =
     "64" },
   { "USER_MSR",
     "64" },
+  { "APX_F",
+    "XSAVE|64" },
 };
 
 /* This array is populated as process_i386_initializers() walks cpu_flags[].  */
@@ -397,6 +399,7 @@  static bitfield cpu_flags[] =
   BITFIELD (FRED),
   BITFIELD (LKGS),
   BITFIELD (USER_MSR),
+  BITFIELD (APX_F),
   BITFIELD (MWAITX),
   BITFIELD (CLZERO),
   BITFIELD (OSPKE),
@@ -484,6 +487,7 @@  static bitfield opcode_modifiers[] =
   BITFIELD (Optimize),
   BITFIELD (Dialect),
   BITFIELD (ISA64),
+  BITFIELD (NoEgpr),
 };
 
 #define CLASS(n) #n, n
@@ -1070,10 +1074,44 @@  get_element_size (char **opnd, int lineno)
   return elem_size;
 }
 
+static bool
+rex2_disallowed (const unsigned long long opcode, unsigned int length,
+		 unsigned int space, const char *cpu_flags)
+{
+  /* Some opcodes encode a ModR/M-like byte directly in the opcode.  */
+  unsigned int base_opcode = opcode >> (8 * length - 8);
+
+  /* All opcodes listed map0 0x4*, 0x7*, 0xa*, 0xe* and map1 0x3*, 0x8*
+     are reserved under REX2 and triggers #UD when prefixed with REX2 */
+  if (space == 0)
+    switch (base_opcode >> 4)
+      {
+      case 0x4:
+      case 0x7:
+      case 0xA:
+      case 0xE:
+	return true;
+      default:
+	return false;
+    }
+
+  if (space == SPACE_0F)
+    switch (base_opcode >> 4)
+      {
+      case 0x3:
+      case 0x8:
+	return true;
+      default:
+	return false;
+      }
+
+  return false;
+}
+
 static void
 process_i386_opcode_modifier (FILE *table, char *mod, unsigned int space,
 			      unsigned int prefix, const char *extension_opcode,
-			      char **opnd, int lineno)
+			      char **opnd, int lineno, bool rex2_disallowed)
 {
   char *str, *next, *last;
   bitfield modifiers [ARRAY_SIZE (opcode_modifiers)];
@@ -1200,6 +1238,12 @@  process_i386_opcode_modifier (FILE *table, char *mod, unsigned int space,
 	  || modifiers[SAE].value))
     modifiers[EVex].value = EVEXDYN;
 
+  /* Vex, legacy map2 and map3 and rex2_disallowed do not support EGPR.
+     For templates supporting both Vex and EVex allowing EGPR.  */
+  if ((modifiers[Vex].value || space > SPACE_0F || rex2_disallowed)
+      && !modifiers[EVex].value)
+    modifiers[NoEgpr].value = 1;
+
   output_opcode_modifier (table, modifiers, ARRAY_SIZE (modifiers));
 }
 
@@ -1424,7 +1468,9 @@  output_i386_opcode (FILE *table, const char *name, char *str,
   free (ident);
 
   process_i386_opcode_modifier (table, opcode_modifier, space, prefix,
-				extension_opcode, operand_types, lineno);
+				extension_opcode, operand_types, lineno,
+				rex2_disallowed (opcode, length, space,
+						 cpu_flags));
 
   process_i386_cpu_flag (table, cpu_flags, NULL, ",", "    ", lineno, CpuMax);
 
diff --git a/opcodes/i386-opc.h b/opcodes/i386-opc.h
index b76f9ecc2a6..8b4a48c8c85 100644
--- a/opcodes/i386-opc.h
+++ b/opcodes/i386-opc.h
@@ -319,6 +319,8 @@  enum i386_cpu
   CpuAVX512F,
   /* Intel AVX-512 VL Instructions support required.  */
   CpuAVX512VL,
+  /* Intel APX_F Instructions support required.  */
+  CpuAPX_F,
   /* Not supported in the 64bit mode  */
   CpuNo64,
 
@@ -354,6 +356,7 @@  enum i386_cpu
 		   cpuhle:1, \
 		   cpuavx512f:1, \
 		   cpuavx512vl:1, \
+		   cpuapx_f:1, \
       /* NOTE: This field needs to remain last. */ \
 		   cpuno64:1
 
@@ -745,6 +748,11 @@  enum
 #define INTEL64		2
 #define INTEL64ONLY	3
   ISA64,
+
+  /* egprs (r16-r31) on instruction illegal. We also use it to judge
+     whether the instruction supports pseudo-prefix {rex2}.  */
+  NoEgpr,
+
   /* The last bitfield in i386_opcode_modifier.  */
   Opcode_Modifier_Num
 };
@@ -790,6 +798,7 @@  typedef struct i386_opcode_modifier
   unsigned int optimize:1;
   unsigned int dialect:2;
   unsigned int isa64:2;
+  unsigned int noegpr:1;
 } i386_opcode_modifier;
 
 /* Operand classes.  */
@@ -1004,7 +1013,8 @@  typedef struct insn_template
 #define Prefix_VEX3		6	/* {vex3} */
 #define Prefix_EVEX		7	/* {evex} */
 #define Prefix_REX		8	/* {rex} */
-#define Prefix_NoOptimize	9	/* {nooptimize} */
+#define Prefix_REX2		9	/* {rex2} */
+#define Prefix_NoOptimize	10	/* {nooptimize} */
 
   /* the bits in opcode_modifier are used to generate the final opcode from
      the base_opcode.  These bits also are used to detect alternate forms of
@@ -1031,6 +1041,7 @@  typedef struct
 #define RegRex	    0x1  /* Extended register.  */
 #define RegRex64    0x2  /* Extended 8 bit register.  */
 #define RegVRex	    0x4  /* Extended vector register.  */
+#define RegRex2	    0x8  /* Extended GPRs R16–R31 register.  */
   unsigned char reg_num;
 #define RegIP	((unsigned char ) ~0)
 /* EIZ and RIZ are fake index registers.  */
diff --git a/opcodes/i386-opc.tbl b/opcodes/i386-opc.tbl
index fffe05d4399..8c8f6695d22 100644
--- a/opcodes/i386-opc.tbl
+++ b/opcodes/i386-opc.tbl
@@ -895,7 +895,7 @@  rex.wrxb, 0x4f, x64, NoSuf|IsPrefix, {}
 <pseudopfx:ident:cpu, disp8:Disp8:0, disp16:Disp16:No64, disp32:Disp32:i386, +
                       load:Load:0, store:Store:0, +
                       vex:VEX:0, vex2:VEX:0, vex3:VEX3:0, evex:EVEX:0, +
-                      rex:REX:x64, nooptimize:NoOptimize:0>
+                      rex:REX:x64, rex2:REX2:APX_F, nooptimize:NoOptimize:0>
 
 {<pseudopfx>}, PSEUDO_PREFIX/Prefix_<pseudopfx:ident>, <pseudopfx:cpu>, NoSuf|IsPrefix, {}
 
@@ -1428,16 +1428,17 @@  crc32, 0xf20f38f0, SSE4_2&x64, W|Modrm|No_wSuf|No_lSuf|No_sSuf, { Reg8|Reg64|Uns
 
 // xsave/xrstor New Instructions.
 
-xsave, 0xfae/4, Xsave, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf, { Unspecified|BaseIndex }
-xsave64, 0xfae/4, Xsave&x64, Modrm|NoSuf|Size64, { Unspecified|BaseIndex }
-xrstor, 0xfae/5, Xsave, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf, { Unspecified|BaseIndex }
-xrstor64, 0xfae/5, Xsave&x64, Modrm|NoSuf|Size64, { Unspecified|BaseIndex }
+xsave, 0xfae/4, Xsave, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|NoEgpr, { Unspecified|BaseIndex }
+xsave64, 0xfae/4, Xsave&x64, Modrm|NoSuf|Size64|NoEgpr, { Unspecified|BaseIndex }
+xrstor, 0xfae/5, Xsave, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|NoEgpr, { Unspecified|BaseIndex }
+xrstor64, 0xfae/5, Xsave&x64, Modrm|NoSuf|Size64|NoEgpr, { Unspecified|BaseIndex }
 xgetbv, 0xf01d0, Xsave, NoSuf, {}
 xsetbv, 0xf01d1, Xsave, NoSuf, {}
 
 // xsaveopt
-xsaveopt, 0xfae/6, Xsaveopt, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf, { Unspecified|BaseIndex }
-xsaveopt64, 0xfae/6, Xsaveopt&x64, Modrm|NoSuf|Size64, { Unspecified|BaseIndex }
+
+xsaveopt, 0xfae/6, Xsaveopt, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|NoEgpr, { Unspecified|BaseIndex }
+xsaveopt64, 0xfae/6, Xsaveopt&x64, Modrm|NoSuf|Size64|NoEgpr, { Unspecified|BaseIndex }
 
 // AES instructions.
 
@@ -2480,17 +2481,17 @@  clflushopt, 0x660fae/7, ClflushOpt, Modrm|Anysize|IgnoreSize|NoSuf, { BaseIndex
 
 // XSAVES/XRSTORS instructions.
 
-xrstors, 0xfc7/3, XSAVES, Modrm|NoSuf, { Unspecified|BaseIndex }
-xrstors64, 0xfc7/3, XSAVES&x64, Modrm|NoSuf|Size64, { Unspecified|BaseIndex }
-xsaves, 0xfc7/5, XSAVES, Modrm|NoSuf, { Unspecified|BaseIndex }
-xsaves64, 0xfc7/5, XSAVES&x64, Modrm|NoSuf|Size64, { Unspecified|BaseIndex }
+xrstors, 0xfc7/3, XSAVES, Modrm|NoSuf|NoEgpr, { Unspecified|BaseIndex }
+xrstors64, 0xfc7/3, XSAVES&x64, Modrm|NoSuf|Size64|NoEgpr, { Unspecified|BaseIndex }
+xsaves, 0xfc7/5, XSAVES, Modrm|NoSuf|NoEgpr, { Unspecified|BaseIndex }
+xsaves64, 0xfc7/5, XSAVES&x64, Modrm|NoSuf|Size64|NoEgpr, { Unspecified|BaseIndex }
 
 // XSAVES instructions end.
 
 // XSAVEC instructions.
 
-xsavec, 0xfc7/4, XSAVEC, Modrm|NoSuf, { Unspecified|BaseIndex }
-xsavec64, 0xfc7/4, XSAVEC&x64, Modrm|NoSuf|Size64, { Unspecified|BaseIndex }
+xsavec, 0xfc7/4, XSAVEC, Modrm|NoSuf|NoEgpr, { Unspecified|BaseIndex }
+xsavec64, 0xfc7/4, XSAVEC&x64, Modrm|NoSuf|Size64|NoEgpr, { Unspecified|BaseIndex }
 
 // XSAVEC instructions end.
 
diff --git a/opcodes/i386-reg.tbl b/opcodes/i386-reg.tbl
index 2ac56e3fd0b..8fead35e320 100644
--- a/opcodes/i386-reg.tbl
+++ b/opcodes/i386-reg.tbl
@@ -43,6 +43,22 @@  r12b, Class=Reg|Byte, RegRex|RegRex64, 4, Dw2Inval, Dw2Inval
 r13b, Class=Reg|Byte, RegRex|RegRex64, 5, Dw2Inval, Dw2Inval
 r14b, Class=Reg|Byte, RegRex|RegRex64, 6, Dw2Inval, Dw2Inval
 r15b, Class=Reg|Byte, RegRex|RegRex64, 7, Dw2Inval, Dw2Inval
+r16b, Class=Reg|Byte, RegRex2|RegRex64, 0, Dw2Inval, Dw2Inval
+r17b, Class=Reg|Byte, RegRex2|RegRex64, 1, Dw2Inval, Dw2Inval
+r18b, Class=Reg|Byte, RegRex2|RegRex64, 2, Dw2Inval, Dw2Inval
+r19b, Class=Reg|Byte, RegRex2|RegRex64, 3, Dw2Inval, Dw2Inval
+r20b, Class=Reg|Byte, RegRex2|RegRex64, 4, Dw2Inval, Dw2Inval
+r21b, Class=Reg|Byte, RegRex2|RegRex64, 5, Dw2Inval, Dw2Inval
+r22b, Class=Reg|Byte, RegRex2|RegRex64, 6, Dw2Inval, Dw2Inval
+r23b, Class=Reg|Byte, RegRex2|RegRex64, 7, Dw2Inval, Dw2Inval
+r24b, Class=Reg|Byte, RegRex2|RegRex64|RegRex, 0, Dw2Inval, Dw2Inval
+r25b, Class=Reg|Byte, RegRex2|RegRex64|RegRex, 1, Dw2Inval, Dw2Inval
+r26b, Class=Reg|Byte, RegRex2|RegRex64|RegRex, 2, Dw2Inval, Dw2Inval
+r27b, Class=Reg|Byte, RegRex2|RegRex64|RegRex, 3, Dw2Inval, Dw2Inval
+r28b, Class=Reg|Byte, RegRex2|RegRex64|RegRex, 4, Dw2Inval, Dw2Inval
+r29b, Class=Reg|Byte, RegRex2|RegRex64|RegRex, 5, Dw2Inval, Dw2Inval
+r30b, Class=Reg|Byte, RegRex2|RegRex64|RegRex, 6, Dw2Inval, Dw2Inval
+r31b, Class=Reg|Byte, RegRex2|RegRex64|RegRex, 7, Dw2Inval, Dw2Inval
 // 16 bit regs
 ax, Class=Reg|Instance=Accum|Word, 0, 0, Dw2Inval, Dw2Inval
 cx, Class=Reg|Word, 0, 1, Dw2Inval, Dw2Inval
@@ -60,6 +76,22 @@  r12w, Class=Reg|Word, RegRex, 4, Dw2Inval, Dw2Inval
 r13w, Class=Reg|Word, RegRex, 5, Dw2Inval, Dw2Inval
 r14w, Class=Reg|Word, RegRex, 6, Dw2Inval, Dw2Inval
 r15w, Class=Reg|Word, RegRex, 7, Dw2Inval, Dw2Inval
+r16w, Class=Reg|Word, RegRex2, 0, Dw2Inval, Dw2Inval
+r17w, Class=Reg|Word, RegRex2, 1, Dw2Inval, Dw2Inval
+r18w, Class=Reg|Word, RegRex2, 2, Dw2Inval, Dw2Inval
+r19w, Class=Reg|Word, RegRex2, 3, Dw2Inval, Dw2Inval
+r20w, Class=Reg|Word, RegRex2, 4, Dw2Inval, Dw2Inval
+r21w, Class=Reg|Word, RegRex2, 5, Dw2Inval, Dw2Inval
+r22w, Class=Reg|Word, RegRex2, 6, Dw2Inval, Dw2Inval
+r23w, Class=Reg|Word, RegRex2, 7, Dw2Inval, Dw2Inval
+r24w, Class=Reg|Word, RegRex2|RegRex, 0, Dw2Inval, Dw2Inval
+r25w, Class=Reg|Word, RegRex2|RegRex, 1, Dw2Inval, Dw2Inval
+r26w, Class=Reg|Word, RegRex2|RegRex, 2, Dw2Inval, Dw2Inval
+r27w, Class=Reg|Word, RegRex2|RegRex, 3, Dw2Inval, Dw2Inval
+r28w, Class=Reg|Word, RegRex2|RegRex, 4, Dw2Inval, Dw2Inval
+r29w, Class=Reg|Word, RegRex2|RegRex, 5, Dw2Inval, Dw2Inval
+r30w, Class=Reg|Word, RegRex2|RegRex, 6, Dw2Inval, Dw2Inval
+r31w, Class=Reg|Word, RegRex2|RegRex, 7, Dw2Inval, Dw2Inval
 // 32 bit regs
 eax, Class=Reg|Instance=Accum|Dword|BaseIndex, 0, 0, 0, Dw2Inval
 ecx, Class=Reg|Instance=RegC|Dword|BaseIndex, 0, 1, 1, Dw2Inval
@@ -77,6 +109,22 @@  r12d, Class=Reg|Dword|BaseIndex, RegRex, 4, Dw2Inval, Dw2Inval
 r13d, Class=Reg|Dword|BaseIndex, RegRex, 5, Dw2Inval, Dw2Inval
 r14d, Class=Reg|Dword|BaseIndex, RegRex, 6, Dw2Inval, Dw2Inval
 r15d, Class=Reg|Dword|BaseIndex, RegRex, 7, Dw2Inval, Dw2Inval
+r16d, Class=Reg|Dword|BaseIndex, RegRex2, 0, Dw2Inval, Dw2Inval
+r17d, Class=Reg|Dword|BaseIndex, RegRex2, 1, Dw2Inval, Dw2Inval
+r18d, Class=Reg|Dword|BaseIndex, RegRex2, 2, Dw2Inval, Dw2Inval
+r19d, Class=Reg|Dword|BaseIndex, RegRex2, 3, Dw2Inval, Dw2Inval
+r20d, Class=Reg|Dword|BaseIndex, RegRex2, 4, Dw2Inval, Dw2Inval
+r21d, Class=Reg|Dword|BaseIndex, RegRex2, 5, Dw2Inval, Dw2Inval
+r22d, Class=Reg|Dword|BaseIndex, RegRex2, 6, Dw2Inval, Dw2Inval
+r23d, Class=Reg|Dword|BaseIndex, RegRex2, 7, Dw2Inval, Dw2Inval
+r24d, Class=Reg|Dword|BaseIndex, RegRex2|RegRex, 0, Dw2Inval, Dw2Inval
+r25d, Class=Reg|Dword|BaseIndex, RegRex2|RegRex, 1, Dw2Inval, Dw2Inval
+r26d, Class=Reg|Dword|BaseIndex, RegRex2|RegRex, 2, Dw2Inval, Dw2Inval
+r27d, Class=Reg|Dword|BaseIndex, RegRex2|RegRex, 3, Dw2Inval, Dw2Inval
+r28d, Class=Reg|Dword|BaseIndex, RegRex2|RegRex, 4, Dw2Inval, Dw2Inval
+r29d, Class=Reg|Dword|BaseIndex, RegRex2|RegRex, 5, Dw2Inval, Dw2Inval
+r30d, Class=Reg|Dword|BaseIndex, RegRex2|RegRex, 6, Dw2Inval, Dw2Inval
+r31d, Class=Reg|Dword|BaseIndex, RegRex2|RegRex, 7, Dw2Inval, Dw2Inval
 rax, Class=Reg|Instance=Accum|Qword|BaseIndex, 0, 0, Dw2Inval, 0
 rcx, Class=Reg|Instance=RegC|Qword|BaseIndex, 0, 1, Dw2Inval, 2
 rdx, Class=Reg|Instance=RegD|Qword|BaseIndex, 0, 2, Dw2Inval, 1
@@ -93,6 +141,22 @@  r12, Class=Reg|Qword|BaseIndex, RegRex, 4, Dw2Inval, 12
 r13, Class=Reg|Qword|BaseIndex, RegRex, 5, Dw2Inval, 13
 r14, Class=Reg|Qword|BaseIndex, RegRex, 6, Dw2Inval, 14
 r15, Class=Reg|Qword|BaseIndex, RegRex, 7, Dw2Inval, 15
+r16, Class=Reg|Qword|BaseIndex, RegRex2, 0, Dw2Inval, 130
+r17, Class=Reg|Qword|BaseIndex, RegRex2, 1, Dw2Inval, 131
+r18, Class=Reg|Qword|BaseIndex, RegRex2, 2, Dw2Inval, 132
+r19, Class=Reg|Qword|BaseIndex, RegRex2, 3, Dw2Inval, 133
+r20, Class=Reg|Qword|BaseIndex, RegRex2, 4, Dw2Inval, 134
+r21, Class=Reg|Qword|BaseIndex, RegRex2, 5, Dw2Inval, 135
+r22, Class=Reg|Qword|BaseIndex, RegRex2, 6, Dw2Inval, 136
+r23, Class=Reg|Qword|BaseIndex, RegRex2, 7, Dw2Inval, 137
+r24, Class=Reg|Qword|BaseIndex, RegRex2|RegRex, 0, Dw2Inval, 138
+r25, Class=Reg|Qword|BaseIndex, RegRex2|RegRex, 1, Dw2Inval, 139
+r26, Class=Reg|Qword|BaseIndex, RegRex2|RegRex, 2, Dw2Inval, 140
+r27, Class=Reg|Qword|BaseIndex, RegRex2|RegRex, 3, Dw2Inval, 141
+r28, Class=Reg|Qword|BaseIndex, RegRex2|RegRex, 4, Dw2Inval, 142
+r29, Class=Reg|Qword|BaseIndex, RegRex2|RegRex, 5, Dw2Inval, 143
+r30, Class=Reg|Qword|BaseIndex, RegRex2|RegRex, 6, Dw2Inval, 144
+r31, Class=Reg|Qword|BaseIndex, RegRex2|RegRex, 7, Dw2Inval, 145
 // Vector mask registers.
 k0, Class=RegMask, 0, 0, 93, 118
 k1, Class=RegMask, 0, 1, 94, 119