[v4,3/9] Support APX GPR32 with extend evex prefix

Message ID 20231219121218.974012-4-lili.cui@intel.com
State New
Series Support Intel APX EGPR

Checks

Context Check Description
linaro-tcwg-bot/tcwg_binutils_build--master-arm fail Patch failed to apply
linaro-tcwg-bot/tcwg_binutils_build--master-aarch64 fail Patch failed to apply

Commit Message

Cui, Lili Dec. 19, 2023, 12:12 p.m. UTC
  This patch adds the non-ND, non-NF forms of the EVEX-promoted instructions.

EVEX extension of legacy instructions:
  All promoted legacy instructions are placed in EVEX map 4, which is
  currently reserved.
EVEX extension of EVEX instructions:
  All existing EVEX instructions are extended by APX using the extended
  EVEX prefix, so that they can access all 32 GPRs.
EVEX extension of VEX instructions:
  Promoting a VEX instruction into the EVEX space does not change the map
  id, the opcode, or the operand encoding of the VEX instruction.

Note: The promoted versions of MOVBE will be extended to include the
  "MOVBE reg1, reg2" form.
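
For illustration only (not part of the patch): a minimal C sketch of how the
APX eGPR bits are folded into the 4-byte extended EVEX prefix, mirroring the
build_apx_evex_prefix () hunk in the diff below.  The helper name and its
parameters are made up for this example.

  #include <stdint.h>

  /* evex[0] is 0x62; evex[1..3] are the payload bytes, laid out as in the
     comment ahead of build_apx_evex_prefix ().  REX_B/REX_X/REX_R keep
     their usual values 1/2/4.  */
  static void
  fold_egpr_into_evex (uint8_t evex[4], unsigned int rex2, int vvvv_is_r16_r31)
  {
    if (rex2 & 4)		/* REX_R: clear EVEX.R' (byte 1, bit 4, inverted).  */
      evex[1] &= ~0x10;
    if (rex2 & 1)		/* REX_B: set EVEX.B' (byte 1, bit 3).  */
      evex[1] |= 0x08;
    if (rex2 & 2)		/* REX_X: clear EVEX.X' (byte 2, bit 2, inverted).  */
      evex[2] &= ~0x04;
    if (vvvv_is_r16_r31)	/* vvvv names r16-r31: clear EVEX.V' (byte 3, bit 3, inverted).  */
      evex[3] &= ~0x08;
  }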

  gas/ChangeLog:

  2023-12-19  Lingling Kong <lingling.kong@intel.com>
	      H.J. Lu  <hongjiu.lu@intel.com>
	      Lili Cui <lili.cui@intel.com>
	      Lin Hu   <lin1.hu@intel.com>

	* config/tc-i386.c (APX_F): New define.
	(install_template): Handle APX combinations.
	(is_apx_evex_encoding): Test APX EVEX encoding.
	(build_apx_evex_prefix): Enable APX EVEX prefix.
	(md_assemble): Handle APX with EVEX encoding.
	(process_suffix): Handle APX map 4 prefix.
	(check_register): Assign i.vec_encoding for APX EVEX instructions.
	* testsuite/gas/i386/x86-64-evex.d: Adjust test cases.
	* testsuite/gas/i386/x86-64.exp: Adjust x86-64-inval-movbe.

opcodes/ChangeLog:

	* i386-dis-evex-len.h: Handle EVEX_LEN_0F38F2, EVEX_LEN_0F38F3.
	* i386-dis-evex-prefix.h: Handle PREFIX_EVEX_0F38F2_L_0,
	PREFIX_EVEX_0F38F3_L_0, PREFIX_EVEX_MAP4_D8,
	PREFIX_EVEX_MAP4_DA, PREFIX_EVEX_MAP4_DB,
	PREFIX_EVEX_MAP4_DC, PREFIX_EVEX_MAP4_DD,
	PREFIX_EVEX_MAP4_DE, PREFIX_EVEX_MAP4_DF,
	PREFIX_EVEX_MAP4_F0, PREFIX_EVEX_MAP4_F1,
	PREFIX_EVEX_MAP4_F2, PREFIX_EVEX_MAP4_F8.
	* i386-dis-evex-reg.h: Handle REG_EVEX_0F38F3_L_0_P_0.
	* i386-dis-evex.h: Add EVEX_MAP4_ entries for legacy insns
	promoted to APX so they can use GPR32.
	* i386-dis-evex-x86-64.h: New file.  Add X86_64_EVEX_0F90,
	X86_64_EVEX_0F92, X86_64_EVEX_0F93, X86_64_EVEX_0F38F2,
	X86_64_EVEX_0F38F3, X86_64_EVEX_0F38F5, X86_64_EVEX_0F38F6,
	X86_64_EVEX_0F38F7, X86_64_EVEX_0F3AF0, X86_64_EVEX_0F91.
	* i386-dis.c
	(struct instr_info): Delete bool r.
	(PREFIX_NP_OR_DATA): New.
	(NO_PREFIX): New.
	(putop): Ditto.
	(X86_64_EVEX_FROM_VEX_TABLE): Ditto.
	(get_valid_dis386): Decode the insn's eGPR (extended REX) bits in
	the extended EVEX prefix.  Handle EVEX_MAP4.
	(print_insn): Handle PREFIX_DATA_AND_NP_ONLY.
	(print_register): Handle APX instruction decoding.
	(OP_E_memory): Ditto.
	(OP_G): Ditto.
	(OP_XMM): Ditto.
	(DistinctDest_Fixup): Ditto.
	* i386-gen.c (process_i386_opcode_modifier): Add EVEXMAP4.
	* i386-opc.h (SPACE_EVEXMAP4): New, for legacy insns
	promoted to EVEX.
	* i386-opc.tbl: Mark some legacy and VEX insns as not
	supporting GPR32.  Add EVEX-promoted forms of some legacy
	insns (maps 2/3).
---
 gas/config/tc-i386.c                 |  65 ++++++++++-
 gas/testsuite/gas/i386/x86-64-evex.d |   2 +-
 gas/testsuite/gas/i386/x86-64.exp    |   2 +-
 opcodes/i386-dis-evex-len.h          |  10 ++
 opcodes/i386-dis-evex-prefix.h       |  66 +++++++++++
 opcodes/i386-dis-evex-reg.h          |   7 ++
 opcodes/i386-dis-evex-x86-64.h       |  50 ++++++++
 opcodes/i386-dis-evex.h              |  94 +++++++--------
 opcodes/i386-dis.c                   | 165 +++++++++++++++++++++++----
 opcodes/i386-gen.c                   |   2 +
 opcodes/i386-opc.h                   |   6 +
 opcodes/i386-opc.tbl                 | 109 ++++++++++++------
 12 files changed, 466 insertions(+), 112 deletions(-)
 create mode 100644 opcodes/i386-dis-evex-x86-64.h
  

Comments

Jan Beulich Dec. 22, 2023, 1:49 p.m. UTC | #1
On 19.12.2023 13:12, Cui, Lili wrote:
> --- a/gas/config/tc-i386.c
> +++ b/gas/config/tc-i386.c
> @@ -89,6 +89,7 @@
>  /* This matches the C -> StaticRounding alias in the opcode table.  */
>  #define commutative staticrounding
>  
> +#define APX_F(cpuid) (maybe_cpu (t, CpuAPX_F) && maybe_cpu (t, cpuid))

Why is this still here? I said more than once that it's not helpful to
have. As can be seen ...

> @@ -3673,7 +3674,7 @@ install_template (const insn_template *t)
>  
>    /* Dual VEX/EVEX templates need stripping one of the possible variants.  */
>    if (t->opcode_modifier.vex && t->opcode_modifier.evex)
> -  {
> +    {
>        if ((maybe_cpu (t, CpuAVX) || maybe_cpu (t, CpuAVX2)
>  	   || maybe_cpu (t, CpuFMA))
>  	  && (maybe_cpu (t, CpuAVX512F) || maybe_cpu (t, CpuAVX512VL)))
> @@ -3695,7 +3696,15 @@ install_template (const insn_template *t)
>  		gas_assert (i.tm.cpu.bitfield.isa == i.tm.cpu_any.bitfield.isa);
>  	    }
>  	}
> -  }
> +
> +      if (APX_F(CpuCMPCCXADD) || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F)
> +	  || APX_F(CpuAVX512DQ) || APX_F(CpuAVX512BW) || APX_F(CpuBMI)
> +	  || APX_F(CpuBMI2))

... right here: There's no point in checking CpuAPX_F a whopping 7 times.

> +	if (need_evex_encoding ())
> +	  i.tm.opcode_modifier.vex = 0;
> +	else
> +	  i.tm.opcode_modifier.evex = 0;
> +    }

I'm also pretty sure that I asked before that such nested if/else please
have proper braces for the body of the outer if().

To say it very clearly again: When you submit a new version, _all_ prior
review comments should be addressed. Whether that's verbally (by
explaining why a change cannot be made) or by adjusting the code is
another matter. I said before that reviewing this work has proven
extremely time consuming. I shouldn't be required to needlessly put in
yet more time, just to re-spot and re-comment things already pointed
out.

> @@ -3876,6 +3885,15 @@ is_any_vex_encoding (const insn_template *t)
>    return t->opcode_modifier.vex || t->opcode_modifier.evex;
>  }
>  
> +/* We can use this function only when the current encoding is evex.  */
> +static INLINE bool
> +is_apx_evex_encoding (void)
> +{
> +  return i.rex2 || i.tm.opcode_space == SPACE_EVEXMAP4
> +    || (i.vex.register_specifier
> +	&& i.vex.register_specifier->reg_flags & RegRex2);

Nit: Parentheses please around the & expression.

> @@ -8097,7 +8142,11 @@ process_suffix (void)
>  	  if (i.tm.opcode_modifier.jump == JUMP_BYTE) /* jcxz, loop */
>  	    prefix = ADDR_PREFIX_OPCODE;
>  
> -	  if (!add_prefix (prefix))
> +	  /* The DATA PREFIX of EVEX promoted from legacy APX instructions
> +	     needs to be adjusted.  */
> +	  if (i.tm.opcode_space == SPACE_EVEXMAP4)
> +	    i.tm.opcode_modifier.opcodeprefix = PREFIX_0X66;

Feels like I did ask before: What if i.tm.opcode_modifier.opcodeprefix
is already set? Aiui that would be a bug, but one that's likely easy
to introduce and hard to find. IOW - better assert the field is clear
before filling it?

> @@ -14293,6 +14342,12 @@ static bool check_register (const reg_entry *r)
>        if (!cpu_arch_flags.bitfield.cpuapx_f
>  	  || flag_code != CODE_64BIT)
>  	return false;
> +
> +      /* When using RegRex2, dual VEX/EVEX templates need to be marked as EVEX.
> +	 For the later install_template function.  */
> +      if (current_templates.start->opcode_modifier.vex
> +	  && current_templates.start->opcode_modifier.evex)
> +	i.vec_encoding = vex_encoding_evex;
>      }

Just to state it again - no use of current_templates in this function,
please.

> --- a/opcodes/i386-opc.tbl
> +++ b/opcodes/i386-opc.tbl
> @@ -106,16 +106,6 @@
>  #define HLEPrefixRelease PrefixOk=PrefixHLERelease
>  #define NoTrackPrefixOk  PrefixOk=PrefixNoTrack
>  
> -#define Space0F    OpcodeSpace=SPACE_0F
> -#define Space0F38  OpcodeSpace=SPACE_0F38
> -#define Space0F3A  OpcodeSpace=SPACE_0F3A
> -#define SpaceXOP08 OpcodeSpace=SPACE_XOP08
> -#define SpaceXOP09 OpcodeSpace=SPACE_XOP09
> -#define SpaceXOP0A OpcodeSpace=SPACE_XOP0A
> -
> -#define EVexMap5 OpcodeSpace=SPACE_EVEXMAP5
> -#define EVexMap6 OpcodeSpace=SPACE_EVEXMAP6
> -

Why are you moving these, leaving ...

>  #define VexMap7 OpcodeSpace=SPACE_VEXMAP7

... this one disconnected? IOW - I see no need for the moving, but
if there is a need, then this one also needs moving (see how it'll
become relevant for USER_MSR+APX_F now). Specifically ...

> @@ -137,11 +127,25 @@
>  #define EVexLIG EVex=EVEXLIG
>  #define EVexDYN EVex=EVEXDYN
>  
> +#define Space0F    OpcodeSpace=SPACE_0F
> +#define Space0F38  OpcodeSpace=SPACE_0F38
> +#define Space0F3A  OpcodeSpace=SPACE_0F3A
> +#define SpaceXOP08 OpcodeSpace=SPACE_XOP08
> +#define SpaceXOP09 OpcodeSpace=SPACE_XOP09
> +#define SpaceXOP0A OpcodeSpace=SPACE_XOP0A
> +
> +#define EVexMap4 OpcodeSpace=SPACE_EVEXMAP4|EVex128

... there's no need for this to live after EVex128 was #define-d.

> +#define EVexMap5 OpcodeSpace=SPACE_EVEXMAP5
> +#define EVexMap6 OpcodeSpace=SPACE_EVEXMAP6
> +
>  #define Disp8ShiftVL Disp8MemShift=DISP8_SHIFT_VL
>  
>  #define Vsz256 Vsz=VSZ256
>  #define Vsz512 Vsz=VSZ512
>  
> +// The template supports VEX format for cpuid and EVEX format for cpuid & apx_f.
> +#define APX_F(cpuid) cpuid&(cpuid|APX_F)

I think the comment wants to go into further detail. Please can you
go back to read what I said when I suggested this construct, in
particular regarding the stripping then done? However, with you not
having found a need to fiddle with cpu_flags_match(), I wonder if
this construct is needed in the first place. The earlier suggestion
was entirely based on the assumption that stripping similar to that
for other combined VEX/EVEX templates would be needed here, too.

> @@ -2049,13 +2059,20 @@ bndldx, 0x0f1a, MPX, Modrm|Anysize|IgnoreSize|NoSuf, { BaseIndex, RegBND }
>  
>  // SHA instructions.
>  sha1rnds4, 0xf3acc, SHA, Modrm|NoSuf, { Imm8|Imm8S, RegXMM|Unspecified|BaseIndex, RegXMM }
> +sha1rnds4, 0xd4, SHA&APX_F, Modrm|NoSuf|EVexMap4, { Imm8|Imm8S, RegXMM|Unspecified|BaseIndex, RegXMM }
>  sha1nexte, 0xf38c8, SHA, Modrm|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
> +sha1nexte, 0xd8, SHA&APX_F, Modrm|NoSuf|EVexMap4, { RegXMM|Unspecified|BaseIndex, RegXMM }
>  sha1msg1, 0xf38c9, SHA, Modrm|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
> +sha1msg1, 0xd9, SHA&APX_F, Modrm|NoSuf|EVexMap4, { RegXMM|Unspecified|BaseIndex, RegXMM }
>  sha1msg2, 0xf38ca, SHA, Modrm|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
> +sha1msg2, 0xda, SHA&APX_F, Modrm|NoSuf|EVexMap4, { RegXMM|Unspecified|BaseIndex, RegXMM }
>  sha256rnds2, 0xf38cb, SHA, Modrm|NoSuf, { Acc|Xmmword, RegXMM|Unspecified|BaseIndex, RegXMM }

What about this form? It surely also wants to gain an EVEX counterpart.

Jan
  
Jan Beulich Dec. 22, 2023, 2:19 p.m. UTC | #2
On 19.12.2023 13:12, Cui, Lili wrote:
> --- a/opcodes/i386-dis-evex-prefix.h
> +++ b/opcodes/i386-dis-evex-prefix.h
> @@ -285,6 +285,14 @@
>      { "%XEvfmsub213s%XW",	{ XMScalar, VexScalar, EXdq, EXxEVexR }, 0 },
>      { "v4fnmadds%XS",	{ XMScalar, VexScalar, Mxmm }, 0 },
>    },
> +  /* PREFIX_EVEX_0F38F2_L_0 */
> +  {
> +    { "andnS",	{ Gdq, VexGdq, Edq }, 0 },
> +  },

So not being able to re-use the VEX entry for this and ...

> --- a/opcodes/i386-dis-evex-reg.h
> +++ b/opcodes/i386-dis-evex-reg.h
> @@ -49,3 +49,10 @@
>      { "vscatterpf0qp%XW",  { MVexVSIBQWpX }, PREFIX_DATA },
>      { "vscatterpf1qp%XW",  { MVexVSIBQWpX }, PREFIX_DATA },
>    },
> +  /* REG_EVEX_0F38F3_L_0_P_0 */
> +  {
> +    { Bad_Opcode },
> +    { "blsrS",	{ VexGdq, Edq }, 0 },
> +    { "blsmskS",	{ VexGdq, Edq }, 0 },
> +    { "blsiS",	{ VexGdq, Edq }, 0 },
> +  },

... this was due to the VEX entries having PREFIX_OPCODE, which would be
getting in the way? This is the sort of thing that would be useful to
have in the description, to avoid raising the same question again that
(I think) was raised before.

Yet then - why do you strip PREFIX_OPCODE from the VEX entries? If you
do that (as iirc I did suggest), there's no need for having separate EVEX
ones (the suggestion, after all, was to be able to re-use the VEX
entries). That re-work of existing VEX encodings could, btw, also have
been split out quite easily. That way this huge patch would have further
shrunk a little.

> --- /dev/null
> +++ b/opcodes/i386-dis-evex-x86-64.h
> @@ -0,0 +1,50 @@
> +  /* X86_64_EVEX_0F90 */
> +  {
> +    { Bad_Opcode },
> +    { VEX_W_TABLE (VEX_W_0F90_L_0) },
> +  },
> +  /* X86_64_EVEX_0F91 */
> +  {
> +    { Bad_Opcode },
> +    { VEX_W_TABLE (VEX_W_0F91_L_0) },
> +  },
> +  /* X86_64_EVEX_0F92 */
> +  {
> +    { Bad_Opcode },
> +    { VEX_W_TABLE (VEX_W_0F92_L_0) },
> +  },
> +  /* X86_64_EVEX_0F93 */
> +  {
> +    { Bad_Opcode },
> +    { VEX_W_TABLE (VEX_W_0F93_L_0) },
> +  },
> +  /* X86_64_EVEX_0F38F2 */
> +  {
> +    { Bad_Opcode },
> +    { PREFIX_TABLE (PREFIX_VEX_0F38F2_L_0) },
> +  },
> +  /* X86_64_EVEX_0F38F3 */
> +  {
> +    { Bad_Opcode },
> +    { PREFIX_TABLE (PREFIX_VEX_0F38F3_L_0) },
> +  },
> +  /* X86_64_EVEX_0F38F5 */
> +  {
> +    { Bad_Opcode },
> +    { PREFIX_TABLE (PREFIX_VEX_0F38F5_L_0) },
> +  },
> +  /* X86_64_EVEX_0F38F6 */
> +  {
> +    { Bad_Opcode },
> +    { PREFIX_TABLE(PREFIX_VEX_0F38F6_L_0) },
> +  },
> +  /* X86_64_EVEX_0F38F7 */
> +  {
> +    { Bad_Opcode },
> +    { PREFIX_TABLE(PREFIX_VEX_0F38F7_L_0) },
> +  },
> +  /* X86_64_EVEX_0F3AF0 */
> +  {
> +    { Bad_Opcode },
> +    { PREFIX_TABLE (PREFIX_VEX_0F3AF0_L_0) },
> +  },

Am I misremembering that we had agreed that this new file isn't necessary,
by having USE_X86_64_EVEX_FROM_VEX_TABLE handle the non-64-bit case? At
least I couldn't find a mail from you saying this isn't possible (and why).

Jan
  
Cui, Lili Dec. 25, 2023, 12:23 p.m. UTC | #3
> On 19.12.2023 13:12, Cui, Lili wrote:
> > --- a/gas/config/tc-i386.c
> > +++ b/gas/config/tc-i386.c
> > @@ -89,6 +89,7 @@
> >  /* This matches the C -> StaticRounding alias in the opcode table.
> > */  #define commutative staticrounding
> >
> > +#define APX_F(cpuid) (maybe_cpu (t, CpuAPX_F) && maybe_cpu (t,
> > +cpuid))
> 
> Why is this still here? I said more than once that it's not helpful to have. As can
> be seen ...
> 
> > @@ -3673,7 +3674,7 @@ install_template (const insn_template *t)
> >
> >    /* Dual VEX/EVEX templates need stripping one of the possible variants.  */
> >    if (t->opcode_modifier.vex && t->opcode_modifier.evex)
> > -  {
> > +    {
> >        if ((maybe_cpu (t, CpuAVX) || maybe_cpu (t, CpuAVX2)
> >  	   || maybe_cpu (t, CpuFMA))
> >  	  && (maybe_cpu (t, CpuAVX512F) || maybe_cpu (t, CpuAVX512VL)))
> @@
> > -3695,7 +3696,15 @@ install_template (const insn_template *t)
> >  		gas_assert (i.tm.cpu.bitfield.isa == i.tm.cpu_any.bitfield.isa);
> >  	    }
> >  	}
> > -  }
> > +
> > +      if (APX_F(CpuCMPCCXADD) || APX_F(CpuAMX_TILE) ||
> APX_F(CpuAVX512F)
> > +	  || APX_F(CpuAVX512DQ) || APX_F(CpuAVX512BW) ||
> APX_F(CpuBMI)
> > +	  || APX_F(CpuBMI2))
> 
> ... right here: There's no point in checking CpuAPX_F a whopping 7 times.
> 
> > +	if (need_evex_encoding ())
> > +	  i.tm.opcode_modifier.vex = 0;
> > +	else
> > +	  i.tm.opcode_modifier.evex = 0;
> > +    }
> 
> I'm also pretty sure that I asked before that such nested if/else please have
> proper braces for the body of the outer if().
> 
Sorry, I missed that email.

Changed to:

      if ((maybe_cpu (t, CpuCMPCCXADD) || maybe_cpu (t, CpuAMX_TILE)
           || maybe_cpu (t, CpuAVX512F) || maybe_cpu (t, CpuAVX512DQ)
           || maybe_cpu (t, CpuAVX512BW) || maybe_cpu (t, CpuBMI)
           || maybe_cpu (t, CpuBMI2))
          && maybe_cpu (t, CpuAPX_F))
        {
          if (need_evex_encoding () || i.has_egpr)
            i.tm.opcode_modifier.vex = 0;
          else
            i.tm.opcode_modifier.evex = 0;
        }

> To say it very clearly again: When you submit a new version, _all_ prior review
> comments should be addressed. Whether that's verbally (by explaining why a
> change cannot be made) or by adjusting the code is another matter. I said
> before that reviewing this work has proven extremely time consuming. I
> shouldn't be required to needlessly put in yet more time, just to re-spot and
> re-comment things already pointed out.
> 
> > @@ -3876,6 +3885,15 @@ is_any_vex_encoding (const insn_template *t)
> >    return t->opcode_modifier.vex || t->opcode_modifier.evex;  }
> >
> > +/* We can use this function only when the current encoding is evex.
> > +*/ static INLINE bool is_apx_evex_encoding (void) {
> > +  return i.rex2 || i.tm.opcode_space == SPACE_EVEXMAP4
> > +    || (i.vex.register_specifier
> > +	&& i.vex.register_specifier->reg_flags & RegRex2);
> 
> Nit: Parentheses please around the & expression.
> 
Done.

> > @@ -8097,7 +8142,11 @@ process_suffix (void)
> >  	  if (i.tm.opcode_modifier.jump == JUMP_BYTE) /* jcxz, loop */
> >  	    prefix = ADDR_PREFIX_OPCODE;
> >
> > -	  if (!add_prefix (prefix))
> > +	  /* The DATA PREFIX of EVEX promoted from legacy APX instructions
> > +	     needs to be adjusted.  */
> > +	  if (i.tm.opcode_space == SPACE_EVEXMAP4)
> > +	    i.tm.opcode_modifier.opcodeprefix = PREFIX_0X66;
> 
> Feels like I did ask before: What if i.tm.opcode_modifier.opcodeprefix is
> already set? Aiui that would be a bug, but one that's likely easy to introduce
> and hard to find. IOW - better assert the field is clear before filling it?
>
Something like that, so we moved it here from somewhere else.  Added.

          /* The DATA PREFIX of EVEX promoted from legacy APX instructions
             needs to be adjusted.  */
          if (i.tm.opcode_space == SPACE_EVEXMAP4)
            {
              gas_assert (!i.tm.opcode_modifier.opcodeprefix);
              i.tm.opcode_modifier.opcodeprefix = PREFIX_0X66;
            }

> > @@ -14293,6 +14342,12 @@ static bool check_register (const reg_entry
> *r)
> >        if (!cpu_arch_flags.bitfield.cpuapx_f
> >  	  || flag_code != CODE_64BIT)
> >  	return false;
> > +
> > +      /* When using RegRex2, dual VEX/EVEX templates need to be marked as
> EVEX.
> > +	 For the later install_template function.  */
> > +      if (current_templates.start->opcode_modifier.vex
> > +	  && current_templates.start->opcode_modifier.evex)
> > +	i.vec_encoding = vex_encoding_evex;
> >      }
> 
> Just to state it again - no use of current_templates in this funciton, please.
> 
Removed, mentioned in my previous comment.

> > --- a/opcodes/i386-opc.tbl
> > +++ b/opcodes/i386-opc.tbl
> > @@ -106,16 +106,6 @@
> >  #define HLEPrefixRelease PrefixOk=PrefixHLERelease  #define
> > NoTrackPrefixOk  PrefixOk=PrefixNoTrack
> >
> > -#define Space0F    OpcodeSpace=SPACE_0F
> > -#define Space0F38  OpcodeSpace=SPACE_0F38 -#define Space0F3A
> > OpcodeSpace=SPACE_0F3A -#define SpaceXOP08
> OpcodeSpace=SPACE_XOP08
> > -#define SpaceXOP09 OpcodeSpace=SPACE_XOP09 -#define SpaceXOP0A
> > OpcodeSpace=SPACE_XOP0A
> > -
> > -#define EVexMap5 OpcodeSpace=SPACE_EVEXMAP5 -#define EVexMap6
> > OpcodeSpace=SPACE_EVEXMAP6
> > -
> 
> Why are you moving these, leaving ...
> 
> >  #define VexMap7 OpcodeSpace=SPACE_VEXMAP7
>
Done.

> ... this one disconnected? IOW - I see no need for the moving, but if there is a
> need, then this one also needs moving (see how it'll become relevant for
> USER_MSR+APX_F now). Specifically ...
> 
> > @@ -137,11 +127,25 @@
> >  #define EVexLIG EVex=EVEXLIG
> >  #define EVexDYN EVex=EVEXDYN
> >
> > +#define Space0F    OpcodeSpace=SPACE_0F
> > +#define Space0F38  OpcodeSpace=SPACE_0F38 #define Space0F3A
> > +OpcodeSpace=SPACE_0F3A #define SpaceXOP08
> OpcodeSpace=SPACE_XOP08
> > +#define SpaceXOP09 OpcodeSpace=SPACE_XOP09 #define SpaceXOP0A
> > +OpcodeSpace=SPACE_XOP0A
> > +
> > +#define EVexMap4 OpcodeSpace=SPACE_EVEXMAP4|EVex128
> 
> ... there's no need for this to live after EVex128 was #define-s.
> 
Yes, done.

> > +#define EVexMap5 OpcodeSpace=SPACE_EVEXMAP5 #define EVexMap6
> > +OpcodeSpace=SPACE_EVEXMAP6
> > +
> >  #define Disp8ShiftVL Disp8MemShift=DISP8_SHIFT_VL
> >
> >  #define Vsz256 Vsz=VSZ256
> >  #define Vsz512 Vsz=VSZ512
> >
> > +// The template supports VEX format for cpuid and EVEX format for cpuid &
> apx_f.
> > +#define APX_F(cpuid) cpuid&(cpuid|APX_F)
> 
> I think the comment wants to go into further detail. Please can you go back to
> read what I said when I suggested this construct, in particular regarding the
> stripping then done? However, with you not having found a need to fiddle
> with cpu_flags_match(), I wonder if this construct is needed in the first place.
> The earlier suggestion was entirely based on the assumption that stripping
> similar to that for other combined VEX/EVEX templates would be needed here,
> too.
> 

Seeing this, I realized the problem and checked opcodes/i386-tbl.h. For the following entry we want to set both CpuAPX_F and CpuBMI to 1, but gen.c doesn't seem to support the format "cpuid&(cpuid|APX_F)"; in fact bzhi sets CpuBMI to 1 and CpuAPX_F to 0. I'm not familiar with the relevant logic in gen.c and don't know how to debug it. When you have time, could you help take a look?

bzhi, 0xf5, APX_F(BMI2), Modrm|CheckOperandSize|Vex128|EVex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf|NF, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
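
For reference, this is my reading of the semantics the combined template is
intended to have - purely an illustrative sketch, with an invented function
name and signature, not the real cpu_flags_match () logic or the i386-gen
output:

  /* "BMI2&(BMI2|APX_F)": BMI2 is always required; APX_F only decides
     whether the EVEX (eGPR-capable) half of the dual VEX/EVEX template
     remains usable when install_template () strips one variant.  */
  static bool
  dual_template_usable (bool have_bmi2, bool have_apx_f, bool *evex_ok)
  {
    *evex_ok = have_bmi2 && have_apx_f;	/* EVEX form needs BMI2 and APX_F.  */
    return have_bmi2;			/* VEX form needs BMI2 only.  */
  }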


> > @@ -2049,13 +2059,20 @@ bndldx, 0x0f1a, MPX,
> > Modrm|Anysize|IgnoreSize|NoSuf, { BaseIndex, RegBND }
> >
> >  // SHA instructions.
> >  sha1rnds4, 0xf3acc, SHA, Modrm|NoSuf, { Imm8|Imm8S,
> > RegXMM|Unspecified|BaseIndex, RegXMM }
> > +sha1rnds4, 0xd4, SHA&APX_F, Modrm|NoSuf|EVexMap4, { Imm8|Imm8S,
> > +RegXMM|Unspecified|BaseIndex, RegXMM }
> >  sha1nexte, 0xf38c8, SHA, Modrm|NoSuf,
> { RegXMM|Unspecified|BaseIndex,
> > RegXMM }
> > +sha1nexte, 0xd8, SHA&APX_F, Modrm|NoSuf|EVexMap4, {
> > +RegXMM|Unspecified|BaseIndex, RegXMM }
> >  sha1msg1, 0xf38c9, SHA, Modrm|NoSuf,
> { RegXMM|Unspecified|BaseIndex,
> > RegXMM }
> > +sha1msg1, 0xd9, SHA&APX_F, Modrm|NoSuf|EVexMap4, {
> > +RegXMM|Unspecified|BaseIndex, RegXMM }
> >  sha1msg2, 0xf38ca, SHA, Modrm|NoSuf,
> { RegXMM|Unspecified|BaseIndex,
> > RegXMM }
> > +sha1msg2, 0xda, SHA&APX_F, Modrm|NoSuf|EVexMap4, {
> > +RegXMM|Unspecified|BaseIndex, RegXMM }
> >  sha256rnds2, 0xf38cb, SHA, Modrm|NoSuf, { Acc|Xmmword,
> > RegXMM|Unspecified|BaseIndex, RegXMM }
> 
> What about this form? It surely also wants to gain an EVEX counterpart.
> 

Added.

Thanks,
Lili.
  
Cui, Lili Dec. 26, 2023, 7 a.m. UTC | #4
> > --- a/opcodes/i386-dis-evex-prefix.h
> > +++ b/opcodes/i386-dis-evex-prefix.h
> > @@ -285,6 +285,14 @@
> >      { "%XEvfmsub213s%XW",	{ XMScalar, VexScalar, EXdq, EXxEVexR }, 0 },
> >      { "v4fnmadds%XS",	{ XMScalar, VexScalar, Mxmm }, 0 },
> >    },
> > +  /* PREFIX_EVEX_0F38F2_L_0 */
> > +  {
> > +    { "andnS",	{ Gdq, VexGdq, Edq }, 0 },
> > +  },
> 
> So not being able to re-use the VEX entry for this and ...
> 
> > --- a/opcodes/i386-dis-evex-reg.h
> > +++ b/opcodes/i386-dis-evex-reg.h
> > @@ -49,3 +49,10 @@
> >      { "vscatterpf0qp%XW",  { MVexVSIBQWpX }, PREFIX_DATA },
> >      { "vscatterpf1qp%XW",  { MVexVSIBQWpX }, PREFIX_DATA },
> >    },
> > +  /* REG_EVEX_0F38F3_L_0_P_0 */
> > +  {
> > +    { Bad_Opcode },
> > +    { "blsrS",	{ VexGdq, Edq }, 0 },
> > +    { "blsmskS",	{ VexGdq, Edq }, 0 },
> > +    { "blsiS",	{ VexGdq, Edq }, 0 },
> > +  },              
> 
> ... this was due to the VEX entries having PREFIX_OPCODE, which would be
> getting in the way? This is the sort of thing that would be useful to have in the
> description, to avoid raising the same question again that (I think) was raised
> before.
> 
> Yet then - why do you strip PREFIX_OPCODE from the VEX entries? If you do
> that (as iirc I did suggest), there's no need for having separate EVEX ones (the
> suggestion, after all, was to be able to re-use the VEX entries). That re-work of
> existing VEX encodings could, btw, also have been split out quite easily. That
> way this huge patch would have further shrunk a little.
> 
I re-used the VEX entry but forgot to remove the redundant one; removed now.

> > --- /dev/null
> > +++ b/opcodes/i386-dis-evex-x86-64.h
> > @@ -0,0 +1,50 @@
> > +  /* X86_64_EVEX_0F90 */
> > +  {
> > +    { Bad_Opcode },
> > +    { VEX_W_TABLE (VEX_W_0F90_L_0) },  },
> > +  /* X86_64_EVEX_0F91 */
> > +  {
> > +    { Bad_Opcode },
> > +    { VEX_W_TABLE (VEX_W_0F91_L_0) },  },
> > +  /* X86_64_EVEX_0F92 */
> > +  {
> > +    { Bad_Opcode },
> > +    { VEX_W_TABLE (VEX_W_0F92_L_0) },  },
> > +  /* X86_64_EVEX_0F93 */
> > +  {
> > +    { Bad_Opcode },
> > +    { VEX_W_TABLE (VEX_W_0F93_L_0) },  },
> > +  /* X86_64_EVEX_0F38F2 */
> > +  {
> > +    { Bad_Opcode },
> > +    { PREFIX_TABLE (PREFIX_VEX_0F38F2_L_0) },  },
> > +  /* X86_64_EVEX_0F38F3 */
> > +  {
> > +    { Bad_Opcode },
> > +    { PREFIX_TABLE (PREFIX_VEX_0F38F3_L_0) },  },
> > +  /* X86_64_EVEX_0F38F5 */
> > +  {
> > +    { Bad_Opcode },
> > +    { PREFIX_TABLE (PREFIX_VEX_0F38F5_L_0) },  },
> > +  /* X86_64_EVEX_0F38F6 */
> > +  {
> > +    { Bad_Opcode },
> > +    { PREFIX_TABLE(PREFIX_VEX_0F38F6_L_0) },  },
> > +  /* X86_64_EVEX_0F38F7 */
> > +  {
> > +    { Bad_Opcode },
> > +    { PREFIX_TABLE(PREFIX_VEX_0F38F7_L_0) },  },
> > +  /* X86_64_EVEX_0F3AF0 */
> > +  {
> > +    { Bad_Opcode },
> > +    { PREFIX_TABLE (PREFIX_VEX_0F3AF0_L_0) },  },
> 
> Am I misremembering that we had agreed that this new file isn't necessary, by
> having USE_X86_64_EVEX_FROM_VEX_TABLE handle the non-64-bit case? At
> least I couldn't find a mail from you saying this isn't possible (and why).
> 
I prefer not to change the current implementation. We need a table that all of these instructions go through; it can be x86-64 or len_table, but I think x86-64 is better: it can reuse more existing x86-64 entries (for example X86_64_VEX_0F38E*) than len_table can. If we used len_table instead, we would need to route another 18 instructions through it, which would also increase the number of entries.

The 18 instructions are:
X86_64_VEX_0F38E0 ~ X86_64_VEX_0F38EF, X86_64_VEX_0F3849, and X86_64_VEX_0F384B.
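
For context, this is (condensed) how the dispatch works in the
get_valid_dis386 () hunk further down - nothing new, just the committed code
with extra comments: the non-64-bit slot of every entry in the new
i386-dis-evex-x86-64.h is Bad_Opcode, and the 64-bit slot reuses the existing
VEX decode tables.

    case USE_X86_64_EVEX_FROM_VEX_TABLE:
      ins->evex_type = evex_from_vex;
      /* Promoted-from-VEX forms require EVEX.z, EVEX.L'L, EVEX.b and the
	 low two bits of EVEX.aaa to be 0.  */
      if ((ins->vex.mask_register_specifier & 0x3) != 0
	  || ins->vex.ll != 0 || ins->vex.zeroing != 0 || ins->vex.b)
	return &bad_opcode;
      /* Fall through.  */
    case USE_X86_64_TABLE:
      vindex = ins->address_mode == mode_64bit ? 1 : 0;
      dp = &x86_64_table[dp->op[1].bytemode][vindex];
      break;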

Thanks,
Lili.
  
Jan Beulich Jan. 4, 2024, 9:01 a.m. UTC | #5
On 26.12.2023 08:00, Cui, Lili wrote:
>>> --- /dev/null
>>> +++ b/opcodes/i386-dis-evex-x86-64.h
>>> @@ -0,0 +1,50 @@
>>> +  /* X86_64_EVEX_0F90 */
>>> +  {
>>> +    { Bad_Opcode },
>>> +    { VEX_W_TABLE (VEX_W_0F90_L_0) },  },
>>> +  /* X86_64_EVEX_0F91 */
>>> +  {
>>> +    { Bad_Opcode },
>>> +    { VEX_W_TABLE (VEX_W_0F91_L_0) },  },
>>> +  /* X86_64_EVEX_0F92 */
>>> +  {
>>> +    { Bad_Opcode },
>>> +    { VEX_W_TABLE (VEX_W_0F92_L_0) },  },
>>> +  /* X86_64_EVEX_0F93 */
>>> +  {
>>> +    { Bad_Opcode },
>>> +    { VEX_W_TABLE (VEX_W_0F93_L_0) },  },
>>> +  /* X86_64_EVEX_0F38F2 */
>>> +  {
>>> +    { Bad_Opcode },
>>> +    { PREFIX_TABLE (PREFIX_VEX_0F38F2_L_0) },  },
>>> +  /* X86_64_EVEX_0F38F3 */
>>> +  {
>>> +    { Bad_Opcode },
>>> +    { PREFIX_TABLE (PREFIX_VEX_0F38F3_L_0) },  },
>>> +  /* X86_64_EVEX_0F38F5 */
>>> +  {
>>> +    { Bad_Opcode },
>>> +    { PREFIX_TABLE (PREFIX_VEX_0F38F5_L_0) },  },
>>> +  /* X86_64_EVEX_0F38F6 */
>>> +  {
>>> +    { Bad_Opcode },
>>> +    { PREFIX_TABLE(PREFIX_VEX_0F38F6_L_0) },  },
>>> +  /* X86_64_EVEX_0F38F7 */
>>> +  {
>>> +    { Bad_Opcode },
>>> +    { PREFIX_TABLE(PREFIX_VEX_0F38F7_L_0) },  },
>>> +  /* X86_64_EVEX_0F3AF0 */
>>> +  {
>>> +    { Bad_Opcode },
>>> +    { PREFIX_TABLE (PREFIX_VEX_0F3AF0_L_0) },  },
>>
>> Am I misremembering that we had agreed that this new file isn't necessary, by
>> having USE_X86_64_EVEX_FROM_VEX_TABLE handle the non-64-bit case? At
>> least I couldn't find a mail from you saying this isn't possible (and why).
>>
> I Prefer not to change the current implement, we need a table that all instructions must go through, it can be x86-64 or len_table,
> but I think x86-64 is better. It can reuse more old parts of x86-64 (for example X86_64_VEX_0F38E*) than len_table. If we use len_table instead, we need to let another 18 instructions through the len_table, this will also add the number of entries.
> 
> 18 instructions are:
> X86_64_VEX_0F38E0~ X86_64_VEX_0F38EF, X86_64_VEX_0F3849, and X86_64_VEX_0F384B.

I don't see this as a (necessary) result. Since the patches were (imo
prematurely) committed already, it'll now be (again) on me to see about
cleaning up. Oh well.

Jan
  
Jan Beulich Jan. 4, 2024, 9:08 a.m. UTC | #6
On 25.12.2023 13:23, Cui, Lili wrote:
>> On 19.12.2023 13:12, Cui, Lili wrote:
>>>  #define Vsz256 Vsz=VSZ256
>>>  #define Vsz512 Vsz=VSZ512
>>>
>>> +// The template supports VEX format for cpuid and EVEX format for cpuid &
>> apx_f.
>>> +#define APX_F(cpuid) cpuid&(cpuid|APX_F)
>>
>> I think the comment wants to go into further detail. Please can you go back to
>> read what I said when I suggested this construct, in particular regarding the
>> stripping then done? However, with you not having found a need to fiddle
>> with cpu_flags_match(), I wonder if this construct is needed in the first place.
>> The earlier suggestion was entirely based on the assumption that stripping
>> similar to that for other combined VEX/EVEX templates would be needed here,
>> too.
>>
> 
> Seeing this, I realized the problem and checked opcodes/i386-tbl.h, for the following entry, we want to set CpuAPX_F and CpuBMI to 1, but gen.c doesn't seem to support the format "cpuid&(cpuid|APX_F)",

I suppose you mean "cpuid|(cpuid&APX_F)"?

> in fact bzhi sets CpuBMI to 1 and CpuAPX_F to 0 . I'm not familiar with the relevant logic in gen.c and don't know how to debug it. when you have time, could you help take a look ?

Well, now that the patch was committed I of course can easily take a look.
But with such an outstanding issue, how could the patch have been committed
in the first place? And isn't it a requirement for making changes to
i386-gen.c that you sufficiently understand the logic there (which admittedly
has grown non-trivial)? Anyway - I will need to find time to play with what
has been committed, but that's not likely to happen in time for 2.42. Hence
it is quite possible that 2.42 will turn out to have incomplete and/or broken
APX_F support. But perhaps this was intended to be this way by H.J., when he
approved the entire series without even a single comment.

Jan
  
Cui, Lili Jan. 4, 2024, 12:32 p.m. UTC | #7
> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Thursday, January 4, 2024 5:08 PM
> To: Cui, Lili <lili.cui@intel.com>
> Cc: Lu, Hongjiu <hongjiu.lu@intel.com>; binutils@sourceware.org
> Subject: Re: [PATCH v4 3/9] Support APX GPR32 with extend evex prefix
> 
> On 25.12.2023 13:23, Cui, Lili wrote:
> >> On 19.12.2023 13:12, Cui, Lili wrote:
> >>>  #define Vsz256 Vsz=VSZ256
> >>>  #define Vsz512 Vsz=VSZ512
> >>>
> >>> +// The template supports VEX format for cpuid and EVEX format for
> >>> +cpuid &
> >> apx_f.
> >>> +#define APX_F(cpuid) cpuid&(cpuid|APX_F)
> >>
> >> I think the comment wants to go into further detail. Please can you
> >> go back to read what I said when I suggested this construct, in
> >> particular regarding the stripping then done? However, with you not
> >> having found a need to fiddle with cpu_flags_match(), I wonder if this
> construct is needed in the first place.
> >> The earlier suggestion was entirely based on the assumption that
> >> stripping similar to that for other combined VEX/EVEX templates would
> >> be needed here, too.
> >>
> >
> > Seeing this, I realized the problem and checked opcodes/i386-tbl.h,
> > for the following entry, we want to set CpuAPX_F and CpuBMI to 1, but
> > gen.c doesn't seem to support the format "cpuid&(cpuid|APX_F)",
> 
> I suppose you mean "cpuid|(cpuid&APX_F)"?
> 

I tried cpuid|(cpuid&APX_F). Currently this format is not supported.

> > in fact bzhi sets CpuBMI to 1 and CpuAPX_F to 0 . I'm not familiar with the
> relevant logic in gen.c and don't know how to debug it. when you have time,
> could you help take a look ?
> 
> Well, now that the patch was committed I of course can easily take a look.
> But with such an outstanding issue, how could the patch have been
> committed in the first place? And isn't it a requirement for making changes to
> i386-gen.c that you sufficiently understand the logic there (which admittedly
> has grown non-trivial)? Anyway - I will need to find time to play with what has
> been committed, but that's not likely to happen in time for 2.42. Hence it is
> quite possible that 2.42 will turn out to have incomplete and/or broken APX_F
> support. But perhaps this was intended to be this way by H.J., when he
> approved the entire series without even a single comment.
> 

Although there is a problem here, APX currently handles all cases correctly. Conversely, once this is corrected, cpu_flags_match() will need to be adjusted, otherwise there will be testcase failures.

Thanks,
Lili.
  
Cui, Lili Jan. 4, 2024, 12:47 p.m. UTC | #8
> On 26.12.2023 08:00, Cui, Lili wrote:
> >>> --- /dev/null
> >>> +++ b/opcodes/i386-dis-evex-x86-64.h
> >>> @@ -0,0 +1,50 @@
> >>> +  /* X86_64_EVEX_0F90 */
> >>> +  {
> >>> +    { Bad_Opcode },
> >>> +    { VEX_W_TABLE (VEX_W_0F90_L_0) },  },
> >>> +  /* X86_64_EVEX_0F91 */
> >>> +  {
> >>> +    { Bad_Opcode },
> >>> +    { VEX_W_TABLE (VEX_W_0F91_L_0) },  },
> >>> +  /* X86_64_EVEX_0F92 */
> >>> +  {
> >>> +    { Bad_Opcode },
> >>> +    { VEX_W_TABLE (VEX_W_0F92_L_0) },  },
> >>> +  /* X86_64_EVEX_0F93 */
> >>> +  {
> >>> +    { Bad_Opcode },
> >>> +    { VEX_W_TABLE (VEX_W_0F93_L_0) },  },
> >>> +  /* X86_64_EVEX_0F38F2 */
> >>> +  {
> >>> +    { Bad_Opcode },
> >>> +    { PREFIX_TABLE (PREFIX_VEX_0F38F2_L_0) },  },
> >>> +  /* X86_64_EVEX_0F38F3 */
> >>> +  {
> >>> +    { Bad_Opcode },
> >>> +    { PREFIX_TABLE (PREFIX_VEX_0F38F3_L_0) },  },
> >>> +  /* X86_64_EVEX_0F38F5 */
> >>> +  {
> >>> +    { Bad_Opcode },
> >>> +    { PREFIX_TABLE (PREFIX_VEX_0F38F5_L_0) },  },
> >>> +  /* X86_64_EVEX_0F38F6 */
> >>> +  {
> >>> +    { Bad_Opcode },
> >>> +    { PREFIX_TABLE(PREFIX_VEX_0F38F6_L_0) },  },
> >>> +  /* X86_64_EVEX_0F38F7 */
> >>> +  {
> >>> +    { Bad_Opcode },
> >>> +    { PREFIX_TABLE(PREFIX_VEX_0F38F7_L_0) },  },
> >>> +  /* X86_64_EVEX_0F3AF0 */
> >>> +  {
> >>> +    { Bad_Opcode },
> >>> +    { PREFIX_TABLE (PREFIX_VEX_0F3AF0_L_0) },  },
> >>
> >> Am I misremembering that we had agreed that this new file isn't
> >> necessary, by having USE_X86_64_EVEX_FROM_VEX_TABLE handle the
> >> non-64-bit case? At least I couldn't find a mail from you saying this isn't
> possible (and why).
> >>
> > I Prefer not to change the current implement, we need a table that all
> > instructions must go through, it can be x86-64 or len_table, but I think x86-
> 64 is better. It can reuse more old parts of x86-64 (for example
> X86_64_VEX_0F38E*) than len_table. If we use len_table instead, we need to
> let another 18 instructions through the len_table, this will also add the
> number of entries.
> >
> > 18 instructions are:
> > X86_64_VEX_0F38E0~ X86_64_VEX_0F38EF, X86_64_VEX_0F3849, and
> X86_64_VEX_0F384B.
> 
> I don't see this as a (necessary) result. Since the patches were (imo
> prematurely) committed already, it'll now be (again) on me to see about
> cleaning up. Oh well.
> 

I don't quite understand; I think this modification is a trade-off.

Lili.
  
Jan Beulich Jan. 4, 2024, 12:55 p.m. UTC | #9
On 04.01.2024 13:32, Cui, Lili wrote:
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: Thursday, January 4, 2024 5:08 PM
>>
>> On 25.12.2023 13:23, Cui, Lili wrote:
>>>> On 19.12.2023 13:12, Cui, Lili wrote:
>>>>>  #define Vsz256 Vsz=VSZ256
>>>>>  #define Vsz512 Vsz=VSZ512
>>>>>
>>>>> +// The template supports VEX format for cpuid and EVEX format for
>>>>> +cpuid &
>>>> apx_f.
>>>>> +#define APX_F(cpuid) cpuid&(cpuid|APX_F)
>>>>
>>>> I think the comment wants to go into further detail. Please can you
>>>> go back to read what I said when I suggested this construct, in
>>>> particular regarding the stripping then done? However, with you not
>>>> having found a need to fiddle with cpu_flags_match(), I wonder if this
>> construct is needed in the first place.
>>>> The earlier suggestion was entirely based on the assumption that
>>>> stripping similar to that for other combined VEX/EVEX templates would
>>>> be needed here, too.
>>>>
>>>
>>> Seeing this, I realized the problem and checked opcodes/i386-tbl.h,
>>> for the following entry, we want to set CpuAPX_F and CpuBMI to 1, but
>>> gen.c doesn't seem to support the format "cpuid&(cpuid|APX_F)",
>>
>> I suppose you mean "cpuid|(cpuid&APX_F)"?
>>
> 
> I tried cpuid|(cpuid&APX_F). Currently this format is not supported.

Right, and it cannot be without adding yet another CPU field to the templates
(which I'd like to avoid). Hence why I thought you meant this form, despite
writing the other (recognized) one in your earlier reply.

Jan
  

Patch

diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
index 051cdef2a3a..25cfacce138 100644
--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -89,6 +89,7 @@ 
 /* This matches the C -> StaticRounding alias in the opcode table.  */
 #define commutative staticrounding
 
+#define APX_F(cpuid) (maybe_cpu (t, CpuAPX_F) && maybe_cpu (t, cpuid))
 /*
   'templates' is for grouping together 'template' structures for opcodes
   of the same name.  This is only used for storing the insns in the grand
@@ -3673,7 +3674,7 @@  install_template (const insn_template *t)
 
   /* Dual VEX/EVEX templates need stripping one of the possible variants.  */
   if (t->opcode_modifier.vex && t->opcode_modifier.evex)
-  {
+    {
       if ((maybe_cpu (t, CpuAVX) || maybe_cpu (t, CpuAVX2)
 	   || maybe_cpu (t, CpuFMA))
 	  && (maybe_cpu (t, CpuAVX512F) || maybe_cpu (t, CpuAVX512VL)))
@@ -3695,7 +3696,15 @@  install_template (const insn_template *t)
 		gas_assert (i.tm.cpu.bitfield.isa == i.tm.cpu_any.bitfield.isa);
 	    }
 	}
-  }
+
+      if (APX_F(CpuCMPCCXADD) || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F)
+	  || APX_F(CpuAVX512DQ) || APX_F(CpuAVX512BW) || APX_F(CpuBMI)
+	  || APX_F(CpuBMI2))
+	if (need_evex_encoding ())
+	  i.tm.opcode_modifier.vex = 0;
+	else
+	  i.tm.opcode_modifier.evex = 0;
+    }
 
   /* Note that for pseudo prefixes this produces a length of 1. But for them
      the length isn't interesting at all.  */
@@ -3876,6 +3885,15 @@  is_any_vex_encoding (const insn_template *t)
   return t->opcode_modifier.vex || t->opcode_modifier.evex;
 }
 
+/* We can use this function only when the current encoding is evex.  */
+static INLINE bool
+is_apx_evex_encoding (void)
+{
+  return i.rex2 || i.tm.opcode_space == SPACE_EVEXMAP4
+    || (i.vex.register_specifier
+	&& i.vex.register_specifier->reg_flags & RegRex2);
+}
+
 static INLINE bool
 is_apx_rex2_encoding (void)
 {
@@ -4153,6 +4171,27 @@  build_rex2_prefix (void)
 		    | (i.rex2 << 4) | i.rex);
 }
 
+/* Build the EVEX prefix (4-byte) for evex insn
+   | 62h |
+   | `R`X`B`R' | B'mmm |
+   | W | v`v`v`v | `x' | pp |
+   | z| L'L | b | `v | aaa |
+*/
+static void
+build_apx_evex_prefix (void)
+{
+  build_evex_prefix ();
+  if (i.rex2 & REX_R)
+    i.vex.bytes[1] &= ~0x10;
+  if (i.rex2 & REX_B)
+    i.vex.bytes[1] |= 0x08;
+  if (i.rex2 & REX_X)
+    i.vex.bytes[2] &= ~0x04;
+  if (i.vex.register_specifier
+      && i.vex.register_specifier->reg_flags & RegRex2)
+    i.vex.bytes[3] &= ~0x08;
+}
+
 static void
 process_immext (void)
 {
@@ -5642,13 +5681,18 @@  md_assemble (char *line)
 	  return;
 	}
 
-      if (i.tm.opcode_modifier.vex)
+      if (is_apx_evex_encoding ())
+	build_apx_evex_prefix ();
+      else if (i.tm.opcode_modifier.vex)
 	build_vex_prefix (t);
       else
 	build_evex_prefix ();
 
       /* The individual REX.RXBW bits got consumed.  */
       i.rex &= REX_OPCODE;
+
+      /* The rex2 bits got consumed.  */
+      i.rex2 = 0;
     }
 
   /* Handle conversion of 'int $3' --> special int3 insn.  */
@@ -8087,7 +8131,8 @@  process_suffix (void)
       if (i.suffix != QWORD_MNEM_SUFFIX
 	  && i.tm.opcode_modifier.mnemonicsize != IGNORESIZE
 	  && !i.tm.opcode_modifier.floatmf
-	  && !is_any_vex_encoding (&i.tm)
+	  && (!is_any_vex_encoding (&i.tm)
+	      || i.tm.opcode_space == SPACE_EVEXMAP4)
 	  && ((i.suffix == LONG_MNEM_SUFFIX) == (flag_code == CODE_16BIT)
 	      || (flag_code == CODE_64BIT
 		  && i.tm.opcode_modifier.jump == JUMP_BYTE)))
@@ -8097,7 +8142,11 @@  process_suffix (void)
 	  if (i.tm.opcode_modifier.jump == JUMP_BYTE) /* jcxz, loop */
 	    prefix = ADDR_PREFIX_OPCODE;
 
-	  if (!add_prefix (prefix))
+	  /* The DATA PREFIX of EVEX promoted from legacy APX instructions
+	     needs to be adjusted.  */
+	  if (i.tm.opcode_space == SPACE_EVEXMAP4)
+	    i.tm.opcode_modifier.opcodeprefix = PREFIX_0X66;
+	  else if (!add_prefix (prefix))
 	    return 0;
 	}
 
@@ -14293,6 +14342,12 @@  static bool check_register (const reg_entry *r)
       if (!cpu_arch_flags.bitfield.cpuapx_f
 	  || flag_code != CODE_64BIT)
 	return false;
+
+      /* When using RegRex2, dual VEX/EVEX templates need to be marked as EVEX.
+	 For the later install_template function.  */
+      if (current_templates.start->opcode_modifier.vex
+	  && current_templates.start->opcode_modifier.evex)
+	i.vec_encoding = vex_encoding_evex;
     }
 
   if (((r->reg_flags & (RegRex64 | RegRex)) || r->reg_type.bitfield.qword)
diff --git a/gas/testsuite/gas/i386/x86-64-evex.d b/gas/testsuite/gas/i386/x86-64-evex.d
index 041747db892..5d974c312da 100644
--- a/gas/testsuite/gas/i386/x86-64-evex.d
+++ b/gas/testsuite/gas/i386/x86-64-evex.d
@@ -17,6 +17,6 @@  Disassembly of section .text:
  +[a-f0-9]+:	62 f1 d6 38 7b f0    	vcvtusi2ss %rax,\{rd-sae\},%xmm5,%xmm6
  +[a-f0-9]+:	62 f1 57 38 7b f0    	vcvtusi2sd %eax,\{rd-bad\},%xmm5,%xmm6
  +[a-f0-9]+:	62 f1 d7 38 7b f0    	vcvtusi2sd %rax,\{rd-sae\},%xmm5,%xmm6
- +[a-f0-9]+:	62 e1 7e 08 2d c0    	vcvtss2si %xmm0,\(bad\)
+ +[a-f0-9]+:	62 e1 7e 08 2d c0    	vcvtss2si %xmm0,%r16d
  +[a-f0-9]+:	62 e1 7c 08 c2 c0 00 	vcmpeqps %xmm0,%xmm0,\(bad\)
 #pass
diff --git a/gas/testsuite/gas/i386/x86-64.exp b/gas/testsuite/gas/i386/x86-64.exp
index 2be0df0e981..7a1fef58735 100644
--- a/gas/testsuite/gas/i386/x86-64.exp
+++ b/gas/testsuite/gas/i386/x86-64.exp
@@ -250,7 +250,7 @@  run_dump_test "x86-64-sse-noavx"
 run_dump_test "x86-64-movbe"
 run_dump_test "x86-64-movbe-intel"
 run_dump_test "x86-64-movbe-suffix"
-run_list_test "x86-64-inval-movbe" "-al"
+run_list_test "x86-64-inval-movbe" "-march=+noapx_f -al"
 run_dump_test "x86-64-ept"
 run_dump_test "x86-64-ept-intel"
 run_list_test "x86-64-inval-ept" "-al"
diff --git a/opcodes/i386-dis-evex-len.h b/opcodes/i386-dis-evex-len.h
index a02609c50f2..ad59a559e0d 100644
--- a/opcodes/i386-dis-evex-len.h
+++ b/opcodes/i386-dis-evex-len.h
@@ -62,6 +62,16 @@  static const struct dis386 evex_len_table[][3] = {
     { REG_TABLE (REG_EVEX_0F38C7_L_2) },
   },
 
+  /* EVEX_LEN_0F38F2 */
+  {
+    { PREFIX_TABLE (PREFIX_EVEX_0F38F2_L_0) },
+  },
+
+  /* EVEX_LEN_0F38F3 */
+  {
+    { PREFIX_TABLE (PREFIX_EVEX_0F38F3_L_0) },
+  },
+
   /* EVEX_LEN_0F3A00 */
   {
     { Bad_Opcode },
diff --git a/opcodes/i386-dis-evex-prefix.h b/opcodes/i386-dis-evex-prefix.h
index 28da54922c7..b11b7adb443 100644
--- a/opcodes/i386-dis-evex-prefix.h
+++ b/opcodes/i386-dis-evex-prefix.h
@@ -285,6 +285,14 @@ 
     { "%XEvfmsub213s%XW",	{ XMScalar, VexScalar, EXdq, EXxEVexR }, 0 },
     { "v4fnmadds%XS",	{ XMScalar, VexScalar, Mxmm }, 0 },
   },
+  /* PREFIX_EVEX_0F38F2_L_0 */
+  {
+    { "andnS",	{ Gdq, VexGdq, Edq }, 0 },
+  },
+  /* PREFIX_EVEX_0F38F3_L_0 */
+  {
+    { REG_TABLE (REG_EVEX_0F38F3_L_0_P_0) },
+  },
   /* PREFIX_EVEX_0F3A08 */
   {
     { "vrndscalep%XH",  { XM, EXxh, EXxEVexS, Ib }, 0 },
@@ -338,6 +346,64 @@ 
     { "vcmpp%XH", { MaskG, Vex, EXxh, EXxEVexS, CMP }, 0 },
     { "vcmps%XH", { MaskG, VexScalar, EXw, EXxEVexS, CMP }, 0 },
   },
+  /* PREFIX_EVEX_MAP4_D8 */
+  {
+    { "sha1nexte", { XM, EXxmm }, 0 },
+    { REG_TABLE (REG_0F38D8_PREFIX_1) },
+  },
+  /* PREFIX_EVEX_MAP4_DA */
+  {
+    { "sha1msg2", { XM, EXxmm }, 0 },
+    { "encodekey128", { Gd, Rd }, 0 },
+  },
+  /* PREFIX_EVEX_MAP4_DB */
+  {
+    { "sha256rnds2", { XM, EXxmm, XMM0 }, 0 },
+    { "encodekey256", { Gd, Rd }, 0 },
+  },
+  /* PREFIX_EVEX_MAP4_DC */
+  {
+    { "sha256msg1", { XM, EXxmm }, 0 },
+    { "aesenc128kl", { XM, M }, 0 },
+  },
+  /* PREFIX_EVEX_MAP4_DD */
+  {
+    { "sha256msg2", { XM, EXxmm }, 0 },
+    { "aesdec128kl", { XM, M }, 0 },
+  },
+  /* PREFIX_EVEX_MAP4_DE */
+  {
+    { Bad_Opcode },
+    { "aesenc256kl", { XM, M }, 0 },
+  },
+  /* PREFIX_EVEX_MAP4_DF */
+  {
+    { Bad_Opcode },
+    { "aesdec256kl", { XM, M }, 0 },
+  },
+  /* PREFIX_EVEX_MAP4_F0 */
+  {
+    { "crc32A", { Gdq, Eb }, 0 },
+    { "invept", { Gm, Mo }, 0 },
+  },
+  /* PREFIX_EVEX_MAP4_F1 */
+  {
+    { "crc32Q", { Gdq, Ev }, 0 },
+    { "invvpid", { Gm, Mo }, 0 },
+    { "crc32Q", { Gdq, Ev }, 0 },
+  },
+  /* PREFIX_EVEX_MAP4_F2 */
+  {
+    { Bad_Opcode },
+    { "invpcid", { Gm, M }, 0 },
+  },
+  /* PREFIX_EVEX_MAP4_F8 */
+  {
+    { Bad_Opcode },
+    { "enqcmds", { Gva, M },  0 },
+    { "movdir64b", { Gva, M }, 0 },
+    { "enqcmd", { Gva, M }, 0 },
+  },
   /* PREFIX_EVEX_MAP5_10 */
   {
     { Bad_Opcode },
diff --git a/opcodes/i386-dis-evex-reg.h b/opcodes/i386-dis-evex-reg.h
index 2885063628b..8374f0ea93a 100644
--- a/opcodes/i386-dis-evex-reg.h
+++ b/opcodes/i386-dis-evex-reg.h
@@ -49,3 +49,10 @@ 
     { "vscatterpf0qp%XW",  { MVexVSIBQWpX }, PREFIX_DATA },
     { "vscatterpf1qp%XW",  { MVexVSIBQWpX }, PREFIX_DATA },
   },
+  /* REG_EVEX_0F38F3_L_0_P_0 */
+  {
+    { Bad_Opcode },
+    { "blsrS",	{ VexGdq, Edq }, 0 },
+    { "blsmskS",	{ VexGdq, Edq }, 0 },
+    { "blsiS",	{ VexGdq, Edq }, 0 },
+  },
diff --git a/opcodes/i386-dis-evex-x86-64.h b/opcodes/i386-dis-evex-x86-64.h
new file mode 100644
index 00000000000..0d9d98a7691
--- /dev/null
+++ b/opcodes/i386-dis-evex-x86-64.h
@@ -0,0 +1,50 @@ 
+  /* X86_64_EVEX_0F90 */
+  {
+    { Bad_Opcode },
+    { VEX_W_TABLE (VEX_W_0F90_L_0) },
+  },
+  /* X86_64_EVEX_0F91 */
+  {
+    { Bad_Opcode },
+    { VEX_W_TABLE (VEX_W_0F91_L_0) },
+  },
+  /* X86_64_EVEX_0F92 */
+  {
+    { Bad_Opcode },
+    { VEX_W_TABLE (VEX_W_0F92_L_0) },
+  },
+  /* X86_64_EVEX_0F93 */
+  {
+    { Bad_Opcode },
+    { VEX_W_TABLE (VEX_W_0F93_L_0) },
+  },
+  /* X86_64_EVEX_0F38F2 */
+  {
+    { Bad_Opcode },
+    { PREFIX_TABLE (PREFIX_VEX_0F38F2_L_0) },
+  },
+  /* X86_64_EVEX_0F38F3 */
+  {
+    { Bad_Opcode },
+    { PREFIX_TABLE (PREFIX_VEX_0F38F3_L_0) },
+  },
+  /* X86_64_EVEX_0F38F5 */
+  {
+    { Bad_Opcode },
+    { PREFIX_TABLE (PREFIX_VEX_0F38F5_L_0) },
+  },
+  /* X86_64_EVEX_0F38F6 */
+  {
+    { Bad_Opcode },
+    { PREFIX_TABLE(PREFIX_VEX_0F38F6_L_0) },
+  },
+  /* X86_64_EVEX_0F38F7 */
+  {
+    { Bad_Opcode },
+    { PREFIX_TABLE(PREFIX_VEX_0F38F7_L_0) },
+  },
+  /* X86_64_EVEX_0F3AF0 */
+  {
+    { Bad_Opcode },
+    { PREFIX_TABLE (PREFIX_VEX_0F3AF0_L_0) },
+  },
diff --git a/opcodes/i386-dis-evex.h b/opcodes/i386-dis-evex.h
index 7ad1edbe72d..90c063b2188 100644
--- a/opcodes/i386-dis-evex.h
+++ b/opcodes/i386-dis-evex.h
@@ -164,10 +164,10 @@  static const struct dis386 evex_table[][256] = {
     { Bad_Opcode },
     { Bad_Opcode },
     /* 90 */
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F90) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F91) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F92) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F93) },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
@@ -375,9 +375,9 @@  static const struct dis386 evex_table[][256] = {
     { "vpsllv%DQ",	{ XM, Vex, EXx }, PREFIX_DATA },
     /* 48 */
     { Bad_Opcode },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_VEX_0F3849) },
     { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_VEX_0F384B) },
     { "vrcp14p%XW",	{ XM, EXx }, PREFIX_DATA },
     { "vrcp14s%XW",	{ XMScalar, VexScalar, EXdq }, PREFIX_DATA },
     { "vrsqrt14p%XW",	{ XM, EXx }, 0 },
@@ -545,32 +545,32 @@  static const struct dis386 evex_table[][256] = {
     { "%XEvaesdecY",	{ XM, Vex, EXx }, PREFIX_DATA },
     { "%XEvaesdeclastY", { XM, Vex, EXx }, PREFIX_DATA },
     /* E0 */
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_VEX_0F38E0) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_VEX_0F38E1) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_VEX_0F38E2) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_VEX_0F38E3) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_VEX_0F38E4) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_VEX_0F38E5) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_VEX_0F38E6) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_VEX_0F38E7) },
     /* E8 */
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_VEX_0F38E8) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_VEX_0F38E9) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_VEX_0F38EA) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_VEX_0F38EB) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_VEX_0F38EC) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_VEX_0F38ED) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_VEX_0F38EE) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_VEX_0F38EF) },
     /* F0 */
     { Bad_Opcode },
     { Bad_Opcode },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F38F2) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F38F3) },
     { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F38F5) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F38F6) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F38F7) },
     /* F8 */
     { Bad_Opcode },
     { Bad_Opcode },
@@ -854,7 +854,7 @@  static const struct dis386 evex_table[][256] = {
     { Bad_Opcode },
     { Bad_Opcode },
     /* F0 */
-    { Bad_Opcode },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F3AF0) },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
@@ -983,13 +983,13 @@  static const struct dis386 evex_table[][256] = {
     { Bad_Opcode },
     { Bad_Opcode },
     /* 60 */
+    { "movbeS",	{ Gv, Ev }, PREFIX_NP_OR_DATA },
+    { "movbeS",	{ Ev, Gv }, PREFIX_NP_OR_DATA },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { "wrussK",	{ M, Gdq }, PREFIX_DATA },
+    { PREFIX_TABLE (PREFIX_0F38F6) },
     { Bad_Opcode },
     /* 68 */
     { Bad_Opcode },
@@ -1113,19 +1113,19 @@  static const struct dis386 evex_table[][256] = {
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
-    { Bad_Opcode },
+    { "sha1rnds4",	{ XM, EXxmm, Ib }, NO_PREFIX },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
     /* D8 */
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP4_D8) },
+    { "sha1msg1",	{ XM, EXxmm }, NO_PREFIX },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP4_DA) },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP4_DB) },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP4_DC) },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP4_DD) },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP4_DE) },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP4_DF) },
     /* E0 */
     { Bad_Opcode },
     { Bad_Opcode },
@@ -1145,20 +1145,20 @@  static const struct dis386 evex_table[][256] = {
     { Bad_Opcode },
     { Bad_Opcode },
     /* F0 */
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP4_F0) },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP4_F1) },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP4_F2) },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
     /* F8 */
+    { PREFIX_TABLE (PREFIX_EVEX_MAP4_F8) },
+    { "movdiri",	{ Mdq, Gdq }, NO_PREFIX },
     { Bad_Opcode },
     { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { PREFIX_TABLE (PREFIX_0F38FC) },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
diff --git a/opcodes/i386-dis.c b/opcodes/i386-dis.c
index 20fe7070b35..5b6f063d016 100644
--- a/opcodes/i386-dis.c
+++ b/opcodes/i386-dis.c
@@ -132,6 +132,13 @@  enum x86_64_isa
   intel64
 };
 
+enum evex_type
+{
+  evex_default = 0,
+  evex_from_legacy,
+  evex_from_vex,
+};
+
 struct instr_info
 {
   enum address_mode address_mode;
@@ -212,7 +219,6 @@  struct instr_info
     int ll;
     bool w;
     bool evex;
-    bool r;
     bool v;
     bool zeroing;
     bool b;
@@ -220,6 +226,8 @@  struct instr_info
   }
   vex;
 
+  enum evex_type evex_type;
+
   /* Remember if the current op is a jump instruction.  */
   bool op_is_jump;
 
@@ -305,6 +313,8 @@  struct dis_private {
 #define PREFIX_ADDR 0x400
 #define PREFIX_FWAIT 0x800
 #define PREFIX_REX2 0x1000
+#define PREFIX_NP_OR_DATA 0x2000
+#define NO_PREFIX   0x4000
 
 /* Make sure that bytes from INFO->PRIVATE_DATA->BUFFER (inclusive)
    to ADDR (exclusive) are valid.  Returns true for success, false
@@ -802,6 +812,7 @@  enum
   USE_RM_TABLE,
   USE_PREFIX_TABLE,
   USE_X86_64_TABLE,
+  USE_X86_64_EVEX_FROM_VEX_TABLE,
   USE_3BYTE_TABLE,
   USE_XOP_8F_TABLE,
   USE_VEX_C4_TABLE,
@@ -820,6 +831,8 @@  enum
 #define RM_TABLE(I)		DIS386 (USE_RM_TABLE, (I))
 #define PREFIX_TABLE(I)		DIS386 (USE_PREFIX_TABLE, (I))
 #define X86_64_TABLE(I)		DIS386 (USE_X86_64_TABLE, (I))
+#define X86_64_EVEX_FROM_VEX_TABLE(I) \
+  DIS386 (USE_X86_64_EVEX_FROM_VEX_TABLE, (I))
 #define THREE_BYTE_TABLE(I)	DIS386 (USE_3BYTE_TABLE, (I))
 #define XOP_8F_TABLE()		DIS386 (USE_XOP_8F_TABLE, 0)
 #define VEX_C4_TABLE()		DIS386 (USE_VEX_C4_TABLE, 0)
@@ -868,7 +881,7 @@  enum
   REG_VEX_0F73,
   REG_VEX_0FAE,
   REG_VEX_0F3849_X86_64_L_0_W_0_M_1_P_0,
-  REG_VEX_0F38F3_L_0,
+  REG_VEX_0F38F3_L_0_P_0,
   REG_VEX_MAP7_F8_L_0_W_0,
 
   REG_XOP_09_01_L_0,
@@ -880,7 +893,8 @@  enum
   REG_EVEX_0F72,
   REG_EVEX_0F73,
   REG_EVEX_0F38C6_L_2,
-  REG_EVEX_0F38C7_L_2
+  REG_EVEX_0F38C7_L_2,
+  REG_EVEX_0F38F3_L_0_P_0,
 };
 
 enum
@@ -1096,6 +1110,8 @@  enum
   PREFIX_VEX_0F38CC,
   PREFIX_VEX_0F38CD,
   PREFIX_VEX_0F38DA_W_0,
+  PREFIX_VEX_0F38F2_L_0,
+  PREFIX_VEX_0F38F3_L_0,
   PREFIX_VEX_0F38F5_L_0,
   PREFIX_VEX_0F38F6_L_0,
   PREFIX_VEX_0F38F7_L_0,
@@ -1147,6 +1163,8 @@  enum
   PREFIX_EVEX_0F389B,
   PREFIX_EVEX_0F38AA,
   PREFIX_EVEX_0F38AB,
+  PREFIX_EVEX_0F38F2_L_0,
+  PREFIX_EVEX_0F38F3_L_0,
 
   PREFIX_EVEX_0F3A08,
   PREFIX_EVEX_0F3A0A,
@@ -1158,6 +1176,18 @@  enum
   PREFIX_EVEX_0F3A67,
   PREFIX_EVEX_0F3AC2,
 
+  PREFIX_EVEX_MAP4_D8,
+  PREFIX_EVEX_MAP4_DA,
+  PREFIX_EVEX_MAP4_DB,
+  PREFIX_EVEX_MAP4_DC,
+  PREFIX_EVEX_MAP4_DD,
+  PREFIX_EVEX_MAP4_DE,
+  PREFIX_EVEX_MAP4_DF,
+  PREFIX_EVEX_MAP4_F0,
+  PREFIX_EVEX_MAP4_F1,
+  PREFIX_EVEX_MAP4_F2,
+  PREFIX_EVEX_MAP4_F8,
+
   PREFIX_EVEX_MAP5_10,
   PREFIX_EVEX_MAP5_11,
   PREFIX_EVEX_MAP5_1D,
@@ -1269,7 +1299,19 @@  enum
   X86_64_VEX_0F38ED,
   X86_64_VEX_0F38EE,
   X86_64_VEX_0F38EF,
+
   X86_64_VEX_MAP7_F8_L_0_W_0_R_0,
+
+  X86_64_EVEX_0F90,
+  X86_64_EVEX_0F91,
+  X86_64_EVEX_0F92,
+  X86_64_EVEX_0F93,
+  X86_64_EVEX_0F38F2,
+  X86_64_EVEX_0F38F3,
+  X86_64_EVEX_0F38F5,
+  X86_64_EVEX_0F38F6,
+  X86_64_EVEX_0F38F7,
+  X86_64_EVEX_0F3AF0,
 };
 
 enum
@@ -1454,6 +1496,8 @@  enum
   EVEX_LEN_0F385B,
   EVEX_LEN_0F38C6,
   EVEX_LEN_0F38C7,
+  EVEX_LEN_0F38F2,
+  EVEX_LEN_0F38F3,
   EVEX_LEN_0F3A00,
   EVEX_LEN_0F3A01,
   EVEX_LEN_0F3A18,
@@ -2884,12 +2928,12 @@  static const struct dis386 reg_table[][8] = {
   {
     { RM_TABLE (RM_VEX_0F3849_X86_64_L_0_W_0_M_1_P_0_R_0) },
   },
-  /* REG_VEX_0F38F3_L_0 */
+  /* REG_VEX_0F38F3_L_0_P_0 */
   {
     { Bad_Opcode },
-    { "blsrS",		{ VexGdq, Edq }, PREFIX_OPCODE },
-    { "blsmskS",	{ VexGdq, Edq }, PREFIX_OPCODE },
-    { "blsiS",		{ VexGdq, Edq }, PREFIX_OPCODE },
+    { "blsrS",		{ VexGdq, Edq }, 0 },
+    { "blsmskS",	{ VexGdq, Edq }, 0 },
+    { "blsiS",		{ VexGdq, Edq }, 0 },
   },
   /* REG_VEX_MAP7_F8_L_0_W_0 */
   {
@@ -4037,6 +4081,16 @@  static const struct dis386 prefix_table[][4] = {
     { "vsm4rnds4", { XM, Vex, EXx }, 0 },
   },
 
+  /* PREFIX_VEX_0F38F2_L_0 */
+  {
+    { "andnS",          { Gdq, VexGdq, Edq }, 0 },
+  },
+
+  /* PREFIX_VEX_0F38F3_L_0 */
+  {
+    { REG_TABLE (REG_VEX_0F38F3_L_0_P_0) },
+  },
+
   /* PREFIX_VEX_0F38F5_L_0 */
   {
     { "bzhiS",		{ Gdq, Edq, VexGdq }, 0 },
@@ -4529,6 +4583,7 @@  static const struct dis386 x86_64_table[][2] = {
     { PREFIX_TABLE (PREFIX_VEX_MAP7_F8_L_0_W_0_R_0_X86_64) },
   },
 
+#include "i386-dis-evex-x86-64.h"
 };
 
 static const struct dis386 three_byte_table[][256] = {
@@ -7115,12 +7170,12 @@  static const struct dis386 vex_len_table[][2] = {
 
   /* VEX_LEN_0F38F2 */
   {
-    { "andnS",		{ Gdq, VexGdq, Edq }, PREFIX_OPCODE },
+    { PREFIX_TABLE (PREFIX_VEX_0F38F2_L_0) },
   },
 
   /* VEX_LEN_0F38F3 */
   {
-    { REG_TABLE(REG_VEX_0F38F3_L_0) },
+    { PREFIX_TABLE (PREFIX_VEX_0F38F3_L_0) },
   },
 
   /* VEX_LEN_0F38F5 */
@@ -8734,6 +8789,17 @@  get_valid_dis386 (const struct dis386 *dp, instr_info *ins)
       dp = &prefix_table[dp->op[1].bytemode][vindex];
       break;
 
+    case USE_X86_64_EVEX_FROM_VEX_TABLE:
+      ins->evex_type = evex_from_vex;
+      /* EVEX from VEX instructions require that EVEX.z, EVEX.L'L, EVEX.b and
+	 the lower 2 bits of EVEX.aaa must be 0.  */
+      if ((ins->vex.mask_register_specifier & 0x3) != 0
+	  || ins->vex.ll != 0
+	  || ins->vex.zeroing != 0
+	  || ins->vex.b)
+	return &bad_opcode;
+
+      /* Fall through.  */
     case USE_X86_64_TABLE:
       vindex = ins->address_mode == mode_64bit ? 1 : 0;
       dp = &x86_64_table[dp->op[1].bytemode][vindex];
@@ -8979,9 +9045,13 @@  get_valid_dis386 (const struct dis386 *dp, instr_info *ins)
       if (!fetch_code (ins->info, ins->codep + 4))
 	return &err_opcode;
       /* The first byte after 0x62.  */
+      if (*ins->codep & 0x8)
+	ins->rex2 |= REX_B;
+      if (!(*ins->codep & 0x10))
+	ins->rex2 |= REX_R;
+
       ins->rex = ~(*ins->codep >> 5) & 0x7;
-      ins->vex.r = *ins->codep & 0x10;
-      switch ((*ins->codep & 0xf))
+      switch (*ins->codep & 0x7)
 	{
 	default:
 	  return &bad_opcode;
@@ -8994,6 +9064,12 @@  get_valid_dis386 (const struct dis386 *dp, instr_info *ins)
 	case 0x3:
 	  vex_table_index = EVEX_0F3A;
 	  break;
+	case 0x4:
+	  vex_table_index = EVEX_MAP4;
+	  ins->evex_type = evex_from_legacy;
+	  if (ins->address_mode != mode_64bit)
+	    return &bad_opcode;
+	  break;
 	case 0x5:
 	  vex_table_index = EVEX_MAP5;
 	  break;
@@ -9010,9 +9086,8 @@  get_valid_dis386 (const struct dis386 *dp, instr_info *ins)
 
       ins->vex.register_specifier = (~(*ins->codep >> 3)) & 0xf;
 
-      /* The U bit.  */
       if (!(*ins->codep & 0x4))
-	return &bad_opcode;
+	ins->rex2 |= REX_X;
 
       switch ((*ins->codep & 0x3))
 	{
@@ -9042,12 +9117,26 @@  get_valid_dis386 (const struct dis386 *dp, instr_info *ins)
 
       if (ins->address_mode != mode_64bit)
 	{
+	  /* Report bad for !evex_default and when the two fixed EVEX bits
+	     change.  */
+	  if (ins->evex_type != evex_default
+	      || (ins->rex2 & (REX_B | REX_X)))
+	    return &bad_opcode;
 	  /* In 16/32-bit mode silently ignore following bits.  */
 	  ins->rex &= ~REX_B;
-	  ins->vex.r = true;
+	  ins->rex2 &= ~REX_R;
 	}
 
       ins->need_vex = 4;
+
+      /* EVEX from legacy instructions require that EVEX.z, EVEX.L'L and the
+	 lower 2 bits of EVEX.aaa must be 0.  */
+      if (ins->evex_type == evex_from_legacy
+	  && ((ins->vex.mask_register_specifier & 0x3) != 0
+	      || ins->vex.ll != 0
+	      || ins->vex.zeroing != 0))
+	return &bad_opcode;
+
       ins->codep++;
       vindex = *ins->codep++;
       dp = &evex_table[vex_table_index][vindex];
@@ -9462,6 +9551,13 @@  print_insn (bfd_vma pc, disassemble_info *info, int intel_syntax)
       dp = get_valid_dis386 (dp, &ins);
       if (dp == &err_opcode)
 	goto fetch_error_out;
+
+      /* For APX instructions promoted from legacy maps 0/1, the embedded
+	 prefix is interpreted as the operand size override.  */
+      if (ins.evex_type == evex_from_legacy
+	  && ins.vex.prefix == DATA_PREFIX_OPCODE)
+	sizeflag ^= DFLAG;
+
       if (dp != NULL && putop (&ins, dp->name, sizeflag) == 0)
 	{
 	  if (!get_sib (&ins, sizeflag))
@@ -9641,6 +9737,25 @@  print_insn (bfd_vma pc, disassemble_info *info, int intel_syntax)
       if (ins.last_repnz_prefix >= 0)
 	ins.all_prefixes[ins.last_repnz_prefix] = 0xf2;
       break;
+
+    case PREFIX_NP_OR_DATA:
+      if (ins.vex.prefix == REPE_PREFIX_OPCODE
+	  || ins.vex.prefix == REPNE_PREFIX_OPCODE)
+	{
+	  i386_dis_printf (info, dis_style_text, "(bad)");
+	  ret = ins.end_codep - priv.the_buffer;
+	  goto out;
+	}
+      break;
+
+    case NO_PREFIX:
+      if (ins.vex.prefix)
+	{
+	  i386_dis_printf (info, dis_style_text, "(bad)");
+	  ret = ins.end_codep - priv.the_buffer;
+	  goto out;
+	}
+      break;
     }
 
   /* Check if the REX prefix is used.  */
@@ -10350,7 +10465,7 @@  putop (instr_info *ins, const char *in_template, int sizeflag)
 		{
 		case 'X':
 		  if (!ins->vex.evex || ins->vex.b || ins->vex.ll >= 2
-		      || !ins->vex.r
+		      || (ins->rex2 & REX_R)
 		      || (ins->modrm.mod == 3 && (ins->rex & REX_X))
 		      || !ins->vex.v || ins->vex.mask_register_specifier)
 		    break;
@@ -11461,7 +11576,7 @@  OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
 
   add += (ins->rex2 & REX_B) ? 16 : 0;
 
-  if (ins->vex.evex)
+  if (ins->vex.evex && ins->evex_type == evex_default)
     {
 
       /* Zeroing-masking is invalid for memory destinations. Set the flag
@@ -11605,6 +11720,13 @@  OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
 		abort ();
 	      if (ins->vex.evex)
 		{
+		  /* S/G EVEX insns require EVEX.X4 not to be set.  */
+		  if (ins->rex2 & REX_X)
+		    {
+		      oappend (ins, "(bad)");
+		      return true;
+		    }
+
 		  if (!ins->vex.v)
 		    vindex += 16;
 		  check_gather = ins->obufp == ins->op_out[1];
@@ -11807,7 +11929,7 @@  OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
 
 	      if (ins->rex & REX_R)
 	        modrm_reg += 8;
-	      if (!ins->vex.r)
+	      if (ins->rex2 & REX_R)
 	        modrm_reg += 16;
 	      if (vindex == modrm_reg)
 		oappend (ins, "/(bad)");
@@ -12013,10 +12135,7 @@  OP_indirE (instr_info *ins, int bytemode, int sizeflag)
 static bool
 OP_G (instr_info *ins, int bytemode, int sizeflag)
 {
-  if (ins->vex.evex && !ins->vex.r && ins->address_mode == mode_64bit)
-    oappend (ins, "(bad)");
-  else
-    print_register (ins, ins->modrm.reg, REX_R, bytemode, sizeflag);
+  print_register (ins, ins->modrm.reg, REX_R, bytemode, sizeflag);
   return true;
 }
 
@@ -12647,7 +12766,7 @@  OP_XMM (instr_info *ins, int bytemode, int sizeflag ATTRIBUTE_UNUSED)
     reg += 8;
   if (ins->vex.evex)
     {
-      if (!ins->vex.r)
+      if (ins->rex2 & REX_R)
 	reg += 16;
     }
 
@@ -13654,7 +13773,7 @@  DistinctDest_Fixup (instr_info *ins, int bytemode, int sizeflag)
   /* Calc destination register number.  */
   if (ins->rex & REX_R)
     modrm_reg += 8;
-  if (!ins->vex.r)
+  if (ins->rex2 & REX_R)
     modrm_reg += 16;
 
   /* Calc src1 register number.  */
diff --git a/opcodes/i386-gen.c b/opcodes/i386-gen.c
index fb5a4f81288..c42b05d6c9d 100644
--- a/opcodes/i386-gen.c
+++ b/opcodes/i386-gen.c
@@ -488,6 +488,7 @@  static bitfield opcode_modifiers[] =
   BITFIELD (Dialect),
   BITFIELD (ISA64),
   BITFIELD (NoEgpr),
+  BITFIELD (NF),
 };
 
 #define CLASS(n) #n, n
@@ -1121,6 +1122,7 @@  process_i386_opcode_modifier (FILE *table, char *mod, unsigned int space,
     SPACE(0F),
     SPACE(0F38),
     SPACE(0F3A),
+    SPACE(EVEXMAP4),
     SPACE(EVEXMAP5),
     SPACE(EVEXMAP6),
     SPACE(VEXMAP7),
diff --git a/opcodes/i386-opc.h b/opcodes/i386-opc.h
index 8b4a48c8c85..68dbeca343d 100644
--- a/opcodes/i386-opc.h
+++ b/opcodes/i386-opc.h
@@ -753,6 +753,9 @@  enum
      whether the instruction supports pseudo-prefix {rex2}.  */
   NoEgpr,
 
+  /* Support for the APX no-flags (NF) form, i.e. no CSPAZO flags update.  */
+  NF,
+
   /* The last bitfield in i386_opcode_modifier.  */
   Opcode_Modifier_Num
 };
@@ -799,6 +802,7 @@  typedef struct i386_opcode_modifier
   unsigned int dialect:2;
   unsigned int isa64:2;
   unsigned int noegpr:1;
+  unsigned int nf:1;
 } i386_opcode_modifier;
 
 /* Operand classes.  */
@@ -974,6 +978,7 @@  typedef struct insn_template
      1: 0F opcode prefix / space.
      2: 0F38 opcode prefix / space.
      3: 0F3A opcode prefix / space.
+     4: EVEXMAP4 opcode prefix / space.
      5: EVEXMAP5 opcode prefix / space.
      6: EVEXMAP6 opcode prefix / space.
      7: VEXMAP7 opcode prefix / space.
@@ -985,6 +990,7 @@  typedef struct insn_template
 #define SPACE_0F	1
 #define SPACE_0F38	2
 #define SPACE_0F3A	3
+#define SPACE_EVEXMAP4	4
 #define SPACE_EVEXMAP5	5
 #define SPACE_EVEXMAP6	6
 #define SPACE_VEXMAP7	7
diff --git a/opcodes/i386-opc.tbl b/opcodes/i386-opc.tbl
index 8c8f6695d22..139ec8ce33a 100644
--- a/opcodes/i386-opc.tbl
+++ b/opcodes/i386-opc.tbl
@@ -106,16 +106,6 @@ 
 #define HLEPrefixRelease PrefixOk=PrefixHLERelease
 #define NoTrackPrefixOk  PrefixOk=PrefixNoTrack
 
-#define Space0F    OpcodeSpace=SPACE_0F
-#define Space0F38  OpcodeSpace=SPACE_0F38
-#define Space0F3A  OpcodeSpace=SPACE_0F3A
-#define SpaceXOP08 OpcodeSpace=SPACE_XOP08
-#define SpaceXOP09 OpcodeSpace=SPACE_XOP09
-#define SpaceXOP0A OpcodeSpace=SPACE_XOP0A
-
-#define EVexMap5 OpcodeSpace=SPACE_EVEXMAP5
-#define EVexMap6 OpcodeSpace=SPACE_EVEXMAP6
-
 #define VexMap7 OpcodeSpace=SPACE_VEXMAP7
 
 #define VexW0 VexW=VEXW0
@@ -137,11 +127,25 @@ 
 #define EVexLIG EVex=EVEXLIG
 #define EVexDYN EVex=EVEXDYN
 
+#define Space0F    OpcodeSpace=SPACE_0F
+#define Space0F38  OpcodeSpace=SPACE_0F38
+#define Space0F3A  OpcodeSpace=SPACE_0F3A
+#define SpaceXOP08 OpcodeSpace=SPACE_XOP08
+#define SpaceXOP09 OpcodeSpace=SPACE_XOP09
+#define SpaceXOP0A OpcodeSpace=SPACE_XOP0A
+
+#define EVexMap4 OpcodeSpace=SPACE_EVEXMAP4|EVex128
+#define EVexMap5 OpcodeSpace=SPACE_EVEXMAP5
+#define EVexMap6 OpcodeSpace=SPACE_EVEXMAP6
+
 #define Disp8ShiftVL Disp8MemShift=DISP8_SHIFT_VL
 
 #define Vsz256 Vsz=VSZ256
 #define Vsz512 Vsz=VSZ512
 
+// The template supports the VEX format for cpuid and the EVEX format for cpuid & APX_F.
+#define APX_F(cpuid) cpuid&(cpuid|APX_F)
+
 // The EVEX purpose of StaticRounding appears only together with SAE. Re-use
 // the bit to mark commutative VEX encodings where swapping the source
 // operands may allow to switch from 3-byte to 2-byte VEX encoding.
@@ -197,6 +201,7 @@  mov, 0xf24, i386&No64, D|RegMem|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_qSuf, { Te
 
 // Move after swapping the bytes
 movbe, 0x0f38f0, Movbe, D|Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
+movbe, 0x60, Movbe&APX_F, D|Modrm|CheckOperandSize|No_bSuf|No_sSuf|EVexMap4, { Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 
 // Move with sign extend.
 movsb, 0xfbe, i386, Modrm|No_bSuf|No_sSuf, { Reg8|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
@@ -1318,13 +1323,16 @@  getsec, 0xf37, SMX, NoSuf, {}
 
 invept, 0x660f3880, EPT&No64, Modrm|IgnoreSize|NoSuf, { Oword|Unspecified|BaseIndex, Reg32 }
 invept, 0x660f3880, EPT&x64, Modrm|NoSuf|NoRex64, { Oword|Unspecified|BaseIndex, Reg64 }
+invept, 0xf3f0, EPT&APX_F, Modrm|NoSuf|EVexMap4, { Oword|Unspecified|BaseIndex, Reg64 }
 invvpid, 0x660f3881, EPT&No64, Modrm|IgnoreSize|NoSuf, { Oword|Unspecified|BaseIndex, Reg32 }
 invvpid, 0x660f3881, EPT&x64, Modrm|NoSuf|NoRex64, { Oword|Unspecified|BaseIndex, Reg64 }
+invvpid, 0xf3f1, EPT&APX_F, Modrm|NoSuf|EVexMap4, { Oword|Unspecified|BaseIndex, Reg64 }
 
 // INVPCID instruction
 
 invpcid, 0x660f3882, INVPCID&No64, Modrm|IgnoreSize|NoSuf, { Oword|Unspecified|BaseIndex, Reg32 }
 invpcid, 0x660f3882, INVPCID&x64, Modrm|NoSuf|NoRex64, { Oword|Unspecified|BaseIndex, Reg64 }
+invpcid, 0xf3f2, INVPCID&APX_F, Modrm|NoSuf|EVexMap4, { Oword|Unspecified|BaseIndex, Reg64 }
 
 // SSSE3 instructions.
 
@@ -1425,6 +1433,8 @@  pcmpistri<sse42>, 0x660f3a63, <sse42:cpu>, Modrm|<sse42:attr>|NoSuf, { Imm8, Reg
 pcmpistrm<sse42>, 0x660f3a62, <sse42:cpu>, Modrm|<sse42:attr>|NoSuf, { Imm8, RegXMM|Unspecified|BaseIndex, RegXMM }
 crc32, 0xf20f38f0, SSE4_2, W|Modrm|No_sSuf|No_qSuf, { Reg8|Reg16|Reg32|Unspecified|BaseIndex, Reg32 }
 crc32, 0xf20f38f0, SSE4_2&x64, W|Modrm|No_wSuf|No_lSuf|No_sSuf, { Reg8|Reg64|Unspecified|BaseIndex, Reg64 }
+crc32, 0xf0, APX_F, W|Modrm|No_sSuf|No_qSuf|EVexMap4, { Reg8|Reg16|Reg32|Unspecified|BaseIndex, Reg32 }
+crc32, 0xf0, APX_F, W|Modrm|No_wSuf|No_lSuf|No_sSuf|EVexMap4, { Reg8|Reg64|Unspecified|BaseIndex, Reg64 }
 
 // xsave/xrstor New Instructions.
 
@@ -1839,14 +1849,14 @@  xtest, 0xf01d6, HLE|RTM, NoSuf, {}
 
 // BMI2 instructions.
 
-bzhi, 0xf5, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
-mulx, 0xf2f6, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
-pdep, 0xf2f5, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
-pext, 0xf3f5, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
-rorx, 0xf2f0, BMI2, Modrm|CheckOperandSize|Vex128|Space0F3A|No_bSuf|No_wSuf|No_sSuf, { Imm8|Imm8S, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
-sarx, 0xf3f7, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
-shlx, 0x66f7, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
-shrx, 0xf2f7, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+bzhi, 0xf5, APX_F(BMI2), Modrm|CheckOperandSize|Vex128|EVex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf|NF, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+mulx, 0xf2f6, APX_F(BMI2), Modrm|CheckOperandSize|Vex128|EVex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
+pdep, 0xf2f5, APX_F(BMI2), Modrm|CheckOperandSize|Vex128|EVex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
+pext, 0xf3f5, APX_F(BMI2), Modrm|CheckOperandSize|Vex128|EVex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
+rorx, 0xf2f0, APX_F(BMI2), Modrm|CheckOperandSize|Vex128|EVex128|Space0F3A|No_bSuf|No_wSuf|No_sSuf, { Imm8|Imm8S, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+sarx, 0xf3f7, APX_F(BMI2), Modrm|CheckOperandSize|Vex128|EVex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+shlx, 0x66f7, APX_F(BMI2), Modrm|CheckOperandSize|Vex128|EVex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+shrx, 0xf2f7, APX_F(BMI2), Modrm|CheckOperandSize|Vex128|EVex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
 
 // FMA4 instructions
 
@@ -1916,11 +1926,11 @@  lwpins, 0x12/0, LWP, Modrm|SpaceXOP0A|NoSuf|VexVVVV|Vex, { Imm32|Imm32S, Reg32|U
 
 // BMI instructions
 
-andn, 0xf2, BMI, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
-bextr, 0xf7, BMI, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
-blsi, 0xf3/3, BMI, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
-blsmsk, 0xf3/2, BMI, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
-blsr, 0xf3/1, BMI, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+andn, 0xf2, APX_F(BMI), Modrm|CheckOperandSize|Vex128|EVex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf|NF, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
+bextr, 0xf7, APX_F(BMI), Modrm|CheckOperandSize|Vex128|EVex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf|NF, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+blsi, 0xf3/3, APX_F(BMI), Modrm|CheckOperandSize|Vex128|EVex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf|NF, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+blsmsk, 0xf3/2, APX_F(BMI), Modrm|CheckOperandSize|Vex128|EVex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf|NF, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+blsr, 0xf3/1, APX_F(BMI), Modrm|CheckOperandSize|Vex128|EVex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf|NF, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
 tzcnt, 0xf30fbc, BMI, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 
 // TBM instructions
@@ -2049,13 +2059,20 @@  bndldx, 0x0f1a, MPX, Modrm|Anysize|IgnoreSize|NoSuf, { BaseIndex, RegBND }
 
 // SHA instructions.
 sha1rnds4, 0xf3acc, SHA, Modrm|NoSuf, { Imm8|Imm8S, RegXMM|Unspecified|BaseIndex, RegXMM }
+sha1rnds4, 0xd4, SHA&APX_F, Modrm|NoSuf|EVexMap4, { Imm8|Imm8S, RegXMM|Unspecified|BaseIndex, RegXMM }
 sha1nexte, 0xf38c8, SHA, Modrm|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
+sha1nexte, 0xd8, SHA&APX_F, Modrm|NoSuf|EVexMap4, { RegXMM|Unspecified|BaseIndex, RegXMM }
 sha1msg1, 0xf38c9, SHA, Modrm|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
+sha1msg1, 0xd9, SHA&APX_F, Modrm|NoSuf|EVexMap4, { RegXMM|Unspecified|BaseIndex, RegXMM }
 sha1msg2, 0xf38ca, SHA, Modrm|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
+sha1msg2, 0xda, SHA&APX_F, Modrm|NoSuf|EVexMap4, { RegXMM|Unspecified|BaseIndex, RegXMM }
 sha256rnds2, 0xf38cb, SHA, Modrm|NoSuf, { Acc|Xmmword, RegXMM|Unspecified|BaseIndex, RegXMM }
 sha256rnds2, 0xf38cb, SHA, Modrm|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
+sha256rnds2, 0xdb, SHA&APX_F, Modrm|NoSuf|EVexMap4, { RegXMM|Unspecified|BaseIndex, RegXMM }
 sha256msg1, 0xf38cc, SHA, Modrm|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
+sha256msg1, 0xdc, SHA&APX_F, Modrm|NoSuf|EVexMap4, { RegXMM|Unspecified|BaseIndex, RegXMM }
 sha256msg2, 0xf38cd, SHA, Modrm|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
+sha256msg2, 0xdd, SHA&APX_F, Modrm|NoSuf|EVexMap4, { RegXMM|Unspecified|BaseIndex, RegXMM }
 
 // SHA512 instructions.
 
@@ -2117,9 +2134,9 @@  kor<bw>, 0x<bw:kpfx>45, <bw:kcpu>, Modrm|Vex256|Space0F|VexVVVV|VexW0|NoSuf, { R
 kxnor<bw>, 0x<bw:kpfx>46, <bw:kcpu>, Modrm|Vex256|Space0F|VexVVVV|VexW0|NoSuf, { RegMask, RegMask, RegMask }
 kxor<bw>, 0x<bw:kpfx>47, <bw:kcpu>, Modrm|Vex256|Space0F|VexVVVV|VexW0|NoSuf, { RegMask, RegMask, RegMask }
 
-kmov<bw>, 0x<bw:kpfx>90, <bw:kcpu>, Modrm|Vex128|Space0F|VexW0|NoSuf, { RegMask|<bw:elem>|Unspecified|BaseIndex, RegMask }
-kmov<bw>, 0x<bw:kpfx>91, <bw:kcpu>, Modrm|Vex128|Space0F|VexW0|NoSuf, { RegMask, <bw:elem>|Unspecified|BaseIndex }
-kmov<bw>, 0x<bw:kpfx>92, <bw:kcpu>, D|Modrm|Vex128|Space0F|VexW0|NoSuf, { Reg32, RegMask }
+kmov<bw>, 0x<bw:kpfx>90, APX_F(<bw:kcpu>), Modrm|Vex128|EVex128|Space0F|VexW0|NoSuf, { RegMask|<bw:elem>|Unspecified|BaseIndex, RegMask }
+kmov<bw>, 0x<bw:kpfx>91, APX_F(<bw:kcpu>), Modrm|Vex128|EVex128|Space0F|VexW0|NoSuf, { RegMask, <bw:elem>|Unspecified|BaseIndex }
+kmov<bw>, 0x<bw:kpfx>92, APX_F(<bw:kcpu>), D|Modrm|Vex128|EVex128|Space0F|VexW0|NoSuf, { Reg32, RegMask }
 
 knot<bw>, 0x<bw:kpfx>44, <bw:kcpu>, Modrm|Vex128|Space0F|VexW0|NoSuf, { RegMask, RegMask }
 kortest<bw>, 0x<bw:kpfx>98, <bw:kcpu>, Modrm|Vex128|Space0F|VexW0|NoSuf, { RegMask, RegMask }
@@ -2594,9 +2611,9 @@  vpmovzxdq, 0x6635, AVX512VL, Modrm|EVex=3|Masking|Space0F38|VexW=1|Disp8MemShift
 kadd<dq>, 0x<dq:kpfx>4a, AVX512BW, Modrm|Vex256|Space0F|VexVVVV|VexW1|<dq:kvsz>|NoSuf, { RegMask, RegMask, RegMask }
 kand<dq>, 0x<dq:kpfx>41, AVX512BW, Modrm|Vex256|Space0F|VexVVVV|VexW1|<dq:kvsz>|NoSuf, { RegMask, RegMask, RegMask }
 kandn<dq>, 0x<dq:kpfx>42, AVX512BW, Modrm|Vex256|Space0F|VexVVVV|VexW1|<dq:kvsz>|NoSuf|Optimize, { RegMask, RegMask, RegMask }
-kmov<dq>, 0x<dq:kpfx>90, AVX512BW, Modrm|Vex128|Space0F|VexW1|<dq:kvsz>|NoSuf, { RegMask|<dq:elem>|Unspecified|BaseIndex, RegMask }
-kmov<dq>, 0x<dq:kpfx>91, AVX512BW, Modrm|Vex128|Space0F|VexW1|<dq:kvsz>|NoSuf, { RegMask, <dq:elem>|Unspecified|BaseIndex }
-kmov<dq>, 0xf292, AVX512BW, D|Modrm|Vex128|Space0F|<dq:vexw64>|<dq:kvsz>|NoSuf, { <dq:gpr>, RegMask }
+kmov<dq>, 0x<dq:kpfx>90, APX_F(AVX512BW), Modrm|Vex128|EVex128|Space0F|VexW1|<dq:kvsz>|NoSuf, { RegMask|<dq:elem>|Unspecified|BaseIndex, RegMask }
+kmov<dq>, 0x<dq:kpfx>91, APX_F(AVX512BW), Modrm|Vex128|EVex128|Space0F|VexW1|<dq:kvsz>|NoSuf, { RegMask, <dq:elem>|Unspecified|BaseIndex }
+kmov<dq>, 0xf292, APX_F(AVX512BW), D|Modrm|Vex128|EVex128|Space0F|<dq:vexw64>|<dq:kvsz>|NoSuf, { <dq:gpr>, RegMask }
 knot<dq>, 0x<dq:kpfx>44, AVX512BW, Modrm|Vex128|Space0F|VexW1|<dq:kvsz>|NoSuf, { RegMask, RegMask }
 kor<dq>, 0x<dq:kpfx>45, AVX512BW, Modrm|Vex256|Space0F|VexVVVV|VexW1|<dq:kvsz>|NoSuf, { RegMask, RegMask, RegMask }
 kortest<dq>, 0x<dq:kpfx>98, AVX512BW, Modrm|Vex128|Space0F|VexW1|<dq:kvsz>|NoSuf, { RegMask, RegMask }
@@ -2995,9 +3012,13 @@  rdsspq, 0xf30f1e/1, SHSTK&x64, Modrm|NoSuf, { Reg64 }
 saveprevssp, 0xf30f01ea, SHSTK, NoSuf, {}
 rstorssp, 0xf30f01/5, SHSTK, Modrm|NoSuf, { Qword|Unspecified|BaseIndex }
 wrssd, 0x0f38f6, SHSTK, Modrm|IgnoreSize|NoSuf, { Reg32, Dword|Unspecified|BaseIndex }
+wrssd, 0x66, SHSTK&APX_F, Modrm|IgnoreSize|NoSuf|EVexMap4, { Reg32, Dword|Unspecified|BaseIndex }
 wrssq, 0x0f38f6, SHSTK&x64, Modrm|NoSuf|Size64, { Reg64, Qword|Unspecified|BaseIndex }
+wrssq, 0x66, SHSTK&APX_F, Modrm|NoSuf|Size64|EVexMap4, { Reg64, Qword|Unspecified|BaseIndex }
 wrussd, 0x660f38f5, SHSTK, Modrm|IgnoreSize|NoSuf, { Reg32, Dword|Unspecified|BaseIndex }
+wrussd, 0x6665, SHSTK&APX_F, Modrm|IgnoreSize|NoSuf|EVexMap4, { Reg32, Dword|Unspecified|BaseIndex }
 wrussq, 0x660f38f5, SHSTK&x64, Modrm|NoSuf, { Reg64, Qword|Unspecified|BaseIndex }
+wrussq, 0x6665, SHSTK&APX_F, Modrm|NoSuf|EVexMap4, { Reg64, Qword|Unspecified|BaseIndex }
 setssbsy, 0xf30f01e8, SHSTK, NoSuf, {}
 clrssbsy, 0xf30fae/6, SHSTK, Modrm|NoSuf, { Qword|Unspecified|BaseIndex }
 endbr64, 0xf30f1efa, IBT, NoSuf, {}
@@ -3045,7 +3066,9 @@  cldemote, 0x0f1c/0, CLDEMOTE, Modrm|Anysize|IgnoreSize|NoSuf, { BaseIndex }
 // MOVDIR[I,64B] instructions.
 
 movdiri, 0xf38f9, MOVDIRI, Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Dword|Qword|Unspecified|BaseIndex }
+movdiri, 0xf9, MOVDIRI&APX_F, Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|EVexMap4, { Reg32|Reg64, Dword|Qword|Unspecified|BaseIndex }
 movdir64b, 0x660f38f8, MOVDIR64B, Modrm|AddrPrefixOpReg|NoSuf, { Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
+movdir64b, 0x66f8, MOVDIR64B&APX_F, Modrm|AddrPrefixOpReg|NoSuf|EVexMap4, { Unspecified|BaseIndex, Reg32|Reg64 }
 
 // MOVEDIR instructions end.
 
@@ -3074,7 +3097,9 @@  vcvtneps2bf16<Vxy>, 0xf372, AVX_NE_CONVERT, Modrm|<Vxy:vex>|Space0F38|VexW0|NoSu
 // ENQCMD instructions.
 
 enqcmd, 0xf20f38f8, ENQCMD, Modrm|AddrPrefixOpReg|NoSuf, { Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
+enqcmd, 0xf2f8, APX_F(ENQCMD), Modrm|AddrPrefixOpReg|NoSuf|EVexMap4, { Unspecified|BaseIndex, Reg32|Reg64 }
 enqcmds, 0xf30f38f8, ENQCMD, Modrm|AddrPrefixOpReg|NoSuf, { Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
+enqcmds, 0xf3f8, APX_F(ENQCMD), Modrm|AddrPrefixOpReg|NoSuf|EVexMap4, { Unspecified|BaseIndex, Reg32|Reg64 }
 
 // ENQCMD instructions end.
 
@@ -3135,8 +3160,8 @@  xresldtrk, 0xf20f01e9, TSXLDTRK, NoSuf, {}
 
 // AMX instructions.
 
-ldtilecfg, 0x49/0, AMX_TILE, Modrm|Vex128|Space0F38|VexW0|NoSuf, { Unspecified|BaseIndex }
-sttilecfg, 0x6649/0, AMX_TILE, Modrm|Vex128|Space0F38|VexW0|NoSuf, { Unspecified|BaseIndex }
+ldtilecfg, 0x49/0, APX_F(AMX_TILE), Modrm|Vex128|EVex128|Space0F38|VexW0|NoSuf, { Unspecified|BaseIndex }
+sttilecfg, 0x6649/0, APX_F(AMX_TILE), Modrm|Vex128|EVex128|Space0F38|VexW0|NoSuf, { Unspecified|BaseIndex }
 
 tcmmimfp16ps, 0x666c, AMX_COMPLEX, Modrm|Vex128|Space0F38|VexVVVV|VexW0|SwapSources|NoSuf, { RegTMM, RegTMM, RegTMM }
 tcmmrlfp16ps, 0x6c, AMX_COMPLEX, Modrm|Vex128|Space0F38|VexVVVV|VexW0|SwapSources|NoSuf, { RegTMM, RegTMM, RegTMM }
@@ -3148,9 +3173,9 @@  tdpbuud, 0x5e, AMX_INT8, Modrm|Vex128|Space0F38|VexVVVV|VexW0|SwapSources|NoSuf,
 tdpbusd, 0x665e, AMX_INT8, Modrm|Vex128|Space0F38|VexVVVV|VexW0|SwapSources|NoSuf, { RegTMM, RegTMM, RegTMM }
 tdpbsud, 0xf35e, AMX_INT8, Modrm|Vex128|Space0F38|VexVVVV|VexW0|SwapSources|NoSuf, { RegTMM, RegTMM, RegTMM }
 
-tileloadd, 0xf24b, AMX_TILE, Sibmem|Vex128|Space0F38|VexW0|NoSuf, { Unspecified|BaseIndex, RegTMM }
-tileloaddt1, 0x664b, AMX_TILE, Sibmem|Vex128|Space0F38|VexW0|NoSuf, { Unspecified|BaseIndex, RegTMM }
-tilestored, 0xf34b, AMX_TILE, Sibmem|Vex128|Space0F38|VexW0|NoSuf, { RegTMM, Unspecified|BaseIndex }
+tileloadd, 0xf24b, APX_F(AMX_TILE), Sibmem|Vex128|EVex128|Space0F38|VexW0|NoSuf, { Unspecified|BaseIndex, RegTMM }
+tileloaddt1, 0x664b, APX_F(AMX_TILE), Sibmem|Vex128|EVex128|Space0F38|VexW0|NoSuf, { Unspecified|BaseIndex, RegTMM }
+tilestored, 0xf34b, APX_F(AMX_TILE), Sibmem|Vex128|EVex128|Space0F38|VexW0|NoSuf, { RegTMM, Unspecified|BaseIndex }
 
 tilerelease, 0x49c0, AMX_TILE, Vex128|Space0F38|VexW0|NoSuf, {}
 
@@ -3162,15 +3187,25 @@  tilezero, 0xf249, AMX_TILE, Modrm|Vex128|Space0F38|VexW0|NoSuf, { RegTMM }
 
 loadiwkey, 0xf30f38dc, KL, Load|Modrm|NoSuf, { RegXMM, RegXMM }
 encodekey128, 0xf30f38fa, KL, Modrm|NoSuf, { Reg32, Reg32 }
+encodekey128, 0xf3da, KL&APX_F, Modrm|NoSuf|EVexMap4, { Reg32, Reg32 }
 encodekey256, 0xf30f38fb, KL, Modrm|NoSuf, { Reg32, Reg32 }
+encodekey256, 0xf3db, KL&APX_F, Modrm|NoSuf|EVexMap4, { Reg32, Reg32 }
 aesenc128kl, 0xf30f38dc, KL, Modrm|NoSuf, { Unspecified|BaseIndex, RegXMM }
+aesenc128kl, 0xf3dc, KL&APX_F, Modrm|NoSuf|EVexMap4, { Unspecified|BaseIndex, RegXMM }
 aesdec128kl, 0xf30f38dd, KL, Modrm|NoSuf, { Unspecified|BaseIndex, RegXMM }
+aesdec128kl, 0xf3dd, KL&APX_F, Modrm|NoSuf|EVexMap4, { Unspecified|BaseIndex, RegXMM }
 aesenc256kl, 0xf30f38de, KL, Modrm|NoSuf, { Unspecified|BaseIndex, RegXMM }
+aesenc256kl, 0xf3de, KL&APX_F, Modrm|NoSuf|EVexMap4, { Unspecified|BaseIndex, RegXMM }
 aesdec256kl, 0xf30f38df, KL, Modrm|NoSuf, { Unspecified|BaseIndex, RegXMM }
+aesdec256kl, 0xf3df, KL&APX_F, Modrm|NoSuf|EVexMap4, { Unspecified|BaseIndex, RegXMM }
 aesencwide128kl, 0xf30f38d8/0, WideKL, Modrm|NoSuf, { Unspecified|BaseIndex }
+aesencwide128kl, 0xf3d8/0, WideKL&APX_F, Modrm|NoSuf|EVexMap4, { Unspecified|BaseIndex }
 aesdecwide128kl, 0xf30f38d8/1, WideKL, Modrm|NoSuf, { Unspecified|BaseIndex }
+aesdecwide128kl, 0xf3d8/1, WideKL&APX_F, Modrm|NoSuf|EVexMap4, { Unspecified|BaseIndex }
 aesencwide256kl, 0xf30f38d8/2, WideKL, Modrm|NoSuf, { Unspecified|BaseIndex }
+aesencwide256kl, 0xf3d8/2, WideKL&APX_F, Modrm|NoSuf|EVexMap4, { Unspecified|BaseIndex }
 aesdecwide256kl, 0xf30f38d8/3, WideKL, Modrm|NoSuf, { Unspecified|BaseIndex }
+aesdecwide256kl, 0xf3d8/3, WideKL&APX_F, Modrm|NoSuf|EVexMap4, { Unspecified|BaseIndex }
 
 // KEYLOCKER instructions end.
 
@@ -3318,7 +3353,7 @@  prefetchit1, 0xf18/6, PREFETCHI, Modrm|Anysize|IgnoreSize|NoSuf, { BaseIndex }
 
 // CMPCCXADD instructions.
 
-cmp<cc>xadd, 0x66e<cc:opc>, CMPCCXADD, Modrm|Vex|Space0F38|VexVVVV|SwapSources|CheckOperandSize|NoSuf, { Reg32|Reg64, Reg32|Reg64, Dword|Qword|Unspecified|BaseIndex }
+cmp<cc>xadd, 0x66e<cc:opc>, APX_F(CMPCCXADD), Modrm|Vex|EVex128|Space0F38|VexVVVV|SwapSources|CheckOperandSize|NoSuf, { Reg32|Reg64, Reg32|Reg64, Dword|Qword|Unspecified|BaseIndex }
 
 // CMPCCXADD instructions end.
 
@@ -3338,9 +3373,13 @@  wrmsrlist, 0xf30f01c6, MSRLIST, NoSuf, {}
 // RAO-INT instructions.
 
 aadd, 0xf38fc, RAO_INT, Modrm|IgnoreSize|CheckOperandSize|NoSuf, { Reg32|Reg64, Dword|Qword|Unspecified|BaseIndex }
+aadd, 0xfc, RAO_INT&APX_F, Modrm|IgnoreSize|CheckOperandSize|NoSuf|EVexMap4, { Reg32|Reg64, Dword|Qword|Unspecified|BaseIndex }
 aand, 0x660f38fc, RAO_INT, Modrm|IgnoreSize|CheckOperandSize|NoSuf, { Reg32|Reg64, Dword|Qword|Unspecified|BaseIndex }
+aand, 0x66fc, RAO_INT&APX_F, Modrm|IgnoreSize|CheckOperandSize|NoSuf|EVexMap4, { Reg32|Reg64, Dword|Qword|Unspecified|BaseIndex }
 aor, 0xf20f38fc, RAO_INT, Modrm|IgnoreSize|CheckOperandSize|NoSuf, { Reg32|Reg64, Dword|Qword|Unspecified|BaseIndex }
+aor, 0xf2fc, RAO_INT&APX_F, Modrm|IgnoreSize|CheckOperandSize|NoSuf|EVexMap4, { Reg32|Reg64, Dword|Qword|Unspecified|BaseIndex }
 axor, 0xf30f38fc, RAO_INT, Modrm|IgnoreSize|CheckOperandSize|NoSuf, { Reg32|Reg64, Dword|Qword|Unspecified|BaseIndex }
+axor, 0xf3fc, RAO_INT&APX_F, Modrm|IgnoreSize|CheckOperandSize|NoSuf|EVexMap4, { Reg32|Reg64, Dword|Qword|Unspecified|BaseIndex }
 
 // RAO-INT instructions end.