[4/8] Support APX NDD

Message ID 20230919152527.497773-5-lili.cui@intel.com
State New
Series Support Intel APX EGPR

Checks

Context Check Description
linaro-tcwg-bot/tcwg_binutils_build--master-arm success Testing passed
linaro-tcwg-bot/tcwg_binutils_build--master-aarch64 success Testing passed
linaro-tcwg-bot/tcwg_binutils_check--master-aarch64 success Testing passed
linaro-tcwg-bot/tcwg_binutils_check--master-arm success Testing passed

Commit Message

Frager, Neal via Binutils Sept. 19, 2023, 3:25 p.m. UTC
  From: konglin1 <lingling.kong@intel.com>

opcodes/ChangeLog:

	* opcodes/i386-dis-evex-prefix.h: Add NDD decode for adox/adcx.
	* opcodes/i386-dis-evex-reg.h: Handle for REG_EVEX_MAP4_80,
	REG_EVEX_MAP4_81, REG_EVEX_MAP4_83, REG_EVEX_MAP4_C0,
	REG_EVEX_MAP4_C1, REG_EVEX_MAP4_D0, REG_EVEX_MAP4_D1,
	REG_EVEX_MAP4_D2, REG_EVEX_MAP4_D3, REG_EVEX_MAP4_F6,
	REG_EVEX_MAP4_F7, REG_EVEX_MAP4_FE, REG_EVEX_MAP4_FF.
	* opcodes/i386-dis-evex.h: Add NDD insn.
	* opcodes/i386-dis.c (VexGb): Add new define.
	(VexGv): Ditto.
	(get_valid_dis386): Change for NDD decode.
	(print_insn): Ditto.
	(print_register): Ditto.
	(intel_operand_size): Ditto.
	(OP_E_memory): Ditto.
	(OP_VEX): Ditto.
	* opcodes/i386-opc.h (Opcode_APX_NDDD): New macro.
	* opcodes/i386-opc.tbl: Add APX NDD instructions.
	* opcodes/i386-tbl.h: Regenerated.

gas/ChangeLog:

	* gas/config/tc-i386.c (is_any_apx_encoding): Handle legacy insns
	promoted to SPACE_EVEXMAP4.
	(build_legacy_insns_with_apx_encoding): Encode the NDD bit.
	(md_assemble): Adjust for NDD encoding.
	(process_operands): Ditto.
	(build_modrm_byte): Ditto.
	(operand_size_match): Support APX NDD insns with 3 operands.
	(match_template): Support swapping the first two operands of
	APX NDD insns with 3 operands.
	* gas/testsuite/gas/i386/x86-64.exp: Add x86-64-apx-ndd.
	* gas/testsuite/gas/i386/x86-64-apx-ndd.d: New test.
	* gas/testsuite/gas/i386/x86-64-apx-ndd.s: Ditto.
	* testsuite/gas/i386/x86-64-pseudos.d: Add test.
	* testsuite/gas/i386/x86-64-pseudos.s: Ditto.
---
 gas/config/tc-i386.c                    |  80 ++++++++----
 gas/testsuite/gas/i386/x86-64-apx-ndd.d | 165 ++++++++++++++++++++++++
 gas/testsuite/gas/i386/x86-64-apx-ndd.s | 156 ++++++++++++++++++++++
 gas/testsuite/gas/i386/x86-64-pseudos.d |  42 ++++++
 gas/testsuite/gas/i386/x86-64-pseudos.s |  43 ++++++
 gas/testsuite/gas/i386/x86-64.exp       |   1 +
 opcodes/i386-dis-evex-prefix.h          |   4 +-
 opcodes/i386-dis-evex-reg.h             | 123 ++++++++++++++++++
 opcodes/i386-dis-evex.h                 | 124 +++++++++---------
 opcodes/i386-dis.c                      |  47 ++++++-
 opcodes/i386-opc.h                      |   1 +
 opcodes/i386-opc.tbl                    |  67 ++++++++++
 12 files changed, 762 insertions(+), 91 deletions(-)
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-ndd.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-ndd.s
  

Comments

Jan Beulich Sept. 27, 2023, 2:44 p.m. UTC | #1
On 19.09.2023 17:25, Cui, Lili wrote:
> From: konglin1 <lingling.kong@intel.com>
> 
> opcodes/ChangeLog:
> 
> 	* opcodes/i386-dis-evex-prefix.h: Add NDD decode for adox/adcx.
> 	* opcodes/i386-dis-evex-reg.h: Handle for REG_EVEX_MAP4_80,
> 	REG_EVEX_MAP4_81, REG_EVEX_MAP4_83, REG_EVEX_MAP4_C0,
> 	REG_EVEX_MAP4_C1, REG_EVEX_MAP4_D0, REG_EVEX_MAP4_D1,
> 	REG_EVEX_MAP4_D2, REG_EVEX_MAP4_D3, REG_EVEX_MAP4_F6,
> 	REG_EVEX_MAP4_F7, REG_EVEX_MAP4_FE, REG_EVEX_MAP4_FF.
> 	* opcodes/i386-dis-evex.h: Add NDD insn.
> 	* opcodes/i386-dis.c (VexGb): Add new define.
> 	(VexGv): Ditto.
> 	(get_valid_dis386): Change for NDD decode.
> 	(print_insn): Ditto.
> 	(print_register): Ditto.
> 	(intel_operand_size): Ditto.
> 	(OP_E_memory): Ditto.
> 	(OP_VEX): Ditto.
> 	* opcodes/i386-opc.h (Opcode_APX_NDDD): New macro.
> 	* opcodes/i386-opc.tbl: Add APX NDD instructions.
> 	* opcodes/i386-tbl.h: Regenerated.
> 
> gas/ChangeLog:
> 
> 	* gas/config/tc-i386.c (is_any_apx_encoding): Handle legacy insns
> 	promoted to SPACE_EVEXMAP4.
> 	(build_legacy_insns_with_apx_encoding): Encode the NDD bit.
> 	(md_assemble): Adjust for NDD encoding.
> 	(process_operands): Ditto.
> 	(build_modrm_byte): Ditto.
> 	(operand_size_match): Support APX NDD insns with 3 operands.
> 	(match_template): Support swapping the first two operands of
> 	APX NDD insns with 3 operands.
> 	* gas/testsuite/gas/i386/x86-64.exp: Add x86-64-apx-ndd.
> 	* gas/testsuite/gas/i386/x86-64-apx-ndd.d: New test.
> 	* gas/testsuite/gas/i386/x86-64-apx-ndd.s: Ditto.
> 	* testsuite/gas/i386/x86-64-pseudos.d: Add test.
> 	* testsuite/gas/i386/x86-64-pseudos.s: Ditto.
> ---
>  gas/config/tc-i386.c                    |  80 ++++++++----
>  gas/testsuite/gas/i386/x86-64-apx-ndd.d | 165 ++++++++++++++++++++++++
>  gas/testsuite/gas/i386/x86-64-apx-ndd.s | 156 ++++++++++++++++++++++
>  gas/testsuite/gas/i386/x86-64-pseudos.d |  42 ++++++
>  gas/testsuite/gas/i386/x86-64-pseudos.s |  43 ++++++
>  gas/testsuite/gas/i386/x86-64.exp       |   1 +
>  opcodes/i386-dis-evex-prefix.h          |   4 +-
>  opcodes/i386-dis-evex-reg.h             | 123 ++++++++++++++++++
>  opcodes/i386-dis-evex.h                 | 124 +++++++++---------
>  opcodes/i386-dis.c                      |  47 ++++++-
>  opcodes/i386-opc.h                      |   1 +
>  opcodes/i386-opc.tbl                    |  67 ++++++++++
>  12 files changed, 762 insertions(+), 91 deletions(-)
>  create mode 100644 gas/testsuite/gas/i386/x86-64-apx-ndd.d
>  create mode 100644 gas/testsuite/gas/i386/x86-64-apx-ndd.s
> 
> diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
> index 48916bc3846..381e389bb04 100644
> --- a/gas/config/tc-i386.c
> +++ b/gas/config/tc-i386.c
> @@ -2261,8 +2261,9 @@ operand_size_match (const insn_template *t)
>        unsigned int given = i.operands - j - 1;
>  
>        /* For FMA4 and XOP insns VEX.W controls just the first two
> -	 register operands.  */
> -      if (is_cpu (t, CpuFMA4) || is_cpu (t, CpuXOP))
> +	 register operands. And APX insns just swap the first operands.  */
> +      if (is_cpu (t, CpuFMA4) || is_cpu (t, CpuXOP)
> +	  || (is_cpu (t,CpuAPX_F) && i.operands == 3))
>  	given = j < 2 ? 1 - j : j;

In the comment, how about "And APX_F insns just swap the two source operands,
with the 3rd one being the destination"?

Is the "i.operands == 3" part of the condition really needed? I.e. are there
any APX_F insns which can make it here but must not take this path? Afaict
2-operand insns are fine to go here, and more-than-3-operand insns don't
come with the D attribute.

Also (nit) there's a missing blank after the comma.
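
I.e. this wants to be "is_cpu (t, CpuAPX_F)".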

> @@ -3876,6 +3877,7 @@ is_any_apx_encoding (void)
>  {
>    return i.rex2 
>      || i.rex2_encoding
> +    || i.tm.opcode_space == SPACE_EVEXMAP4
>      || (i.vex.register_specifier
>  	&& i.vex.register_specifier->reg_flags & RegRex2);
>  }
> @@ -4204,6 +4206,10 @@ build_legacy_insns_with_apx_encoding (void)
>      }
>  
>    build_evex_insns_with_extend_evex_prefix ();
> +
> +  /* Encode the NDD bit.  */
> +  if (i.vex.register_specifier)
> +    i.vex.bytes[3] |= 0x10;
>  }
>  
>  static void
> @@ -7383,26 +7389,31 @@ match_template (char mnem_suffix)
>  	  overlap1 = operand_type_and (operand_types[0], operand_types[1]);
>  	  if (t->opcode_modifier.d && i.reg_operands == i.operands
>  	      && !operand_type_all_zero (&overlap1))
> -	    switch (i.dir_encoding)
> -	      {
> -	      case dir_encoding_load:
> -		if (operand_type_check (operand_types[i.operands - 1], anymem)
> -		    || t->opcode_modifier.regmem)
> -		  goto check_reverse;
> -		break;
> +	    {
> +	      int isMemOperand = (t->opcode_modifier.vexvvvv
> +				  && t->opcode_space == SPACE_EVEXMAP4)
> +				  ? i.operands - 2 : i.operands - 1;

"is" in the variable name is properly misleading. What you're determining
here is which operand you want to _check_ for being the memory operand.

As to the condition, the two sides of && may want swapping: In such a
condition it is generally desirable to have the more restricting part
first. Plus this may be neater to express without ?: anyway:

i.operands - 1 - (t->opcode_space == SPACE_EVEXMAP4 && t->opcode_modifier.vexvvvv)

(suitably line wrapped of course).
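
For example, with an illustrative variable name (untested sketch):

    unsigned int MemOperand
      = i.operands - 1 - (t->opcode_space == SPACE_EVEXMAP4
                          && t->opcode_modifier.vexvvvv);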

> +	      switch (i.dir_encoding)
> +		{
> +		case dir_encoding_load:
> +		  if (operand_type_check (operand_types[isMemOperand], anymem)
> +		      || t->opcode_modifier.regmem)
> +		    goto check_reverse;
> +		  break;
>  
> -	      case dir_encoding_store:
> -		if (!operand_type_check (operand_types[i.operands - 1], anymem)
> -		    && !t->opcode_modifier.regmem)
> -		  goto check_reverse;
> -		break;
> +		case dir_encoding_store:
> +		  if (!operand_type_check (operand_types[isMemOperand], anymem)
> +		      && !t->opcode_modifier.regmem)
> +		    goto check_reverse;
> +		  break;
>  
> -	      case dir_encoding_swap:
> -		goto check_reverse;
> +		case dir_encoding_swap:
> +		  goto check_reverse;
>  
> -	      case dir_encoding_default:
> -		break;
> -	      }
> +		case dir_encoding_default:
> +		  break;
> +		}
> +	    }
>  	  /* If we want store form, we skip the current load.  */
>  	  if ((i.dir_encoding == dir_encoding_store
>  	       || i.dir_encoding == dir_encoding_swap)
> @@ -7432,11 +7443,13 @@ match_template (char mnem_suffix)
>  		continue;
>  	      /* Try reversing direction of operands.  */
>  	      j = is_cpu (t, CpuFMA4)
> -		  || is_cpu (t, CpuXOP) ? 1 : i.operands - 1;
> +		  || is_cpu (t, CpuXOP)
> +		  || is_cpu (t, CpuAPX_F) ? 1 : i.operands - 1;
>  	      overlap0 = operand_type_and (i.types[0], operand_types[j]);
>  	      overlap1 = operand_type_and (i.types[j], operand_types[0]);
>  	      overlap2 = operand_type_and (i.types[1], operand_types[1]);
> -	      gas_assert (t->operands != 3 || !check_register);
> +	      gas_assert (t->operands != 3 || !check_register
> +			  || is_cpu (t,CpuAPX_F));

Nit: Missing blank again.

> @@ -7471,6 +7484,12 @@ match_template (char mnem_suffix)
>  		  found_reverse_match = Opcode_VexW;
>  		  goto check_operands_345;
>  		}
> +	      else if (is_cpu (t,CpuAPX_F)
> +		       && i.operands == 3)
> +		{
> +		  found_reverse_match = Opcode_APX_NDDD;
> +		  goto check_operands_345;
> +		}
>  	      else if (t->opcode_space != SPACE_BASE
>  		       && (t->opcode_space != SPACE_0F
>  			   /* MOV to/from CR/DR/TR, as an exception, follow
> @@ -7636,6 +7655,15 @@ match_template (char mnem_suffix)
>  	 flipping VEX.W.  */
>        i.tm.opcode_modifier.vexw ^= VEXW0 ^ VEXW1;
>  
> +      j = i.tm.operand_types[0].bitfield.imm8;
> +      i.tm.operand_types[j] = operand_types[j + 1];
> +      i.tm.operand_types[j + 1] = operand_types[j];
> +      break;

I'm not overly happy to see this code getting duplicated. Are there any
encodings at all which have D and an immediate operand? I don't think
so, in which case this at least wants simplifying. But read on.
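
If there indeed are none, the swap could then simply be (untested sketch):

    i.tm.operand_types[0] = operand_types[1];
    i.tm.operand_types[1] = operand_types[0];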

> +    case Opcode_APX_NDDD:
> +      /* Only the first two register operands need reversing.  */
> +      i.tm.base_opcode ^= 0x2;

I think you mean Opcode_D here?

>        j = i.tm.operand_types[0].bitfield.imm8;
>        i.tm.operand_types[j] = operand_types[j + 1];
>        i.tm.operand_types[j + 1] = operand_types[j];

Taking both remarks together, do we need Opcode_APX_NDDD at all? Can't
you use the ordinary Opcode_D, with

    default:
      /* If we found a reverse match we must alter the opcode direction
	 bit and clear/flip the regmem modifier one.  found_reverse_match
	 holds bits to change (different for int & float insns).  */

      i.tm.base_opcode ^= found_reverse_match;

      if (i.tm.opcode_space == SPACE_EVEXMAP4 && i.operands == 3)
        goto swap_first_2;
    ...
    swap_first_2:
      j = i.tm.operand_types[0].bitfield.imm8;
      i.tm.operand_types[j] = operand_types[j + 1];
      i.tm.operand_types[j + 1] = operand_types[j];
      break;

? (I'm not convinced the i.operands == 3 part of the condition is needed;
if at all possible it wants omitting.)

> @@ -8462,8 +8490,8 @@ process_operands (void)
>    const reg_entry *default_seg = NULL;
>  
>    /* We only need to check those implicit registers for instructions
> -     with 3 operands or less.  */
> -  if (i.operands <= 3)
> +     with 4 operands or less.  */
> +  if (i.operands <= 4)
>      for (unsigned int j = 0; j < i.operands; j++)
>        if (i.types[j].bitfield.instance != InstanceNone)
>  	i.reg_operands--;

How useful is it to keep the outer if() when 4-operand insns now also
need checking? There are extremely few 5-operand ones ...
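
I.e. presumably just (untested sketch, with the outer if() dropped):

    for (unsigned int j = 0; j < i.operands; j++)
      if (i.types[j].bitfield.instance != InstanceNone)
        i.reg_operands--;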

> @@ -8825,6 +8853,9 @@ build_modrm_byte (void)
>        break;
>    if (v >= dest)
>      v = ~0;
> +  if (i.tm.opcode_space == SPACE_EVEXMAP4
> +      && i.tm.opcode_modifier.vexvvvv)
> +    v = dest;
>    if (i.tm.extension_opcode != None)
>      {
>        if (dest != source)
> @@ -9088,6 +9119,9 @@ build_modrm_byte (void)
>        set_rex_vrex (i.op[op].regs, REX_B, false);
>  	}
>  
> +      if (i.tm.opcode_space == SPACE_EVEXMAP4
> +	  && i.tm.opcode_modifier.vexvvvv)
> +	dest--;
>        if (op == dest)
>  	dest = ~0;
>        if (op == source)

These two changes are at the very least problematic with .insn, whose
behavior may not change. I'd also prefer if we could get away with just
one change to the function. Did you consider alternatives? We could re-
widen VexVVVV, such that the value 2 indicates that the destination is
encoded there. That then also has no chance of conflicting with .insn.

> --- /dev/null
> +++ b/gas/testsuite/gas/i386/x86-64-apx-ndd.s
> @@ -0,0 +1,156 @@
> +# Check 64bit APX NDD instructions with evex prefix encoding
> +
> +	.allow_index_reg
> +	.text
> +_start:
> +inc    %rax,%rbx
> +inc    %r31,%r8
> +inc    %r31,%r16
> +add    %r31b,%r8b,%r16b
> +addb    %r31b,%r8b,%r16b
> +add    %r31,%r8,%r16
> +addq    %r31,%r8,%r16
> +add    %r31d,%r8d,%r16d
> +addl    %r31d,%r8d,%r16d
> +add    %r31w,%r8w,%r16w
> +addw    %r31w,%r8w,%r16w
> +{store} add    %r31,%r8,%r16
> +{load}  add    %r31,%r8,%r16
> +add    %r31,(%r8),%r16
> +add    (%r31),%r8,%r16
> +add    0x9090(%r31,%r16,1),%r8,%r16
> +add    %r31,(%r8,%r16,8),%r16
> +add    $0x34,%r13b,%r17b
> +addl   $0x11,(%r19,%rax,4),%r20d
> +add    $0x1234,%ax,%r30w
> +add    $0x12344433,%r15,%r16
> +addq   $0x12344433,(%r15,%rcx,4),%r16
> +add    $0xfffffffff4332211,%rax,%r8
> +dec    %rax,%r17
> +decb   (%r31,%r12,1),%r8b
> +not    %rax,%r17
> +notb   (%r31,%r12,1),%r8b
> +neg    %rax,%r17
> +negb   (%r31,%r12,1),%r8b
> +sub    %r15b,%r17b,%r18b
> +sub    %r15d,(%r8),%r18d
> +sub    (%r15,%rax,1),%r16b,%r8b
> +sub    (%r15,%rax,1),%r16w,%r8w
> +subl   $0x11,(%r19,%rax,4),%r20d
> +sub    $0x1234,%ax,%r30w
> +sbb    %r15b,%r17b,%r18b
> +sbb    %r15d,(%r8),%r18d
> +sbb    (%r15,%rax,1),%r16b,%r8b
> +sbb    (%r15,%rax,1),%r16w,%r8w
> +sbbl   $0x11,(%r19,%rax,4),%r20d
> +sbb    $0x1234,%ax,%r30w
> +adc    %r15b,%r17b,%r18b
> +adc    %r15d,(%r8),%r18d
> +adc    (%r15,%rax,1),%r16b,%r8b
> +adc    (%r15,%rax,1),%r16w,%r8w
> +adcl   $0x11,(%r19,%rax,4),%r20d
> +adc    $0x1234,%ax,%r30w
> +or     %r15b,%r17b,%r18b
> +or     %r15d,(%r8),%r18d
> +or     (%r15,%rax,1),%r16b,%r8b
> +or     (%r15,%rax,1),%r16w,%r8w
> +orl    $0x11,(%r19,%rax,4),%r20d
> +or     $0x1234,%ax,%r30w
> +xor    %r15b,%r17b,%r18b
> +xor    %r15d,(%r8),%r18d
> +xor    (%r15,%rax,1),%r16b,%r8b
> +xor    (%r15,%rax,1),%r16w,%r8w
> +xorl   $0x11,(%r19,%rax,4),%r20d
> +xor    $0x1234,%ax,%r30w
> +and    %r15b,%r17b,%r18b
> +and    %r15d,(%r8),%r18d
> +and    (%r15,%rax,1),%r16b,%r8b
> +and    (%r15,%rax,1),%r16w,%r8w
> +andl   $0x11,(%r19,%rax,4),%r20d
> +and    $0x1234,%ax,%r30w
> +rorb   (%rax),%r31b
> +ror    $0x2,%r12b,%r31b
> +rorl   $0x2,(%rax),%r31d
> +rorw   (%rax),%r31w
> +ror    %cl,%r16b,%r8b
> +rorw   %cl,(%r19,%rax,4),%r31w
> +rolb   (%rax),%r31b
> +rol    $0x2,%r12b,%r31b
> +roll   $0x2,(%rax),%r31d
> +rolw   (%rax),%r31w
> +rol    %cl,%r16b,%r8b
> +rolw   %cl,(%r19,%rax,4),%r31w
> +rcrb   (%rax),%r31b
> +rcr    $0x2,%r12b,%r31b
> +rcrl   $0x2,(%rax),%r31d
> +rcrw   (%rax),%r31w
> +rcr    %cl,%r16b,%r8b
> +rcrw   %cl,(%r19,%rax,4),%r31w
> +rclb   (%rax),%r31b
> +rcl    $0x2,%r12b,%r31b
> +rcll   $0x2,(%rax),%r31d
> +rclw   (%rax),%r31w
> +rcl    %cl,%r16b,%r8b
> +rclw   %cl,(%r19,%rax,4),%r31w
> +shlb   (%rax),%r31b
> +shl    $0x2,%r12b,%r31b
> +shll   $0x2,(%rax),%r31d
> +shlw   (%rax),%r31w
> +shl    %cl,%r16b,%r8b
> +shlw   %cl,(%r19,%rax,4),%r31w
> +sarb   (%rax),%r31b
> +sar    $0x2,%r12b,%r31b
> +sarl   $0x2,(%rax),%r31d
> +sarw   (%rax),%r31w
> +sar    %cl,%r16b,%r8b
> +sarw   %cl,(%r19,%rax,4),%r31w
> +shlb   (%rax),%r31b
> +shl    $0x2,%r12b,%r31b
> +shll   $0x2,(%rax),%r31d
> +shlw   (%rax),%r31w
> +shl    %cl,%r16b,%r8b
> +shlw   %cl,(%r19,%rax,4),%r31w
> +shrb   (%rax),%r31b
> +shr    $0x2,%r12b,%r31b
> +shrl   $0x2,(%rax),%r31d
> +shrw   (%rax),%r31w
> +shr    %cl,%r16b,%r8b
> +shrw   %cl,(%r19,%rax,4),%r31w
> +shld   $0x1,%r12,(%rax),%r31
> +shld   $0x2,%r8w,%r12w,%r31w
> +shld   $0x2,%r15d,(%rax),%r31d
> +shld   %cl,%r9w,(%rax),%r31w
> +shld   %cl,%r12,%r16,%r8
> +shld   %cl,%r13w,(%r19,%rax,4),%r31w
> +shrd   $0x1,%r12,(%rax),%r31
> +shrd   $0x2,%r8w,%r12w,%r31w
> +shrd   $0x2,%r15d,(%rax),%r31d
> +shrd   %cl,%r9w,(%rax),%r31w
> +shrd   %cl,%r12,%r16,%r8
> +shrd   %cl,%r13w,(%r19,%rax,4),%r31w
> +adcx   %r15d,%r8d,%r18d
> +adcx   (%r15,%r31,1),%r8d,%r18d
> +adcx   (%r15,%r31,1),%r8
> +adox   %r15d,%r8d,%r18d
> +adox   (%r15,%r31,1),%r8d,%r18d
> +adox   (%r15,%r31,1),%r8
> +cmovo  0x90909090(%eax),%edx,%r8d
> +cmovno 0x90909090(%eax),%edx,%r8d
> +cmovb  0x90909090(%eax),%edx,%r8d
> +cmovae 0x90909090(%eax),%edx,%r8d
> +cmove  0x90909090(%eax),%edx,%r8d
> +cmovne 0x90909090(%eax),%edx,%r8d
> +cmovbe 0x90909090(%eax),%edx,%r8d
> +cmova  0x90909090(%eax),%edx,%r8d
> +cmovs  0x90909090(%eax),%edx,%r8d
> +cmovns 0x90909090(%eax),%edx,%r8d
> +cmovp  0x90909090(%eax),%edx,%r8d
> +cmovnp 0x90909090(%eax),%edx,%r8d
> +cmovl  0x90909090(%eax),%edx,%r8d
> +cmovge 0x90909090(%eax),%edx,%r8d
> +cmovle 0x90909090(%eax),%edx,%r8d
> +cmovg  0x90909090(%eax),%edx,%r8d
> +imul   0x90909(%eax),%edx,%r8d
> +imul   0x909(%rax,%r31,8),%rdx,%r25

What about imul by immediate? The present spec is quite unclear there:
The insn page says {ND=ZU} and the table says 0/1 in the ND column.

> +.byte 0x62,0xf4,0xfc,0x08,0xff,0xc0  #inc %rax
> +.byte 0x62,0xf4,0xec,0x08,0xff,0xc0  #bad

As before, please avoid .byte whenever possible. And please have a more
detailed comment as to what is being encoded, when .byte cannot be
avoided. Plus, if at all possible, have "bad" tests live in separate
testcases from "good" ones.

> --- a/opcodes/i386-dis-evex-prefix.h
> +++ b/opcodes/i386-dis-evex-prefix.h

Once again I'll reply to disassembler changes separately.

> --- a/opcodes/i386-opc.h
> +++ b/opcodes/i386-opc.h
> @@ -960,6 +960,7 @@ typedef struct insn_template
>  /* The next value is arbitrary, as long as it's non-zero and distinct
>     from all other values above.  */
>  #define Opcode_VexW	0xf /* Operand order controlled by VEX.W. */
> +#define Opcode_APX_NDDD	0x11 /* Direction bit for APX NDD insns. */

The comment talks of a single bit, but the value has two bits set.
Plus in the code you also don't use this constant as described by the
comment. Aiui like for Opcode_VexW the value is really arbitrary, just
as long as it's different from others. In which case I'd rather
suggest using e.g. 0xe (if, unlike suggested above, Opcode_D cannot be
re-used).

Also I don't think there's a need for three D-s in the name.

> --- a/opcodes/i386-opc.tbl
> +++ b/opcodes/i386-opc.tbl

Comments given on the earlier patch apply here (and elsewhere) as well.

Jan
  
Jan Beulich Sept. 28, 2023, 7:57 a.m. UTC | #2
On 19.09.2023 17:25, Cui, Lili wrote:
> --- a/opcodes/i386-dis-evex-prefix.h
> +++ b/opcodes/i386-dis-evex-prefix.h
> @@ -353,8 +353,8 @@
>    /* PREFIX_EVEX_MAP4_66 */
>    {
>      { MOD_TABLE (MOD_EVEX_MAP4_66_PREFIX_0) },
> -    { "adoxS",	{ Gdq, Edq }, 0 },
> -    { "adcxS",	{ Gdq, Edq }, 0 },
> +    { "adoxS",	{ VexGdq, Gdq, Edq }, 0 },
> +    { "adcxS",	{ VexGdq, Gdq, Edq }, 0 },

With the OP_VEX() change moved to the earlier patch, you wouldn't need to
alter again right away what was just added there.

> --- a/opcodes/i386-dis-evex-reg.h
> +++ b/opcodes/i386-dis-evex-reg.h
> @@ -56,6 +56,105 @@
>      { "blsmskS",	{ VexGdq, Edq }, 0 },
>      { "blsiS",		{ VexGdq, Edq }, 0 },
>    },
> +  /* REG_EVEX_MAP4_80 */
> +  {
> +    { "addA",	{ VexGb, Eb, Ib }, 0 },
> +    { "orA",	{ VexGb, Eb, Ib }, 0 },
> +    { "adcA",	{ VexGb, Eb, Ib }, 0 },
> +    { "sbbA",	{ VexGb, Eb, Ib }, 0 },

Aren't these two and other adc/sbb entries required already in the earlier
patch, for consistency with what you add there on the assembler side?

> +    { "andA",	{ VexGb, Eb, Ib }, 0 },
> +    { "subA",	{ VexGb, Eb, Ib }, 0 },
> +    { "xorA",	{ VexGb, Eb, Ib }, 0 },
> +    { Bad_Opcode },
> +  },
> +  /* REG_EVEX_MAP4_81 */
> +  {
> +    { "addQ",	{ VexGv, Ev, Iv }, 0 },
> +    { "orQ",	{ VexGv, Ev, Iv }, 0 },
> +    { "adcQ",	{ VexGv, Ev, Iv }, 0 },
> +    { "sbbQ",	{ VexGv, Ev, Iv }, 0 },
> +    { "andQ",	{ VexGv, Ev, Iv }, 0 },
> +    { "subQ",	{ VexGv, Ev, Iv }, 0 },
> +    { "xorQ",	{ VexGv, Ev, Iv }, 0 },
> +    { Bad_Opcode },
> +  },
> +  /* REG_EVEX_MAP4_83 */
> +  {
> +    { "addQ",	{ VexGv, Ev, sIb }, 0 },
> +    { "orQ",	{ VexGv, Ev, sIb }, 0 },
> +    { "adcQ",	{ VexGv, Ev, sIb }, 0 },
> +    { "sbbQ",	{ VexGv, Ev, sIb }, 0 },
> +    { "andQ",	{ VexGv, Ev, sIb }, 0 },
> +    { "subQ",	{ VexGv, Ev, sIb }, 0 },
> +    { "xorQ",	{ VexGv, Ev, sIb }, 0 },
> +    { Bad_Opcode },
> +  },

While these are needed because of the differences (from existing table
entries we have) in the last entry (albeit I don't think those last
entries need to actually be spelled out; you don't spell out trailing
invalid entries further down, and we don't do so elsewhere either), ...

> +  /* REG_EVEX_MAP4_C0 */
> +  {
> +    { "rolA",	{ VexGb, Eb, Ib }, 0 },
> +    { "rorA",	{ VexGb, Eb, Ib }, 0 },
> +    { "rclA",	{ VexGb, Eb, Ib }, 0 },
> +    { "rcrA",	{ VexGb, Eb, Ib }, 0 },
> +    { "shlA",	{ VexGb, Eb, Ib }, 0 },
> +    { "shrA",	{ VexGb, Eb, Ib }, 0 },
> +    { "shlA",	{ VexGb, Eb, Ib }, 0 },
> +    { "sarA",	{ VexGb, Eb, Ib }, 0 },
> +  },
> +  /* REG_EVEX_MAP4_C1 */
> +  {
> +    { "rolQ",	{ VexGv, Ev, Ib }, 0 },
> +    { "rorQ",	{ VexGv, Ev, Ib }, 0 },
> +    { "rclQ",	{ VexGv, Ev, Ib }, 0 },
> +    { "rcrQ",	{ VexGv, Ev, Ib }, 0 },
> +    { "shlQ",	{ VexGv, Ev, Ib }, 0 },
> +    { "shrQ",	{ VexGv, Ev, Ib }, 0 },
> +    { "shlQ",	{ VexGv, Ev, Ib }, 0 },
> +    { "sarQ",	{ VexGv, Ev, Ib }, 0 },
> +  },
> +  /* REG_EVEX_MAP4_D0 */
> +  {
> +    { "rolA",	{ VexGb, Eb, I1 }, 0 },
> +    { "rorA",	{ VexGb, Eb, I1 }, 0 },
> +    { "rclA",	{ VexGb, Eb, I1 }, 0 },
> +    { "rcrA",	{ VexGb, Eb, I1 }, 0 },
> +    { "shlA",	{ VexGb, Eb, I1 }, 0 },
> +    { "shrA",	{ VexGb, Eb, I1 }, 0 },
> +    { "shlA",	{ VexGb, Eb, I1 }, 0 },
> +    { "sarA",	{ VexGb, Eb, I1 }, 0 },
> +  },
> +  /* REG_EVEX_MAP4_D1 */
> +  {
> +    { "rolQ",	{ VexGv, Ev, I1 }, 0 },
> +    { "rorQ",	{ VexGv, Ev, I1 }, 0 },
> +    { "rclQ",	{ VexGv, Ev, I1 }, 0 },
> +    { "rcrQ",	{ VexGv, Ev, I1 }, 0 },
> +    { "shlQ",	{ VexGv, Ev, I1 }, 0 },
> +    { "shrQ",	{ VexGv, Ev, I1 }, 0 },
> +    { "shlQ",	{ VexGv, Ev, I1 }, 0 },
> +    { "sarQ",	{ VexGv, Ev, I1 }, 0 },
> +  },
> +  /* REG_EVEX_MAP4_D2 */
> +  {
> +    { "rolA",	{ VexGb, Eb, CL }, 0 },
> +    { "rorA",	{ VexGb, Eb, CL }, 0 },
> +    { "rclA",	{ VexGb, Eb, CL }, 0 },
> +    { "rcrA",	{ VexGb, Eb, CL }, 0 },
> +    { "shlA",	{ VexGb, Eb, CL }, 0 },
> +    { "shrA",	{ VexGb, Eb, CL }, 0 },
> +    { "shlA",	{ VexGb, Eb, CL }, 0 },
> +    { "sarA",	{ VexGb, Eb, CL }, 0 },
> +  },
> +  /* REG_EVEX_MAP4_D3 */
> +  {
> +    { "rolQ",	{ VexGv, Ev, CL }, 0 },
> +    { "rorQ",	{ VexGv, Ev, CL }, 0 },
> +    { "rclQ",	{ VexGv, Ev, CL }, 0 },
> +    { "rcrQ",	{ VexGv, Ev, CL }, 0 },
> +    { "shlQ",	{ VexGv, Ev, CL }, 0 },
> +    { "shrQ",	{ VexGv, Ev, CL }, 0 },
> +    { "shlQ",	{ VexGv, Ev, CL }, 0 },
> +    { "sarQ",	{ VexGv, Ev, CL }, 0 },
> +  },

... do we really need all these new entries? OP_VEX() checks need_vex first
thing, so simply adjusting and then re-using the existing entries would seem
possible.
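
I.e., as a sketch (untested), the existing REG_C0 etc entries could simply
gain the extra operand:

    { "rolA",	{ VexGb, Eb, Ib }, 0 },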

> @@ -9070,6 +9085,14 @@ get_valid_dis386 (const struct dis386 *dp, instr_info *ins)
>  	  ins->rex &= ~REX_B;
>  	  ins->rex2 &= ~REX_R;
>  	}
> +      if (ins->evex_type == evex_from_legacy)
> +	{
> +	  if (ins->vex.ll || ins->vex.zeroing
> +	      || (!ins->vex.b && (ins->vex.register_specifier
> +				  || !ins->vex.v)))
> +	    return &bad_opcode;

Don't these checks also apply to evex_from_vex?

> +	  ins->rex |= REX_OPCODE;

This, otoh, may truly be evex_from_legacy-only (but I'm not entirely
certain).

> @@ -9080,7 +9103,7 @@ get_valid_dis386 (const struct dis386 *dp, instr_info *ins)
>  	return &err_opcode;
>  
>        /* Set vector length.  */
> -      if (ins->modrm.mod == 3 && ins->vex.b)
> +      if (ins->modrm.mod == 3 && ins->vex.b && ins->evex_type == evex_default)
>  	ins->vex.length = 512;

Down from here, still in the same function, there's another vex.b
check which I think also wants qualifying.

> @@ -10994,7 +11017,7 @@ print_displacement (instr_info *ins, bfd_signed_vma val)
>  static void
>  intel_operand_size (instr_info *ins, int bytemode, int sizeflag)
>  {
> -  if (ins->vex.b)
> +  if (ins->vex.b && ins->evex_type != evex_from_legacy)

Wouldn't this better be ins->evex_type == evex_default, ...

> @@ -11928,7 +11951,8 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
>  	ins->vex.no_broadcast = true;
>  
>        if (!ins->vex.no_broadcast
> -	  && (!ins->intel_syntax || !(ins->evex_used & EVEX_len_used)))
> +	  && (!ins->intel_syntax || !(ins->evex_used & EVEX_len_used))
> +	  && ins->evex_type == evex_default)

... just like you have it here?

However, for this change, doesn't this need moving to the enclosing if()?
Otherwise you would accidentally set EVEX_b_used here for APX insns.

> @@ -13280,6 +13304,14 @@ OP_VEX (instr_info *ins, int bytemode, int sizeflag ATTRIBUTE_UNUSED)
>    if (!ins->need_vex)
>      return true;
>  
> +  if (ins->evex_type == evex_from_legacy)
> +    {
> +      if (ins->vex.b)
> +	ins->evex_used |= EVEX_b_used;
> +      else
> +	 return true;
> +    }

When you reuse fields or definitions in places where their names don't
match their purpose (the field dealt with here is "nd" after all, not
"b"), a comment wants adding. There's also something odd with
indentation here, but I suppose an if/else construct isn't needed in
the first place.

Jan
  
Cui, Lili Oct. 22, 2023, 2:05 p.m. UTC | #3
> >  gas/config/tc-i386.c                    |  80 ++++++++----
> >  gas/testsuite/gas/i386/x86-64-apx-ndd.d | 165 ++++++++++++++++++++++++
> >  gas/testsuite/gas/i386/x86-64-apx-ndd.s | 156 ++++++++++++++++++++++
> >  gas/testsuite/gas/i386/x86-64-pseudos.d |  42 ++++++
> >  gas/testsuite/gas/i386/x86-64-pseudos.s |  43 ++++++
> >  gas/testsuite/gas/i386/x86-64.exp       |   1 +
> >  opcodes/i386-dis-evex-prefix.h          |   4 +-
> >  opcodes/i386-dis-evex-reg.h             | 123 ++++++++++++++++++
> >  opcodes/i386-dis-evex.h                 | 124 +++++++++---------
> >  opcodes/i386-dis.c                      |  47 ++++++-
> >  opcodes/i386-opc.h                      |   1 +
> >  opcodes/i386-opc.tbl                    |  67 ++++++++++
> >  12 files changed, 762 insertions(+), 91 deletions(-)
> >  create mode 100644 gas/testsuite/gas/i386/x86-64-apx-ndd.d
> >  create mode 100644 gas/testsuite/gas/i386/x86-64-apx-ndd.s
> >
> > diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
> > index 48916bc3846..381e389bb04 100644
> > --- a/gas/config/tc-i386.c
> > +++ b/gas/config/tc-i386.c
> > @@ -2261,8 +2261,9 @@ operand_size_match (const insn_template *t)
> >        unsigned int given = i.operands - j - 1;
> >
> >        /* For FMA4 and XOP insns VEX.W controls just the first two
> > -	 register operands.  */
> > -      if (is_cpu (t, CpuFMA4) || is_cpu (t, CpuXOP))
> > +	 register operands. And APX insns just swap the first operands.  */
> > +      if (is_cpu (t, CpuFMA4) || is_cpu (t, CpuXOP)
> > +	  || (is_cpu (t,CpuAPX_F) && i.operands == 3))
> >  	given = j < 2 ? 1 - j : j;
> 
> In the comment, how about "And APX_F insns just swap the two source
> operands, with the 3rd one being the destination"?
>
 
Done.

> Is the "i.operands == 3" part of the condition really needed? I.e. are there
> any APX_F insns which can make it here but must not take this path? Afaict
> 2-operand insns are fine to go here, and more-than-3-operand insns don't
> come with the D attribute.
> 
You're right, deleted "i.operands == 3".

> Also (nit) there's a missing blank after the comma.
> 

Done.

> > @@ -3876,6 +3877,7 @@ is_any_apx_encoding (void)
> >  {
> >    return i.rex2
> >      || i.rex2_encoding
> > +    || i.tm.opcode_space == SPACE_EVEXMAP4
> >      || (i.vex.register_specifier
> >  	&& i.vex.register_specifier->reg_flags & RegRex2);
> >  }
> > @@ -4204,6 +4206,10 @@ build_legacy_insns_with_apx_encoding (void)
> >      }
> >
> >    build_evex_insns_with_extend_evex_prefix ();
> > +
> > +  /* Encode the NDD bit.  */
> > +  if (i.vex.register_specifier)
> > +    i.vex.bytes[3] |= 0x10;
> >  }
> >
> >  static void
> > @@ -7383,26 +7389,31 @@ match_template (char mnem_suffix)
> >  	  overlap1 = operand_type_and (operand_types[0], operand_types[1]);
> >  	  if (t->opcode_modifier.d && i.reg_operands == i.operands
> >  	      && !operand_type_all_zero (&overlap1))
> > -	    switch (i.dir_encoding)
> > -	      {
> > -	      case dir_encoding_load:
> > -		if (operand_type_check (operand_types[i.operands - 1], anymem)
> > -		    || t->opcode_modifier.regmem)
> > -		  goto check_reverse;
> > -		break;
> > +	    {
> > +	      int isMemOperand = (t->opcode_modifier.vexvvvv
> > +				  && t->opcode_space == SPACE_EVEXMAP4)
> > +				  ? i.operands - 2 : i.operands - 1;
> 
> "is" in the variable name is properly misleading. What you're determining
> here is which operand you want to _check_ for being the memory operand.
> 

Changed to MemOperand.
 
> As to the condition, the two sides of && may want swapping: In such a
> condition it is generally desirable to have the more restricting part
> first. Plus this may be neater to express without ?: anyway:
> 
> i.operands - 1 - (t->opcode_space == SPACE_EVEXMAP4 && t->opcode_modifier.vexvvvv)
> 
> (suitably line wrapped of course).
> 

Done.

> > +	      switch (i.dir_encoding)
> > +		{
> > +		case dir_encoding_load:
> > +		  if (operand_type_check (operand_types[isMemOperand], anymem)
> > +		      || t->opcode_modifier.regmem)
> > +		    goto check_reverse;
> > +		  break;
> >
> > -	      case dir_encoding_store:
> > -		if (!operand_type_check (operand_types[i.operands - 1], anymem)
> > -		    && !t->opcode_modifier.regmem)
> > -		  goto check_reverse;
> > -		break;
> > +		case dir_encoding_store:
> > +		  if (!operand_type_check (operand_types[isMemOperand], anymem)
> > +		      && !t->opcode_modifier.regmem)
> > +		    goto check_reverse;
> > +		  break;
> >
> > -	      case dir_encoding_swap:
> > -		goto check_reverse;
> > +		case dir_encoding_swap:
> > +		  goto check_reverse;
> >
> > -	      case dir_encoding_default:
> > -		break;
> > -	      }
> > +		case dir_encoding_default:
> > +		  break;
> > +		}
> > +	    }
> >  	  /* If we want store form, we skip the current load.  */
> >  	  if ((i.dir_encoding == dir_encoding_store
> >  	       || i.dir_encoding == dir_encoding_swap)
> > @@ -7432,11 +7443,13 @@ match_template (char mnem_suffix)
> >  		continue;
> >  	      /* Try reversing direction of operands.  */
> >  	      j = is_cpu (t, CpuFMA4)
> > -		  || is_cpu (t, CpuXOP) ? 1 : i.operands - 1;
> > +		  || is_cpu (t, CpuXOP)
> > +		  || is_cpu (t, CpuAPX_F) ? 1 : i.operands - 1;
> >  	      overlap0 = operand_type_and (i.types[0], operand_types[j]);
> >  	      overlap1 = operand_type_and (i.types[j], operand_types[0]);
> >  	      overlap2 = operand_type_and (i.types[1], operand_types[1]);
> > -	      gas_assert (t->operands != 3 || !check_register);
> > +	      gas_assert (t->operands != 3 || !check_register
> > +			  || is_cpu (t,CpuAPX_F));
> 
> Nit: Missing blank again.

Done.

> 
> > @@ -7471,6 +7484,12 @@ match_template (char mnem_suffix)
> >  		  found_reverse_match = Opcode_VexW;
> >  		  goto check_operands_345;
> >  		}
> > +	      else if (is_cpu (t,CpuAPX_F)
> > +		       && i.operands == 3)
> > +		{
> > +		  found_reverse_match = Opcode_APX_NDDD;
> > +		  goto check_operands_345;
> > +		}
> >  	      else if (t->opcode_space != SPACE_BASE
> >  		       && (t->opcode_space != SPACE_0F
> >  			   /* MOV to/from CR/DR/TR, as an exception, follow
> > @@ -7636,6 +7655,15 @@ match_template (char mnem_suffix)
> >  	 flipping VEX.W.  */
> >        i.tm.opcode_modifier.vexw ^= VEXW0 ^ VEXW1;
> >
> > +      j = i.tm.operand_types[0].bitfield.imm8;
> > +      i.tm.operand_types[j] = operand_types[j + 1];
> > +      i.tm.operand_types[j + 1] = operand_types[j];
> > +      break;
> 
> I'm not overly happy to see this code getting duplicated. Are there any
> encodings at all which have D and an immediate operand? I don't think so, in
> which case this at least wants simplifying. But read on.
> 
> > +    case Opcode_APX_NDDD:
> > +      /* Only the first two register operands need reversing.  */
> > +      i.tm.base_opcode ^= 0x2;
> 
> I think you mean Opcode_D here?
> 
> >        j = i.tm.operand_types[0].bitfield.imm8;
> >        i.tm.operand_types[j] = operand_types[j + 1];
> >        i.tm.operand_types[j + 1] = operand_types[j];
> 
> Taking both remarks together, do we need Opcode_APX_NDDD at all? Can't
> you use the ordinary Opcode_D, with
> 
>     default:
>       /* If we found a reverse match we must alter the opcode direction
> 	 bit and clear/flip the regmem modifier one.  found_reverse_match
> 	 holds bits to change (different for int & float insns).  */
> 
>       i.tm.base_opcode ^= found_reverse_match;
> 
>       if (i.tm.opcode_space == SPACE_EVEXMAP4 && i.operands == 3)
>         goto swap_first_2;
>     ...
>     swap_first_2:
>       j = i.tm.operand_types[0].bitfield.imm8;
>       i.tm.operand_types[j] = operand_types[j + 1];
>       i.tm.operand_types[j + 1] = operand_types[j];
>       break;
> 
> ? (I'm not convinced the i.operands == 3 part of the condition is needed; if at
> all possible it wants omitting.)
> 

Your suggestion is indeed better than before. It worked without "i.operands == 3".

> > @@ -8462,8 +8490,8 @@ process_operands (void)
> >    const reg_entry *default_seg = NULL;
> >
> >    /* We only need to check those implicit registers for instructions
> > -     with 3 operands or less.  */
> > -  if (i.operands <= 3)
> > +     with 4 operands or less.  */
> > +  if (i.operands <= 4)
> >      for (unsigned int j = 0; j < i.operands; j++)
> >        if (i.types[j].bitfield.instance != InstanceNone)
> >  	i.reg_operands--;
> 
> How useful is it to keep the outer if() when 4-operand insns now also need
> checking? There are extremely few 5-operand ones ...
> 

Deleted it.

> > @@ -8825,6 +8853,9 @@ build_modrm_byte (void)
> >        break;
> >    if (v >= dest)
> >      v = ~0;
> > +  if (i.tm.opcode_space == SPACE_EVEXMAP4
> > +      && i.tm.opcode_modifier.vexvvvv)
> > +    v = dest;
> >    if (i.tm.extension_opcode != None)
> >      {
> >        if (dest != source)
> > @@ -9088,6 +9119,9 @@ build_modrm_byte (void)
> >        set_rex_vrex (i.op[op].regs, REX_B, false);
> >  	}
> >
> > +      if (i.tm.opcode_space == SPACE_EVEXMAP4
> > +	  && i.tm.opcode_modifier.vexvvvv)
> > +	dest--;
> >        if (op == dest)
> >  	dest = ~0;
> >        if (op == source)
> 
> These two changes are at the very least problematic with .insn, whose
> behavior may not change. I'd also prefer if we could get away with just one
> change to the function. Did you consider alternatives? We could re-widen
> VexVVVV, such that the value 2 indicates that the destination is encoded there.
> That then also has no chance of conflicting with .insn.
> 
I added value 2 for NDD. If that's OK, I will create another patch to move (i.tm.extension_opcode != None) into the VexVVVVDEST branch and use value 3 instead of SWAP_SOURCES; maybe name it VexVVVVSRC1, or just VexVVVVOP1, VexVVVVOP2 and VexVVVVOP3?

  /* How to encode VEX.vvvv:
     0: VEX.vvvv must be 1111b.
     1: VEX.vvvv encodes one of the register operands.
     2: VEX.vvvv encodes as the dest register operands.
   */
#define VexVVVVSRC   1
#define VexVVVVDEST  2
  VexVVVV,


  if (i.tm.opcode_modifier.vexvvvv == VexVVVVDEST)
    {
      v = dest;
      dest--;
    }
  else if (i.tm.opcode_modifier.vexvvvv == VexVVVVSRC)
    {
      for (v = source + 1; v < dest; ++v)
        if (v != reg_slot)
          break;
      if (i.tm.extension_opcode != None)
        {
          if (dest != source)
            v = dest;
          dest = ~0;
        }
      gas_assert (source < dest);
      if (i.tm.opcode_modifier.operandconstraint == SWAP_SOURCES
          && source != op)
        {
          unsigned int tmp = source;

          source = v;
          v = tmp;
        }
    }
  else
    v = ~0; 

> > --- /dev/null
> > +++ b/gas/testsuite/gas/i386/x86-64-apx-ndd.s
> > @@ -0,0 +1,156 @@
> > +# Check 64bit APX NDD instructions with evex prefix encoding
> > +
> > +	.allow_index_reg
> > +	.text
> > +_start:
> > +cmovge 0x90909090(%eax),%edx,%r8d
> > +cmovle 0x90909090(%eax),%edx,%r8d
> > +cmovg  0x90909090(%eax),%edx,%r8d
> > +imul   0x90909(%eax),%edx,%r8d
> > +imul   0x909(%rax,%r31,8),%rdx,%r25
> 
> What about imul by immediate? The present spec is quite unclear there:
> The insn page says {ND=ZU} and the table says 0/1 in the ND column.
>

We don't support it yet, I put it in RFC.
...
2. Support APX ZU   -- In progress
3. Support APX CCMP and CTEST -- In progress
...

About 0/1 in the ND column, it means ZU can be 0/1.

IMUL with opcodes 0x69 and 0x6B in map 0 and SETcc instructions
Although these instructions do not support NDD, the EVEX.ND bit is used to control whether its
destination register has its upper bits (namely, bits [63:OSIZE]) zeroed when OSIZE is 8b or 16b.
That is, if EVEX.ND = 1, the upper bits are always zeroed; otherwise, they keep the old values
when OSIZE is 8b or 16b. For these instructions, EVEX.[V4,V3,V2,V1,V0] must be all zero.

> > +.byte 0x62,0xf4,0xfc,0x08,0xff,0xc0  #inc %rax
> > +.byte 0x62,0xf4,0xec,0x08,0xff,0xc0  #bad
> 
> As before, please avoid .byte whenever possible. And please have a more
> detailed comment as to what is being encoded, when .byte cannot be avoided.
> Plus, if at all possible, have "bad" tests live in separate testcases from "good"
> ones.
> 

This case wants to test that inc supports the EVEX format without GPR32; patch part II 1/6 will cover it. The first case has been removed and the second case has been added to x86-64-apx-evex-promoted-bad.s.
 
> > --- a/opcodes/i386-dis-evex-prefix.h
> > +++ b/opcodes/i386-dis-evex-prefix.h
> 
> Once again I'll reply to disassembler changes separately.
> 

Ok.

> > --- a/opcodes/i386-opc.h
> > +++ b/opcodes/i386-opc.h
> > @@ -960,6 +960,7 @@ typedef struct insn_template
> >  /* The next value is arbitrary, as long as it's non-zero and distinct
> >     from all other values above.  */
> >  #define Opcode_VexW	0xf /* Operand order controlled by VEX.W. */
> > +#define Opcode_APX_NDDD	0x11 /* Direction bit for APX NDD insns. */
> 
> The comment talks of a single bit, but the value has two bits set.
> Plus in the code you also don't use this constant as described by the
> comment. Aiui like for Opcode_VexW the value is really arbitrary, just as long
> as it's different from others. In which case I'd rather suggest using e.g. 0xe (if,
> unlike suggested above, Opcode_D cannot be re-used).
> 
> Also I don't think there's a need for three D-s in the name.
> 

Deleted Opcode_APX_NDDD.

> > --- a/opcodes/i386-opc.tbl
> > +++ b/opcodes/i386-opc.tbl
> 
> Comments given on the earlier patch apply here (and elsewhere) as well.
> 
> Jan
  
Cui, Lili Oct. 22, 2023, 2:57 p.m. UTC | #4
> Subject: Re: [PATCH 4/8] Support APX NDD
> 
> On 19.09.2023 17:25, Cui, Lili wrote:
> > --- a/opcodes/i386-dis-evex-prefix.h
> > +++ b/opcodes/i386-dis-evex-prefix.h
> > @@ -353,8 +353,8 @@
> >    /* PREFIX_EVEX_MAP4_66 */
> >    {
> >      { MOD_TABLE (MOD_EVEX_MAP4_66_PREFIX_0) },
> > -    { "adoxS",	{ Gdq, Edq }, 0 },
> > -    { "adcxS",	{ Gdq, Edq }, 0 },
> > +    { "adoxS",	{ VexGdq, Gdq, Edq }, 0 },
> > +    { "adcxS",	{ VexGdq, Gdq, Edq }, 0 },
> 
> With the OP_VEX() change moved to the earlier patch, you wouldn't need to
> alter again right away what was just added there.
>
Hi Jan,

I just wanted to move the EVEX instructions involving NDD into this patch; I think they belong in this NDD patch. As noted in your second comment, I also found some disassembler instructions missing in the APX EVEX patch.

Lili.

> > --- a/opcodes/i386-dis-evex-reg.h
> > +++ b/opcodes/i386-dis-evex-reg.h
> > @@ -56,6 +56,105 @@
> >      { "blsmskS",	{ VexGdq, Edq }, 0 },
> >      { "blsiS",		{ VexGdq, Edq }, 0 },
> >    },
> > +  /* REG_EVEX_MAP4_80 */
> > +  {
> > +    { "addA",	{ VexGb, Eb, Ib }, 0 },
> > +    { "orA",	{ VexGb, Eb, Ib }, 0 },
> > +    { "adcA",	{ VexGb, Eb, Ib }, 0 },
> > +    { "sbbA",	{ VexGb, Eb, Ib }, 0 },
> 
> Aren't these two and other adc/sbb entries required already in the earlier
> patch, for consistency with what you add there on the assembler side?
>
  
Jan Beulich Oct. 23, 2023, 7:12 a.m. UTC | #5
On 22.10.2023 16:05, Cui, Lili wrote:
>>> @@ -8825,6 +8853,9 @@ build_modrm_byte (void)
>>>        break;
>>>    if (v >= dest)
>>>      v = ~0;
>>> +  if (i.tm.opcode_space == SPACE_EVEXMAP4
>>> +      && i.tm.opcode_modifier.vexvvvv)
>>> +    v = dest;
>>>    if (i.tm.extension_opcode != None)
>>>      {
>>>        if (dest != source)
>>> @@ -9088,6 +9119,9 @@ build_modrm_byte (void)
>>>        set_rex_vrex (i.op[op].regs, REX_B, false);
>>>  	}
>>>
>>> +      if (i.tm.opcode_space == SPACE_EVEXMAP4
>>> +	  && i.tm.opcode_modifier.vexvvvv)
>>> +	dest--;
>>>        if (op == dest)
>>>  	dest = ~0;
>>>        if (op == source)
>>
>> These two changes are at the very least problematic with .insn, whose
>> behavior may not change. I'd also prefer if we could get away with just one
>> change to the function. Did you consider alternatives? We could re-widen
>> VexVVVV, such that the value 2 indicates that the destination is encoded there.
>> That then also has no chance of conflicting with .insn.
>>
> I added value 2 for NDD. If that's OK, I will create another patch to move (i.tm.extension_opcode != None) into the VexVVVVDEST branch and use value 3 instead of SWAP_SOURCES; maybe name it VexVVVVSRC1, or just VexVVVVOP1, VexVVVVOP2 and VexVVVVOP3?

Hard to tell without actually seeing the code. What you did quote below
used just SRC/DEST.

>   /* How to encode VEX.vvvv:
>      0: VEX.vvvv must be 1111b.
>      1: VEX.vvvv encodes one of the register operands.
>      2: VEX.vvvv encodes as the dest register operands.
>    */
> #define VexVVVVSRC   1
> #define VexVVVVDEST  2
>   VexVVVV,

For readability maybe insert underscores (and if SRC remains un-numbered,
maybe also DST in place of DEST)?
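
I.e. something like (merely a sketch; the exact naming is to be settled in
the actual patch):

    #define VexVVVV_SRC  1
    #define VexVVVV_DST  2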

>>> --- /dev/null
>>> +++ b/gas/testsuite/gas/i386/x86-64-apx-ndd.s
>>> @@ -0,0 +1,156 @@
>>> +# Check 64bit APX NDD instructions with evex prefix encoding
>>> +
>>> +	.allow_index_reg
>>> +	.text
>>> +_start:
>>> +cmovge 0x90909090(%eax),%edx,%r8d
>>> +cmovle 0x90909090(%eax),%edx,%r8d
>>> +cmovg  0x90909090(%eax),%edx,%r8d
>>> +imul   0x90909(%eax),%edx,%r8d
>>> +imul   0x909(%rax,%r31,8),%rdx,%r25
>>
>> What about imul by immediate? The present spec is quite unclear there:
>> The insn page says {ND=ZU} and the table says 0/1 in the ND column.
>>
> 
> We don't support it yet, I put it in RFC.
> ...
> 2. Support APX ZU   -- In progress
> 3. Support APX CCMP and CTEST -- In progress
> ...
> 
> About 0/1 in the ND column, it means ZU can be 0/1.
> 
> IMUL with opcodes 0x69 and 0x6B in map 0 and SETcc instructions
> Although these instructions do not support NDD, the EVEX.ND bit is used to control whether its
> destination register has its upper bits (namely, bits [63:OSIZE]) zeroed when OSIZE is 8b or 16b.
> That is, if EVEX.ND = 1, the upper bits are always zeroed; otherwise, they keep the old values
> when OSIZE is 8b or 16b. For these instructions, EVEX.[V4,V3,V2,V1,V0] must be all zero.

So ZU indeed isn't just a typo there. For 32- and 64-bit forms, is EVEX.ND
then simply being ignored? The ZU really is meaningful only for 16-bit forms,
aiui ...

>>> +.byte 0x62,0xf4,0xfc,0x08,0xff,0xc0  #inc %rax
>>> +.byte 0x62,0xf4,0xec,0x08,0xff,0xc0  #bad
>>
>> As before, please avoid .byte whenever possible. And please have a more
>> detailed comment as to what is being encoded, when .byte cannot be avoided.
>> Plus, if at all possible, have "bad" tests live in separate testcases from "good"
>> ones.
>>
> 
> This case wants to test that inc supports the EVEX format without GPR32; patch part II 1/6 will cover it. The first case has been removed and the second case has been added to x86-64-apx-evex-promoted-bad.s.

That's all fine, but still wants expressing with .insn rather than .byte,
if at all possible.

Jan
  
Cui, Lili Oct. 24, 2023, 11:39 a.m. UTC | #6
> Subject: Re: [PATCH 4/8] Support APX NDD
> 
> On 19.09.2023 17:25, Cui, Lili wrote:
> > --- a/opcodes/i386-dis-evex-prefix.h
> > +++ b/opcodes/i386-dis-evex-prefix.h
> > @@ -353,8 +353,8 @@
> >    /* PREFIX_EVEX_MAP4_66 */
> >    {
> >      { MOD_TABLE (MOD_EVEX_MAP4_66_PREFIX_0) },
> > -    { "adoxS",	{ Gdq, Edq }, 0 },
> > -    { "adcxS",	{ Gdq, Edq }, 0 },
> > +    { "adoxS",	{ VexGdq, Gdq, Edq }, 0 },
> > +    { "adcxS",	{ VexGdq, Gdq, Edq }, 0 },
> 
> With the OP_VEX() change moved to the earlier patch, you wouldn't need to
> alter again right away what was just added there.
> 

As I commented in another email, I want to move instructions like adcx into this NDD patch. Do you think it is ok?

> > --- a/opcodes/i386-dis-evex-reg.h
> > +++ b/opcodes/i386-dis-evex-reg.h
> > @@ -56,6 +56,105 @@
> >      { "blsmskS",	{ VexGdq, Edq }, 0 },
> >      { "blsiS",		{ VexGdq, Edq }, 0 },
> >    },
> > +  /* REG_EVEX_MAP4_80 */
> > +  {
> > +    { "addA",	{ VexGb, Eb, Ib }, 0 },
> > +    { "orA",	{ VexGb, Eb, Ib }, 0 },
> > +    { "adcA",	{ VexGb, Eb, Ib }, 0 },
> > +    { "sbbA",	{ VexGb, Eb, Ib }, 0 },
> 
> Aren't these two and other adc/sbb entries required already in the earlier
> patch, for consistency with what you add there on the assembler side?
> 
> > +    { "andA",	{ VexGb, Eb, Ib }, 0 },
> > +    { "subA",	{ VexGb, Eb, Ib }, 0 },
> > +    { "xorA",	{ VexGb, Eb, Ib }, 0 },
> > +    { Bad_Opcode },
> > +  },
> > +  /* REG_EVEX_MAP4_81 */
> > +  {
> > +    { "addQ",	{ VexGv, Ev, Iv }, 0 },
> > +    { "orQ",	{ VexGv, Ev, Iv }, 0 },
> > +    { "adcQ",	{ VexGv, Ev, Iv }, 0 },
> > +    { "sbbQ",	{ VexGv, Ev, Iv }, 0 },
> > +    { "andQ",	{ VexGv, Ev, Iv }, 0 },
> > +    { "subQ",	{ VexGv, Ev, Iv }, 0 },
> > +    { "xorQ",	{ VexGv, Ev, Iv }, 0 },
> > +    { Bad_Opcode },
> > +  },
> > +  /* REG_EVEX_MAP4_83 */
> > +  {
> > +    { "addQ",	{ VexGv, Ev, sIb }, 0 },
> > +    { "orQ",	{ VexGv, Ev, sIb }, 0 },
> > +    { "adcQ",	{ VexGv, Ev, sIb }, 0 },
> > +    { "sbbQ",	{ VexGv, Ev, sIb }, 0 },
> > +    { "andQ",	{ VexGv, Ev, sIb }, 0 },
> > +    { "subQ",	{ VexGv, Ev, sIb }, 0 },
> > +    { "xorQ",	{ VexGv, Ev, sIb }, 0 },
> > +    { Bad_Opcode },
> > +  },
> 
> While these are needed because of the differences (from existing table entries
> we have) in the last entry (albeit I don't think those last entries need to
> actually be spelled out; you don't spell out trailing invalid entries further
> down, and we don't do so elsewhere either), ...

Done.

> 
> > +  /* REG_EVEX_MAP4_C0 */
> > +  {
> > +    { "rolA",	{ VexGb, Eb, Ib }, 0 },
> > +    { "rorA",	{ VexGb, Eb, Ib }, 0 },
> > +    { "rclA",	{ VexGb, Eb, Ib }, 0 },
> > +    { "rcrA",	{ VexGb, Eb, Ib }, 0 },
> > +    { "shlA",	{ VexGb, Eb, Ib }, 0 },
> > +    { "shrA",	{ VexGb, Eb, Ib }, 0 },
> > +    { "shlA",	{ VexGb, Eb, Ib }, 0 },
> > +    { "sarA",	{ VexGb, Eb, Ib }, 0 },
> > +  },
> > +  /* REG_EVEX_MAP4_C1 */
> > +  {
> > +    { "rolQ",	{ VexGv, Ev, Ib }, 0 },
> > +    { "rorQ",	{ VexGv, Ev, Ib }, 0 },
> > +    { "rclQ",	{ VexGv, Ev, Ib }, 0 },
> > +    { "rcrQ",	{ VexGv, Ev, Ib }, 0 },
> > +    { "shlQ",	{ VexGv, Ev, Ib }, 0 },
> > +    { "shrQ",	{ VexGv, Ev, Ib }, 0 },
> > +    { "shlQ",	{ VexGv, Ev, Ib }, 0 },
> > +    { "sarQ",	{ VexGv, Ev, Ib }, 0 },
> > +  },
> > +  /* REG_EVEX_MAP4_D0 */
> > +  {
> > +    { "rolA",	{ VexGb, Eb, I1 }, 0 },
> > +    { "rorA",	{ VexGb, Eb, I1 }, 0 },
> > +    { "rclA",	{ VexGb, Eb, I1 }, 0 },
> > +    { "rcrA",	{ VexGb, Eb, I1 }, 0 },
> > +    { "shlA",	{ VexGb, Eb, I1 }, 0 },
> > +    { "shrA",	{ VexGb, Eb, I1 }, 0 },
> > +    { "shlA",	{ VexGb, Eb, I1 }, 0 },
> > +    { "sarA",	{ VexGb, Eb, I1 }, 0 },
> > +  },
> > +  /* REG_EVEX_MAP4_D1 */
> > +  {
> > +    { "rolQ",	{ VexGv, Ev, I1 }, 0 },
> > +    { "rorQ",	{ VexGv, Ev, I1 }, 0 },
> > +    { "rclQ",	{ VexGv, Ev, I1 }, 0 },
> > +    { "rcrQ",	{ VexGv, Ev, I1 }, 0 },
> > +    { "shlQ",	{ VexGv, Ev, I1 }, 0 },
> > +    { "shrQ",	{ VexGv, Ev, I1 }, 0 },
> > +    { "shlQ",	{ VexGv, Ev, I1 }, 0 },
> > +    { "sarQ",	{ VexGv, Ev, I1 }, 0 },
> > +  },
> > +  /* REG_EVEX_MAP4_D2 */
> > +  {
> > +    { "rolA",	{ VexGb, Eb, CL }, 0 },
> > +    { "rorA",	{ VexGb, Eb, CL }, 0 },
> > +    { "rclA",	{ VexGb, Eb, CL }, 0 },
> > +    { "rcrA",	{ VexGb, Eb, CL }, 0 },
> > +    { "shlA",	{ VexGb, Eb, CL }, 0 },
> > +    { "shrA",	{ VexGb, Eb, CL }, 0 },
> > +    { "shlA",	{ VexGb, Eb, CL }, 0 },
> > +    { "sarA",	{ VexGb, Eb, CL }, 0 },
> > +  },
> > +  /* REG_EVEX_MAP4_D3 */
> > +  {
> > +    { "rolQ",	{ VexGv, Ev, CL }, 0 },
> > +    { "rorQ",	{ VexGv, Ev, CL }, 0 },
> > +    { "rclQ",	{ VexGv, Ev, CL }, 0 },
> > +    { "rcrQ",	{ VexGv, Ev, CL }, 0 },
> > +    { "shlQ",	{ VexGv, Ev, CL }, 0 },
> > +    { "shrQ",	{ VexGv, Ev, CL }, 0 },
> > +    { "shlQ",	{ VexGv, Ev, CL }, 0 },
> > +    { "sarQ",	{ VexGv, Ev, CL }, 0 },
> > +  },
> 
> ... do we really need all these new entries? OP_VEX() checks need_vex first
> thing, so simply adjusting and then re-using the existing entries would seem
> possible.
> 

Done.

> > @@ -9070,6 +9085,14 @@ get_valid_dis386 (const struct dis386 *dp, instr_info *ins)
> >  	  ins->rex &= ~REX_B;
> >  	  ins->rex2 &= ~REX_R;
> >  	}
> > +      if (ins->evex_type == evex_from_legacy)
> > +	{
> > +	  if (ins->vex.ll || ins->vex.zeroing
> > +	      || (!ins->vex.b && (ins->vex.register_specifier
> > +				  || !ins->vex.v)))
> > +	    return &bad_opcode;
> 
> Don't these checks also apply to evex_from_vex?

Oh, I've duplicated some of the checks in the above code and removed the duplicated parts.

     if (ins->evex_type == evex_from_legacy)
        {
          /* EVEX from legacy instructions, when the EVEX.ND bit is 0,
             all bits of EVEX.vvvv and EVEX.V' must be 1.  */
          if (!ins->vex.b && (ins->vex.register_specifier
                                  || !ins->vex.v))
            return &bad_opcode;
          ins->rex |= REX_OPCODE;
        }

      /* EVEX from legacy instructions require that EVEX.L’L, EVEX.z and the
         lower 2 bits of EVEX.aaa must be 0.
         EVEX from evex instructions require that EVEX.L’L and the lower 2 bits of
         EVEX.aaa must be 0.  */
      if (ins->evex_type == evex_from_legacy || ins->evex_type == evex_from_vex)
        {
          if ((*ins->codep & 0x3) != 0
              || (*ins->codep >> 6 & 0x3) != 0
              || (ins->evex_type == evex_from_legacy
                  && (*ins->codep >> 5 & 0x1) != 0)
              || (ins->evex_type == evex_from_vex
                  && !ins->vex.b))
            return &bad_opcode;
        }
> 
> > +	  ins->rex |= REX_OPCODE;
> 
> This, otoh, may truly be evex_from_legacy-only (but I'm not entirely certain).
>

Added after separation.
 
> > @@ -9080,7 +9103,7 @@ get_valid_dis386 (const struct dis386 *dp, instr_info *ins)
> >  	return &err_opcode;
> >
> >        /* Set vector length.  */
> > -      if (ins->modrm.mod == 3 && ins->vex.b)
> > +      if (ins->modrm.mod == 3 && ins->vex.b && ins->evex_type == evex_default)
> >  	ins->vex.length = 512;
> 
> Down from here, still in the same function, there's another vex.b check which I
> think also wants qualifying.
> 

Not found in this function, but found a place that needs to be qualified in print_insn.

          /* Check whether rounding control was enabled for an insn not
             supporting it.  */
          if (ins.modrm.mod == 3 && ins.vex.b
              && !(ins.evex_used & EVEX_b_used))
            {
              for (i = 0; i < MAX_OPERANDS; ++i)
                {
                  ins.obufp = ins.op_out[i];
                  if (*ins.obufp)
                    continue;
                  oappend (&ins, names_rounding[ins.vex.ll]);
                  oappend (&ins, "bad}");
                  break;
                }
            }

> > @@ -10994,7 +11017,7 @@ print_displacement (instr_info *ins, bfd_signed_vma val)
> >  static void
> >  intel_operand_size (instr_info *ins, int bytemode, int sizeflag)
> >  {
> > -  if (ins->vex.b)
> > +  if (ins->vex.b && ins->evex_type != evex_from_legacy)
> 
> Wouldn't this better be ins->evex_type == evex_default, ...
> 

Done.

> > @@ -11928,7 +11951,8 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
> >  	ins->vex.no_broadcast = true;
> >
> >        if (!ins->vex.no_broadcast
> > -	  && (!ins->intel_syntax || !(ins->evex_used & EVEX_len_used)))
> > +	  && (!ins->intel_syntax || !(ins->evex_used & EVEX_len_used))
> > +	  && ins->evex_type == evex_default)
> 
> ... just like you have it here?
> 
> However, for this change, doesn't this need moving to the enclosing if()?
> Otherwise you would accidentally set EVEX_b_used here for APX insns.
> 

Sure, moved it.

> > @@ -13280,6 +13304,14 @@ OP_VEX (instr_info *ins, int bytemode, int sizeflag ATTRIBUTE_UNUSED)
> >    if (!ins->need_vex)
> >      return true;
> >
> > +  if (ins->evex_type == evex_from_legacy)
> > +    {
> > +      if (ins->vex.b)
> > +	ins->evex_used |= EVEX_b_used;
> > +      else
> > +	 return true;
> > +    }
> 
> When you reuse fields or definitions in places where their names don't match
> their purpose (the field dealt with here is "nd" after all, not "b"), a comment
> wants adding. There's also something odd with indentation here, but I
> suppose an if/else construct isn't needed in the first place.
> 
Added a comment and adjusted the formatting. I think we need if/else for insns like "adox", which supports EVEX.ND=0/1.

/* PREFIX_0F38F6 */
  {
    { "wrssK",  { M, Gdq }, 0 },
    { "adoxS",  { VexGdq, Gdq, Edq}, 0 },

adox, 0xf366, ADX|APX_F, Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|EVex128|EVexMap4, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
adox, 0xf366, ADX|APX_F, Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }


Thanks,
Lili.
  
Jan Beulich Oct. 24, 2023, 11:58 a.m. UTC | #7
On 24.10.2023 13:39, Cui, Lili wrote:
>> Subject: Re: [PATCH 4/8] Support APX NDD
>>
>> On 19.09.2023 17:25, Cui, Lili wrote:
>>> --- a/opcodes/i386-dis-evex-prefix.h
>>> +++ b/opcodes/i386-dis-evex-prefix.h
>>> @@ -353,8 +353,8 @@
>>>    /* PREFIX_EVEX_MAP4_66 */
>>>    {
>>>      { MOD_TABLE (MOD_EVEX_MAP4_66_PREFIX_0) },
>>> -    { "adoxS",	{ Gdq, Edq }, 0 },
>>> -    { "adcxS",	{ Gdq, Edq }, 0 },
>>> +    { "adoxS",	{ VexGdq, Gdq, Edq }, 0 },
>>> +    { "adcxS",	{ VexGdq, Gdq, Edq }, 0 },
>>
>> With the OP_VEX() change moved to the earlier patch, you wouldn't need to
>> alter again right away what was just added there.
>>
> 
> As I commented in another email, I want to move instructions like adcx into this NDD patch. Do you think it is ok?

Yeah, having ND-capable insns in the ND patch makes sense. I don't care
overly much about the actual split. It wants to be half-way consistent,
and beyond that - as said - it helps if code/data added by earlier
patches doesn't need re-writing altogether in later ones.

>>> @@ -13280,6 +13304,14 @@ OP_VEX (instr_info *ins, int bytemode, int
>> sizeflag ATTRIBUTE_UNUSED)
>>>    if (!ins->need_vex)
>>>      return true;
>>>
>>> +  if (ins->evex_type == evex_from_legacy)
>>> +    {
>>> +      if (ins->vex.b)
>>> +	ins->evex_used |= EVEX_b_used;
>>> +      else
>>> +	 return true;
>>> +    }
>>
>> When you reuse fields or definitions in places where their names don't match
>> their purpose (the field dealt with here is "nd" after all, not "b"), a comment
>> wants adding. There's also something odd with indentation here, but I
>> suppose an if/else construct isn't needed in the first place.
>>
> > Added a comment and adjusted the formatting. I think we need the if/else, since e.g. "adox" supports EVEX.ND=0/1.

You will need if(), sure, but I'm not convinced of "else". Constructs
like

    if (cond)
      return;
    else
      whatever;

bother me in general, as adding (even if just slightly) more mental load
for the reader. Just

    if (cond)
      return;
    whatever;

is all you need: Less text, less indentation. In the specific case
above yet another possibility may exist (without having actually tried
it):

  if (ins->evex_type == evex_from_legacy)
    {
      ins->evex_used |= EVEX_b_used;
      if (!ins->vex.b)
	 return true;
    }

I.e. this would be fine as long as setting EVEX_b_used without EVEX.b set
doesn't cause any issues.

Jan
  
Cui, Lili Oct. 25, 2023, 8:10 a.m. UTC | #8
> Subject: Re: [PATCH 4/8] Support APX NDD
> 
> On 22.10.2023 16:05, Cui, Lili wrote:
> >>> @@ -8825,6 +8853,9 @@ build_modrm_byte (void)
> >>>        break;
> >>>    if (v >= dest)
> >>>      v = ~0;
> >>> +  if (i.tm.opcode_space == SPACE_EVEXMAP4
> >>> +      && i.tm.opcode_modifier.vexvvvv)
> >>> +    v = dest;
> >>>    if (i.tm.extension_opcode != None)
> >>>      {
> >>>        if (dest != source)
> >>> @@ -9088,6 +9119,9 @@ build_modrm_byte (void)
> >>>        set_rex_vrex (i.op[op].regs, REX_B, false);
> >>>  	}
> >>>
> >>> +      if (i.tm.opcode_space == SPACE_EVEXMAP4
> >>> +	  && i.tm.opcode_modifier.vexvvvv)
> >>> +	dest--;
> >>>        if (op == dest)
> >>>  	dest = ~0;
> >>>        if (op == source)
> >>
> >> These two changes are at the very least problematic with .insn, whose
> >> behavior may not change. I'd also prefer if we could get away with
> >> just one change to the function. Did you consider alternatives? We
> >> could re- widen VexVVVV, such that the value 2 indicates that the
> destination is encoded there.
> >> That then also has no chance of conflicting with .insn.
> >>
> > I added value 2 for NDD. If that's OK, I will create another patch to move
> > the (i.tm.extension_opcode != None) handling into the VexVVVVDEST branch,
> > and use value 3 instead of SWAP_SOURCES; maybe name it VexVVVVSRC1, or just
> > VexVVVVOP1, VexVVVVOP2 and VexVVVVOP3?
> 
> Hard to tell without actually seeing the code. What you did quote below used
> just SRC/DEST.
> 

OK. Let's review the current changes first.

> >   /* How to encode VEX.vvvv:
> >      0: VEX.vvvv must be 1111b.
> >      1: VEX.vvvv encodes one of the register operands.
> >      2: VEX.vvvv encodes the destination register operand.
> >    */
> > #define VexVVVVSRC   1
> > #define VexVVVVDEST  2
> >   VexVVVV,
> 
> For readability maybe insert underscores (and if SRC remains un-numbered,
> maybe also DST in place of DEST)?

Done.
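
With the underscores the field now reads like this in my tree (a sketch;
the final committed form may still differ):

  /* How to encode VEX.vvvv:
     0: VEX.vvvv must be 1111b.
     1: VEX.vvvv encodes one of the register operands.
     2: VEX.vvvv encodes the destination register operand.
   */
#define VexVVVV_SRC  1
#define VexVVVV_DST  2
  VexVVVV,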

> 
> >>> --- /dev/null
> >>> +++ b/gas/testsuite/gas/i386/x86-64-apx-ndd.s
> >>> @@ -0,0 +1,156 @@
> >>> +# Check 64bit APX NDD instructions with evex prefix encoding
> >>> +
> >>> +	.allow_index_reg
> >>> +	.text
> >>> +_start:
> >>> +cmovge 0x90909090(%eax),%edx,%r8d
> >>> +cmovle 0x90909090(%eax),%edx,%r8d
> >>> +cmovg  0x90909090(%eax),%edx,%r8d
> >>> +imul   0x90909(%eax),%edx,%r8d
> >>> +imul   0x909(%rax,%r31,8),%rdx,%r25
> >>
> >> What about imul by immediate? The present spec is quite unclear there:
> >> The insn page says {ND=ZU} and the table says 0/1 in the ND column.
> >>
> >
> > We don't support it yet, I put it in RFC.
> > ...
> > 2. Support APX ZU   -- In progress
> > 3. Support APX CCMP and CTEST -- In progress ...
> >
> > About 0/1 in the ND column, it means ZU can be 0/1.
> >
> > IMUL with opcodes 0x69 and 0x6B in map 0 and SETcc instructions
> > Although these instructions do not support NDD, the EVEX.ND bit is
> > used to control whether its destination register has its upper bits (namely,
> > bits [63:OSIZE]) zeroed when OSIZE is 8b or 16b.
> > That is, if EVEX.ND = 1, the upper bits are always zeroed; otherwise,
> > they keep the old values when OSIZE is 8b or 16b. For these instructions,
> > EVEX.[V4,V3,V2,V1,V0] must be all zero.
> 
> So ZU indeed isn't just a typo there. For 32- and 64-bit forms, is EVEX.ND then
> simply being ignored? The ZU really is meaningful only for 16-bit forms, aiui ...
> 

EVEX.ZU should be ignored for 32-bit and 64-bit forms. For imul (in spec 6.30 IMUL), EVEX.ND stands for ND or ZU. I think ZU makes sense for both the 16-bit form (imul) and the 8-bit form (setcc; I'm not sure whether imul supports it yet).

> >>> +.byte 0x62,0xf4,0xfc,0x08,0xff,0xc0  #inc %rax .byte
> >>> +0x62,0xf4,0xec,0x08,0xff,0xc0  #bad
> >>
> >> As before, please avoid .byte whenever possible. And please have a
> >> more detailed comment as to what is being encoded, when .byte cannot be
> >> avoided.
> >> Plus, if at all possible, have "bad" tests live in separate testcases from
> "good"
> >> ones.
> >>
> >
> > This case wants to test that inc supports the evex format without GPR32;
> > patch part II 1/6 will cover it. The first case has been removed and the
> > second case has been added to x86-64-apx-evex-promoted-bad.s.
> 
> That's all fine, but still wants expressing with .insn rather than .byte, if at all
> possible.
> 

Got your point. For the first case we can use .insn instead; the second case needs .byte.

Thanks,
Lili.
  
Jan Beulich Oct. 25, 2023, 8:47 a.m. UTC | #9
On 25.10.2023 10:10, Cui, Lili wrote:
>> On 22.10.2023 16:05, Cui, Lili wrote:
>>>>> --- /dev/null
>>>>> +++ b/gas/testsuite/gas/i386/x86-64-apx-ndd.s
>>>>> @@ -0,0 +1,156 @@
>>>>> +# Check 64bit APX NDD instructions with evex prefix encoding
>>>>> +
>>>>> +	.allow_index_reg
>>>>> +	.text
>>>>> +_start:
>>>>> +cmovge 0x90909090(%eax),%edx,%r8d
>>>>> +cmovle 0x90909090(%eax),%edx,%r8d
>>>>> +cmovg  0x90909090(%eax),%edx,%r8d
>>>>> +imul   0x90909(%eax),%edx,%r8d
>>>>> +imul   0x909(%rax,%r31,8),%rdx,%r25
>>>>
>>>> What about imul by immediate? The present spec is quite unclear there:
>>>> The insn page says {ND=ZU} and the table says 0/1 in the ND column.
>>>>
>>>
>>> We don't support it yet, I put it in RFC.
>>> ...
>>> 2. Support APX ZU   -- In progress
>>> 3. Support APX CCMP and CTEST -- In progress ...
>>>
>>> About 0/1 in the ND column, it means ZU can be 0/1.
>>>
>>> IMUL with opcodes 0x69 and 0x6B in map 0 and SETcc instructions
>>> Although these instructions do not support NDD, the EVEX.ND bit is
>>> used to control whether its destination register has its upper bits (namely,
>>> bits [63:OSIZE]) zeroed when OSIZE is 8b or 16b.
>>> That is, if EVEX.ND = 1, the upper bits are always zeroed; otherwise,
>>> they keep the old values when OSIZE is 8b or 16b. For these instructions,
>>> EVEX.[V4,V3,V2,V1,V0] must be all zero.
>>
>> So ZU indeed isn't just a typo there. For 32- and 64-bit forms, is EVEX.ND then
>> simply being ignored? The ZU really is meaningful only for 16-bit forms, aiui ...
>>
> 
> EVEX.ZU should be ignored for 32-bit and 64-bit forms. For imul (in spec 6.30 IMUL), EVEX.ND stands for ND or ZU.

In cases like this, where ignoring bits is kind of unexpected, the spec
would better say explicitly (on the instruction page) when a meaningless
bit is indeed ignored, rather than being reserved and causing #UD. Note
how even the text in the APX-EVEX-INT section leaves open (or at least
ambiguous, by not mentioning the case) whether SETcc with a memory
operand ignores EVEX.ND or causes #UD when the bit is set.

> I think ZU makes sense for both the 16-bit form (imul) and the 8-bit form (setcc; I'm not sure whether imul supports it yet).

No, IMUL by immediate (or actually any IMUL with multiple operands) doesn't
support byte register operands. For SETcc the ZU aspect is pretty clear
and doesn't even need expressing by new syntax in (dis)assembly - you can
simply distinguish the two forms by using either 8-bit registers (no ZU)
or 32-/64-bit ones (with ZU). In principle that's possible with IMUL as
well, of course, but it may be deemed a little odd:

	imul	$17, %dx, %cx
	imul	$17, %dx, %ecx

Yet personally I'd still prefer this over adding e.g. {zu} on either the
mnemonic or the destination operand. Question (as with the way to express
{nf}) is how other assemblers are going to handle it. (Would be quite nice
if the spec could at least give more clear hints towards suggested syntax,
but that hadn't been the case already with the syntax extensions needed
for AVX512.)

Jan
  
Cui, Lili Oct. 25, 2023, 3:29 p.m. UTC | #10
> Subject: Re: [PATCH 4/8] Support APX NDD
> 
> On 24.10.2023 13:39, Cui, Lili wrote:
> >> Subject: Re: [PATCH 4/8] Support APX NDD
> >>
> >> On 19.09.2023 17:25, Cui, Lili wrote:
> >>> --- a/opcodes/i386-dis-evex-prefix.h
> >>> +++ b/opcodes/i386-dis-evex-prefix.h
> >>> @@ -353,8 +353,8 @@
> >>>    /* PREFIX_EVEX_MAP4_66 */
> >>>    {
> >>>      { MOD_TABLE (MOD_EVEX_MAP4_66_PREFIX_0) },
> >>> -    { "adoxS",	{ Gdq, Edq }, 0 },
> >>> -    { "adcxS",	{ Gdq, Edq }, 0 },
> >>> +    { "adoxS",	{ VexGdq, Gdq, Edq }, 0 },
> >>> +    { "adcxS",	{ VexGdq, Gdq, Edq }, 0 },
> >>
> >> With the OP_VEX() change moved to the earlier patch, you wouldn't
> >> need to alter again right away what was just added there.
> >>
> >
> > As I commented in another email, I want to move instructions like adcx into
> > this NDD patch. Do you think it is ok?
> 
> Yeah, having ND-capable insns in the ND patch makes sense. I don't care overly
> much about the actual split. It wants to be half-way consistent, and beyond
> that - as said - it helps if code/data added by earlier patches doesn't need re-
> writing altogether in later ones.
> 

Ok.

> >>> @@ -13280,6 +13304,14 @@ OP_VEX (instr_info *ins, int bytemode, int
> >> sizeflag ATTRIBUTE_UNUSED)
> >>>    if (!ins->need_vex)
> >>>      return true;
> >>>
> >>> +  if (ins->evex_type == evex_from_legacy)
> >>> +    {
> >>> +      if (ins->vex.b)
> >>> +	ins->evex_used |= EVEX_b_used;
> >>> +      else
> >>> +	 return true;
> >>> +    }
> >>
> >> When you reuse fields or definitions in places where their names
> >> don't match their purpose (the field dealt with here is "nd" after
> >> all, not "b"), a comment wants adding. There's also something odd
> >> with indentation here, but I suppose an if/else construct isn't needed in the
> >> first place.
> >>
> > Added a comment and adjusted the formatting. I think we need the if/else,
> > since e.g. "adox" supports EVEX.ND=0/1.
> 
> You will need if(), sure, but I'm not convinced of "else". Constructs like
> 
>     if (cond)
>       return;
>     else
>       whatever;
> 
> bother me in general, as adding (even if just slightly) more mental load for the
> reader. Just
> 
>     if (cond)
>       return;
>     whatever;
> 
> is all you need: Less text, less indentation. In the specific case above yet
> another possibility may exist (without having actually tried
> it):
> 
>   if (ins->evex_type == evex_from_legacy)
>     {
>       ins->evex_used |= EVEX_b_used;
>       if (!ins->vex.b)
> 	 return true;
>     }
> 
> I.e. this would be fine as long as setting EVEX_b_used without EVEX.b set doesn't
> cause any issues.
> 

Done.
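
Setting EVEX_b_used without EVEX.b set causes no issue here, so the hunk
in my tree now looks like this (comment wording may still change):

  if (ins->evex_type == evex_from_legacy)
    {
      /* EVEX.b is reused as the ND bit for insns promoted from
	 legacy encodings.  */
      ins->evex_used |= EVEX_b_used;
      if (!ins->vex.b)
	return true;
    }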

Thanks,
Lili.
  
Cui, Lili Oct. 25, 2023, 3:49 p.m. UTC | #11
> Subject: Re: [PATCH 4/8] Support APX NDD
> 
> On 25.10.2023 10:10, Cui, Lili wrote:
> >> On 22.10.2023 16:05, Cui, Lili wrote:
> >>>>> --- /dev/null
> >>>>> +++ b/gas/testsuite/gas/i386/x86-64-apx-ndd.s
> >>>>> @@ -0,0 +1,156 @@
> >>>>> +# Check 64bit APX NDD instructions with evex prefix encoding
> >>>>> +
> >>>>> +	.allow_index_reg
> >>>>> +	.text
> >>>>> +_start:
> >>>>> +cmovge 0x90909090(%eax),%edx,%r8d cmovle
> >>>>> +0x90909090(%eax),%edx,%r8d cmovg  0x90909090(%eax),%edx,%r8d
> >>>>> +imul   0x90909(%eax),%edx,%r8d
> >>>>> +imul   0x909(%rax,%r31,8),%rdx,%r25
> >>>>
> >>>> What about imul by immediate? The present spec is quite unclear there:
> >>>> The insn page says {ND=ZU} and the table says 0/1 in the ND column.
> >>>>
> >>>
> >>> We don't support it yet, I put it in RFC.
> >>> ...
> >>> 2. Support APX ZU   -- In progress
> >>> 3. Support APX CCMP and CTEST -- In progress ...
> >>>
> >>> About 0/1 in the ND column, it means ZU can be 0/1.
> >>>
> >>> IMUL with opcodes 0x69 and 0x6B in map 0 and SETcc instructions
> >>> Although these instructions do not support NDD, the EVEX.ND bit is
> >>> used to control whether its destination register has its upper bits
> >>> (namely,
> >>> bits [63:OSIZE]) zeroed when OSIZE is 8b or 16b.
> >>> That is, if EVEX.ND = 1, the upper bits are always zeroed;
> >>> otherwise, they keep the old values when OSIZE is 8b or 16b. For
> >>> these instructions,
> >>> EVEX.[V4,V3,V2,V1,V0] must be all zero.
> >>
> >> So ZU indeed isn't just a typo there. For 32- and 64-bit forms, is
> >> EVEX.ND then simply being ignored? The ZU really is meaningful only for
> >> 16-bit forms, aiui ...
> >>
> >
> > EVEX.ZU should be ignored for 32-bit and 64-bit forms. For imul (in spec 6.30
> > IMUL), EVEX.ND stands for ND or ZU.
> 
> In cases like this, where ignoring bits is kind of unexpected, the spec would
> better say explicitly (on the instruction page) when a meaningless bit is indeed
> ignored, rather than being reserved and causing #UD. Note how even the text
> in the APX-EVEX-INT section leaves open (or at least ambiguous, by not
> mentioning the case) whether SETcc with a memory operand ignores EVEX.ND
> or causes #UD when the bit is set.
> 

Sorry, my previous answer was inaccurate: EVEX.ZU will not be ignored in 32-bit and 64-bit forms.

Prior to Intel® APX, the following rules apply in 64-bit mode when an instruction’s destination is a GPR and
OSIZE < 64b:
1. If OSIZE is 32b, the destination GPR gets the instruction’s result in bits [31:0] and all zeros in bits
[63:32].
2. If OSIZE is 8b or 16b, the destination GPR gets the instruction’s result in bits [OSIZE-1:0] but keep its
old value in bits [63:OSIZE].

The ZU indication described in items 2.(b) of Section 3.1.2.3.1 does not introduce an NDD. For those
instructions, EVEX.ND=0 keeps the current x86 behavior, but EVEX.ND=1 forces the zeroing of bits
[63:OSIZE] for any OSIZE < 64b
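
To make the two rules above concrete, here is a small C sketch of just the
masking arithmetic (an illustration only, not assembler or simulator code;
the values are arbitrary):

#include <stdio.h>
#include <stdint.h>

int main (void)
{
  uint64_t gpr = 0x1122334455667788ULL;	/* old destination value */
  uint16_t res = 0xabcd;		/* result of an OSIZE=16b operation */

  uint64_t nd0 = (gpr & ~0xffffULL) | res;	/* EVEX.ND=0: merge */
  uint64_t nd1 = res;				/* EVEX.ND=1 (ZU): zero bits [63:16] */

  /* For OSIZE=32b, rule 1 above already zeroes bits [63:32], so the ZU
     indication only makes a visible difference for 8b and 16b.  */
  printf ("ND=0: 0x%016llx\nND=1: 0x%016llx\n",
	  (unsigned long long) nd0, (unsigned long long) nd1);
  return 0;
}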

> > I think ZU makes sense for both the 16-bit form (imul) and the 8-bit form
> (setcc, I'm not sure if imul supports it yet).
> 
> No, IMUL by immediate (or actually any IMUL with multiple operands) doesn't
> support byte register operands. For SETcc the ZU aspect is pretty clear and
> doesn't even need expressing by new syntax in (dis)assembly - you can simply
> distinguish the two forms by using either 8-bit registers (no ZU) or 32-/64-bit
> ones (with ZU). In principle that's possible with IMUL as well, of course, but it
> may be deemed a little odd:
> 
> 	imul	$17, %dx, %cx
> 	imul	$17, %dx, %ecx
> 
> Yet personally I'd still prefer this over adding e.g. {zu} on either the mnemonic
> or the destination operand. Question (as with the way to express
> {nf}) is how other assemblers are going to handle it. (Would be quite nice if
> the spec could at least give more clear hints towards suggested syntax, but
> that hadn't been the case already with the syntax extensions needed for
> AVX512.)
> 
 
I will add the suffix “zx” (for the Intel syntax) or “zwq” (for the AT&T syntax) to the mnemonic:

Intel syntax                                              AT&T syntax
imulzx rax, word ptr[ rbx ], 0xab         imulzwq $0xab, (%rbx), %rax

Lili.
  
Jan Beulich Oct. 25, 2023, 3:59 p.m. UTC | #12
On 25.10.2023 17:49, Cui, Lili wrote:
>> Subject: Re: [PATCH 4/8] Support APX NDD
>>
>> On 25.10.2023 10:10, Cui, Lili wrote:
>>>> On 22.10.2023 16:05, Cui, Lili wrote:
>>>>>>> --- /dev/null
>>>>>>> +++ b/gas/testsuite/gas/i386/x86-64-apx-ndd.s
>>>>>>> @@ -0,0 +1,156 @@
>>>>>>> +# Check 64bit APX NDD instructions with evex prefix encoding
>>>>>>> +
>>>>>>> +	.allow_index_reg
>>>>>>> +	.text
>>>>>>> +_start:
>>>>>>> +cmovge 0x90909090(%eax),%edx,%r8d cmovle
>>>>>>> +0x90909090(%eax),%edx,%r8d cmovg  0x90909090(%eax),%edx,%r8d
>>>>>>> +imul   0x90909(%eax),%edx,%r8d
>>>>>>> +imul   0x909(%rax,%r31,8),%rdx,%r25
>>>>>>
>>>>>> What about imul by immediate? The present spec is quite unclear there:
>>>>>> The insn page says {ND=ZU} and the table says 0/1 in the ND column.
>>>>>>
>>>>>
>>>>> We don't support it yet, I put it in RFC.
>>>>> ...
>>>>> 2. Support APX ZU   -- In progress
>>>>> 3. Support APX CCMP and CTEST -- In progress ...
>>>>>
>>>>> About 0/1 in the ND column, it means ZU can be 0/1.
>>>>>
>>>>> IMUL with opcodes 0x69 and 0x6B in map 0 and SETcc instructions
>>>>> Although these instructions do not support NDD, the EVEX.ND bit is
>>>>> used to control whether its destination register has its upper bits
>>>>> (namely,
>>>>> bits [63:OSIZE]) zeroed when OSIZE is 8b or 16b.
>>>>> That is, if EVEX.ND = 1, the upper bits are always zeroed;
>>>>> otherwise, they keep the old values when OSIZE is 8b or 16b. For
>>>>> these instructions,
>>>>> EVEX.[V4,V3,V2,V1,V0] must be all zero.
>>>>
>>>> So ZU indeed isn't just a typo there. For 32- and 64-bit forms, is
>>>> EVEX.ND then simply being ignored? The ZU really is meaningful only for
>>>> 16-bit forms, aiui ...
>>>>
>>>
>>> EVEX.ZU should be ignored for 32-bit and 64-bit forms. For imul (in spec 6.30
>>> IMUL), EVEX.ND stands for ND or ZU.
>>
>> In cases like this, where ignoring bits is kind of unexpected, the spec would
>> better say explicitly (on the instruction page) when a meaningless bit is indeed
>> ignored, rather than being reserved and causing #UD. Note how even the text
>> in the APX-EVEX-INT section leaves open (or at least ambiguous, by not
>> mentioning the case) whether SETcc with a memory operand ignores EVEX.ND
>> or causes #UD when the bit is set.
>>
> 
> Sorry, my previous answer was inaccurate: EVEX.ZU will not be ignored in 32-bit and 64-bit forms.
> 
> Prior to Intel® APX, the following rules apply in 64-bit mode when an instruction’s destination is a GPR and
> OSIZE < 64b:
> 1. If OSIZE is 32b, the destination GPR gets the instruction’s result in bits [31:0] and all zeros in bits
> [63:32].
> 2. If OSIZE is 8b or 16b, the destination GPR gets the instruction’s result in bits [OSIZE-1:0] but keep its
> old value in bits [63:OSIZE].
> 
> The ZU indication described in items 2.(b) of Section 3.1.2.3.1 does not introduce an NDD. For those
> instructions, EVEX.ND=0 keeps the current x86 behavior, but EVEX.ND=1 forces the zeroing of bits
> [63:OSIZE] for any OSIZE < 64b

While described differently, that's still the same behavior as before for
OSIZE > 16b, isn't it? Which still means the EVEX.ND is effectively ignored
in those cases (and could hence as well be reserved).

>>> I think ZU makes sense for both the 16-bit form (imul) and the 8-bit form
>>> (setcc; I'm not sure whether imul supports it yet).
>>
>> No, IMUL by immediate (or actually any IMUL with multiple operands) doesn't
>> support byte register operands. For SETcc the ZU aspect is pretty clear and
>> doesn't even need expressing by new syntax in (dis)assembly - you can simply
>> distinguish the two forms by using either 8-bit registers (no ZU) or 32-/64-bit
>> ones (with ZU). In principle that's possible with IMUL as well, of course, but it
>> may be deemed a little odd:
>>
>> 	imul	$17, %dx, %cx
>> 	imul	$17, %dx, %ecx
>>
>> Yet personally I'd still prefer this over adding e.g. {zu} on either the mnemonic
>> or the destination operand. Question (as with the way to express
>> {nf}) is how other assemblers are going to handle it. (Would be quite nice if
>> the spec could at least give more clear hints towards suggested syntax, but
>> that hadn't been the case already with the syntax extensions needed for
>> AVX512.)
>>
>  
> I will add suffix “zx” (for the Intel syntax) or “zwq” (for the AT&T syntax) to the mnemonic:
> 
> Intel syntax                                              AT&T syntax
> imulzx rax, word ptr[ rbx ], 0xab         imulzwq $0xab, (%rbx), %rax

For Intel syntax, unless you happen to know that MASM is going to go
that route, the "word ptr" is sufficient for disambiguation, and no
suffixes should be accepted (gas) or be output (objdump). For AT&T
syntax the case with a memory operand indeed requires some means to
disambiguate; as asked before, I wonder if your approach matches with
what other assemblers are going to do.

Jan
  

Patch

diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
index 48916bc3846..381e389bb04 100644
--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -2261,8 +2261,9 @@  operand_size_match (const insn_template *t)
       unsigned int given = i.operands - j - 1;
 
       /* For FMA4 and XOP insns VEX.W controls just the first two
-	 register operands.  */
-      if (is_cpu (t, CpuFMA4) || is_cpu (t, CpuXOP))
+	 register operands. And APX insns just swap the first operands.  */
+      if (is_cpu (t, CpuFMA4) || is_cpu (t, CpuXOP)
+	  || (is_cpu (t,CpuAPX_F) && i.operands == 3))
 	given = j < 2 ? 1 - j : j;
 
       if (t->operand_types[j].bitfield.class == Reg
@@ -3876,6 +3877,7 @@  is_any_apx_encoding (void)
 {
   return i.rex2 
     || i.rex2_encoding
+    || i.tm.opcode_space == SPACE_EVEXMAP4
     || (i.vex.register_specifier
 	&& i.vex.register_specifier->reg_flags & RegRex2);
 }
@@ -4204,6 +4206,10 @@  build_legacy_insns_with_apx_encoding (void)
     }
 
   build_evex_insns_with_extend_evex_prefix ();
+
+  /* Encode the NDD bit.  */
+  if (i.vex.register_specifier)
+    i.vex.bytes[3] |= 0x10;
 }
 
 static void
@@ -7383,26 +7389,31 @@  match_template (char mnem_suffix)
 	  overlap1 = operand_type_and (operand_types[0], operand_types[1]);
 	  if (t->opcode_modifier.d && i.reg_operands == i.operands
 	      && !operand_type_all_zero (&overlap1))
-	    switch (i.dir_encoding)
-	      {
-	      case dir_encoding_load:
-		if (operand_type_check (operand_types[i.operands - 1], anymem)
-		    || t->opcode_modifier.regmem)
-		  goto check_reverse;
-		break;
+	    {
+	      int isMemOperand = (t->opcode_modifier.vexvvvv
+				  && t->opcode_space == SPACE_EVEXMAP4)
+				  ? i.operands - 2 : i.operands - 1;
+	      switch (i.dir_encoding)
+		{
+		case dir_encoding_load:
+		  if (operand_type_check (operand_types[isMemOperand], anymem)
+		      || t->opcode_modifier.regmem)
+		    goto check_reverse;
+		  break;
 
-	      case dir_encoding_store:
-		if (!operand_type_check (operand_types[i.operands - 1], anymem)
-		    && !t->opcode_modifier.regmem)
-		  goto check_reverse;
-		break;
+		case dir_encoding_store:
+		  if (!operand_type_check (operand_types[isMemOperand], anymem)
+		      && !t->opcode_modifier.regmem)
+		    goto check_reverse;
+		  break;
 
-	      case dir_encoding_swap:
-		goto check_reverse;
+		case dir_encoding_swap:
+		  goto check_reverse;
 
-	      case dir_encoding_default:
-		break;
-	      }
+		case dir_encoding_default:
+		  break;
+		}
+	    }
 	  /* If we want store form, we skip the current load.  */
 	  if ((i.dir_encoding == dir_encoding_store
 	       || i.dir_encoding == dir_encoding_swap)
@@ -7432,11 +7443,13 @@  match_template (char mnem_suffix)
 		continue;
 	      /* Try reversing direction of operands.  */
 	      j = is_cpu (t, CpuFMA4)
-		  || is_cpu (t, CpuXOP) ? 1 : i.operands - 1;
+		  || is_cpu (t, CpuXOP)
+		  || is_cpu (t, CpuAPX_F) ? 1 : i.operands - 1;
 	      overlap0 = operand_type_and (i.types[0], operand_types[j]);
 	      overlap1 = operand_type_and (i.types[j], operand_types[0]);
 	      overlap2 = operand_type_and (i.types[1], operand_types[1]);
-	      gas_assert (t->operands != 3 || !check_register);
+	      gas_assert (t->operands != 3 || !check_register
+			  || is_cpu (t,CpuAPX_F));
 	      if (!operand_type_match (overlap0, i.types[0])
 		  || !operand_type_match (overlap1, i.types[j])
 		  || (t->operands == 3
@@ -7471,6 +7484,12 @@  match_template (char mnem_suffix)
 		  found_reverse_match = Opcode_VexW;
 		  goto check_operands_345;
 		}
+	      else if (is_cpu (t,CpuAPX_F)
+		       && i.operands == 3)
+		{
+		  found_reverse_match = Opcode_APX_NDDD;
+		  goto check_operands_345;
+		}
 	      else if (t->opcode_space != SPACE_BASE
 		       && (t->opcode_space != SPACE_0F
 			   /* MOV to/from CR/DR/TR, as an exception, follow
@@ -7636,6 +7655,15 @@  match_template (char mnem_suffix)
 	 flipping VEX.W.  */
       i.tm.opcode_modifier.vexw ^= VEXW0 ^ VEXW1;
 
+      j = i.tm.operand_types[0].bitfield.imm8;
+      i.tm.operand_types[j] = operand_types[j + 1];
+      i.tm.operand_types[j + 1] = operand_types[j];
+      break;
+
+    case Opcode_APX_NDDD:
+      /* Only the first two register operands need reversing.  */
+      i.tm.base_opcode ^= 0x2;
+
       j = i.tm.operand_types[0].bitfield.imm8;
       i.tm.operand_types[j] = operand_types[j + 1];
       i.tm.operand_types[j + 1] = operand_types[j];
@@ -8462,8 +8490,8 @@  process_operands (void)
   const reg_entry *default_seg = NULL;
 
   /* We only need to check those implicit registers for instructions
-     with 3 operands or less.  */
-  if (i.operands <= 3)
+     with 4 operands or less.  */
+  if (i.operands <= 4)
     for (unsigned int j = 0; j < i.operands; j++)
       if (i.types[j].bitfield.instance != InstanceNone)
 	i.reg_operands--;
@@ -8825,6 +8853,9 @@  build_modrm_byte (void)
       break;
   if (v >= dest)
     v = ~0;
+  if (i.tm.opcode_space == SPACE_EVEXMAP4
+      && i.tm.opcode_modifier.vexvvvv)
+    v = dest;
   if (i.tm.extension_opcode != None)
     {
       if (dest != source)
@@ -9088,6 +9119,9 @@  build_modrm_byte (void)
       set_rex_vrex (i.op[op].regs, REX_B, false);
 	}
 
+      if (i.tm.opcode_space == SPACE_EVEXMAP4
+	  && i.tm.opcode_modifier.vexvvvv)
+	dest--;
       if (op == dest)
 	dest = ~0;
       if (op == source)
diff --git a/gas/testsuite/gas/i386/x86-64-apx-ndd.d b/gas/testsuite/gas/i386/x86-64-apx-ndd.d
new file mode 100644
index 00000000000..debb99f2ff9
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-ndd.d
@@ -0,0 +1,165 @@ 
+#as:
+#objdump: -dw
+#name: x86-64 APX NDD instructions with evex prefix encoding
+#source: x86-64-apx-ndd.s
+
+.*: +file format .*
+
+
+Disassembly of section .text:
+
+0+ <_start>:
+\s*[a-f0-9]+:\s*62 f4 e4 18 ff c0\s+inc    %rax,%rbx
+\s*[a-f0-9]+:\s*62 dc bc 18 ff c7\s+inc    %r31,%r8
+\s*[a-f0-9]+:\s*62 dc fc 10 ff c7\s+inc    %r31,%r16
+\s*[a-f0-9]+:\s*62 44 7c 10 00 f8\s+add    %r31b,%r8b,%r16b
+\s*[a-f0-9]+:\s*62 44 7c 10 00 f8\s+add    %r31b,%r8b,%r16b
+\s*[a-f0-9]+:\s*62 44 fc 10 01 f8\s+add    %r31,%r8,%r16
+\s*[a-f0-9]+:\s*62 44 fc 10 01 f8\s+add    %r31,%r8,%r16
+\s*[a-f0-9]+:\s*62 44 7c 10 01 f8\s+add    %r31d,%r8d,%r16d
+\s*[a-f0-9]+:\s*62 44 7c 10 01 f8\s+add    %r31d,%r8d,%r16d
+\s*[a-f0-9]+:\s*62 44 7d 10 01 f8\s+add    %r31w,%r8w,%r16w
+\s*[a-f0-9]+:\s*62 44 7d 10 01 f8\s+add    %r31w,%r8w,%r16w
+\s*[a-f0-9]+:\s*62 44 fc 10 01 f8\s+add    %r31,%r8,%r16
+\s*[a-f0-9]+:\s*62 5c fc 10 03 c7\s+add    %r31,%r8,%r16
+\s*[a-f0-9]+:\s*62 44 fc 10 01 38\s+add    %r31,\(%r8\),%r16
+\s*[a-f0-9]+:\s*62 5c fc 10 03 07\s+add    \(%r31\),%r8,%r16
+\s*[a-f0-9]+:\s*62 5c f8 10 03 84 07 90 90 00 00\s+add\s+0x9090\(%r31,%r16,1\),%r8,%r16
+\s*[a-f0-9]+:\s*62 44 f8 10 01 3c c0\s+add    %r31,\(%r8,%r16,8\),%r16
+\s*[a-f0-9]+:\s*62 d4 74 10 80 c5 34\s+add    \$0x34,%r13b,%r17b
+\s*[a-f0-9]+:\s*62 fc 5c 10 83 04 83 11\s+addl   \$0x11,\(%r19,%rax,4\),%r20d
+\s*[a-f0-9]+:\s*62 f4 0d 10 81 c0 34 12\s+add    \$0x1234,%ax,%r30w
+\s*[a-f0-9]+:\s*62 d4 fc 10 81 c7 33 44 34 12\s+add    \$0x12344433,%r15,%r16
+\s*[a-f0-9]+:\s*62 d4 fc 10 81 04 8f 33 44 34 12\s+addq   \$0x12344433,\(%r15,%rcx,4\),%r16
+\s*[a-f0-9]+:\s*62 f4 bc 18 81 c0 11 22 33 f4\s+add    \$0xfffffffff4332211,%rax,%r8
+\s*[a-f0-9]+:\s*62 f4 f4 10 ff c8    	dec    %rax,%r17
+\s*[a-f0-9]+:\s*62 9c 3c 18 fe 0c 27 	decb   \(%r31,%r12,1\),%r8b
+\s*[a-f0-9]+:\s*62 f4 f4 10 f7 d0    	not    %rax,%r17
+\s*[a-f0-9]+:\s*62 9c 3c 18 f6 14 27 	notb   \(%r31,%r12,1\),%r8b
+\s*[a-f0-9]+:\s*62 f4 f4 10 f7 d8    	neg    %rax,%r17
+\s*[a-f0-9]+:\s*62 9c 3c 18 f6 1c 27 	negb   \(%r31,%r12,1\),%r8b
+\s*[a-f0-9]+:\s*62 7c 6c 10 28 f9    	sub    %r15b,%r17b,%r18b
+\s*[a-f0-9]+:\s*62 54 6c 10 29 38    	sub    %r15d,\(%r8\),%r18d
+\s*[a-f0-9]+:\s*62 c4 3c 18 2a 04 07 	sub    \(%r15,%rax,1\),%r16b,%r8b
+\s*[a-f0-9]+:\s*62 c4 3d 18 2b 04 07 	sub    \(%r15,%rax,1\),%r16w,%r8w
+\s*[a-f0-9]+:\s*62 fc 5c 10 83 2c 83 11 	subl   \$0x11,\(%r19,%rax,4\),%r20d
+\s*[a-f0-9]+:\s*62 f4 0d 10 81 e8 34 12 	sub    \$0x1234,%ax,%r30w
+\s*[a-f0-9]+:\s*62 7c 6c 10 18 f9    	sbb    %r15b,%r17b,%r18b
+\s*[a-f0-9]+:\s*62 54 6c 10 19 38    	sbb    %r15d,\(%r8\),%r18d
+\s*[a-f0-9]+:\s*62 c4 3c 18 1a 04 07 	sbb    \(%r15,%rax,1\),%r16b,%r8b
+\s*[a-f0-9]+:\s*62 c4 3d 18 1b 04 07 	sbb    \(%r15,%rax,1\),%r16w,%r8w
+\s*[a-f0-9]+:\s*62 fc 5c 10 83 1c 83 11 	sbbl   \$0x11,\(%r19,%rax,4\),%r20d
+\s*[a-f0-9]+:\s*62 f4 0d 10 81 d8 34 12 	sbb    \$0x1234,%ax,%r30w
+\s*[a-f0-9]+:\s*62 7c 6c 10 10 f9    	adc    %r15b,%r17b,%r18b
+\s*[a-f0-9]+:\s*62 54 6c 10 11 38    	adc    %r15d,\(%r8\),%r18d
+\s*[a-f0-9]+:\s*62 c4 3c 18 12 04 07 	adc    \(%r15,%rax,1\),%r16b,%r8b
+\s*[a-f0-9]+:\s*62 c4 3d 18 13 04 07 	adc    \(%r15,%rax,1\),%r16w,%r8w
+\s*[a-f0-9]+:\s*62 fc 5c 10 83 14 83 11 	adcl   \$0x11,\(%r19,%rax,4\),%r20d
+\s*[a-f0-9]+:\s*62 f4 0d 10 81 d0 34 12 	adc    \$0x1234,%ax,%r30w
+\s*[a-f0-9]+:\s*62 7c 6c 10 08 f9    	or     %r15b,%r17b,%r18b
+\s*[a-f0-9]+:\s*62 54 6c 10 09 38    	or     %r15d,\(%r8\),%r18d
+\s*[a-f0-9]+:\s*62 c4 3c 18 0a 04 07 	or     \(%r15,%rax,1\),%r16b,%r8b
+\s*[a-f0-9]+:\s*62 c4 3d 18 0b 04 07 	or     \(%r15,%rax,1\),%r16w,%r8w
+\s*[a-f0-9]+:\s*62 fc 5c 10 83 0c 83 11 	orl    \$0x11,\(%r19,%rax,4\),%r20d
+\s*[a-f0-9]+:\s*62 f4 0d 10 81 c8 34 12 	or     \$0x1234,%ax,%r30w
+\s*[a-f0-9]+:\s*62 7c 6c 10 30 f9    	xor    %r15b,%r17b,%r18b
+\s*[a-f0-9]+:\s*62 54 6c 10 31 38    	xor    %r15d,\(%r8\),%r18d
+\s*[a-f0-9]+:\s*62 c4 3c 18 32 04 07 	xor    \(%r15,%rax,1\),%r16b,%r8b
+\s*[a-f0-9]+:\s*62 c4 3d 18 33 04 07 	xor    \(%r15,%rax,1\),%r16w,%r8w
+\s*[a-f0-9]+:\s*62 fc 5c 10 83 34 83 11 	xorl   \$0x11,\(%r19,%rax,4\),%r20d
+\s*[a-f0-9]+:\s*62 f4 0d 10 81 f0 34 12 	xor    \$0x1234,%ax,%r30w
+\s*[a-f0-9]+:\s*62 7c 6c 10 20 f9    	and    %r15b,%r17b,%r18b
+\s*[a-f0-9]+:\s*62 54 6c 10 21 38    	and    %r15d,\(%r8\),%r18d
+\s*[a-f0-9]+:\s*62 c4 3c 18 22 04 07 	and    \(%r15,%rax,1\),%r16b,%r8b
+\s*[a-f0-9]+:\s*62 c4 3d 18 23 04 07 	and    \(%r15,%rax,1\),%r16w,%r8w
+\s*[a-f0-9]+:\s*62 fc 5c 10 83 24 83 11 	andl   \$0x11,\(%r19,%rax,4\),%r20d
+\s*[a-f0-9]+:\s*62 f4 0d 10 81 e0 34 12 	and    \$0x1234,%ax,%r30w
+\s*[a-f0-9]+:\s*62 f4 04 10 d0 08    	rorb   \(%rax\),%r31b
+\s*[a-f0-9]+:\s*62 d4 04 10 c0 cc 02 	ror    \$0x2,%r12b,%r31b
+\s*[a-f0-9]+:\s*62 f4 04 10 c1 08 02 	rorl   \$0x2,\(%rax\),%r31d
+\s*[a-f0-9]+:\s*62 f4 05 10 d1 08    	rorw   \(%rax\),%r31w
+\s*[a-f0-9]+:\s*62 fc 3c 18 d2 c8    	ror    %cl,%r16b,%r8b
+\s*[a-f0-9]+:\s*62 fc 05 10 d3 0c 83 	rorw   %cl,\(%r19,%rax,4\),%r31w
+\s*[a-f0-9]+:\s*62 f4 04 10 d0 00    	rolb   \(%rax\),%r31b
+\s*[a-f0-9]+:\s*62 d4 04 10 c0 c4 02 	rol    \$0x2,%r12b,%r31b
+\s*[a-f0-9]+:\s*62 f4 04 10 c1 00 02 	roll   \$0x2,\(%rax\),%r31d
+\s*[a-f0-9]+:\s*62 f4 05 10 d1 00    	rolw   \(%rax\),%r31w
+\s*[a-f0-9]+:\s*62 fc 3c 18 d2 c0    	rol    %cl,%r16b,%r8b
+\s*[a-f0-9]+:\s*62 fc 05 10 d3 04 83 	rolw   %cl,\(%r19,%rax,4\),%r31w
+\s*[a-f0-9]+:\s*62 f4 04 10 d0 18    	rcrb   \(%rax\),%r31b
+\s*[a-f0-9]+:\s*62 d4 04 10 c0 dc 02 	rcr    \$0x2,%r12b,%r31b
+\s*[a-f0-9]+:\s*62 f4 04 10 c1 18 02 	rcrl   \$0x2,\(%rax\),%r31d
+\s*[a-f0-9]+:\s*62 f4 05 10 d1 18    	rcrw   \(%rax\),%r31w
+\s*[a-f0-9]+:\s*62 fc 3c 18 d2 d8    	rcr    %cl,%r16b,%r8b
+\s*[a-f0-9]+:\s*62 fc 05 10 d3 1c 83 	rcrw   %cl,\(%r19,%rax,4\),%r31w
+\s*[a-f0-9]+:\s*62 f4 04 10 d0 10    	rclb   \(%rax\),%r31b
+\s*[a-f0-9]+:\s*62 d4 04 10 c0 d4 02 	rcl    \$0x2,%r12b,%r31b
+\s*[a-f0-9]+:\s*62 f4 04 10 c1 10 02 	rcll   \$0x2,\(%rax\),%r31d
+\s*[a-f0-9]+:\s*62 f4 05 10 d1 10    	rclw   \(%rax\),%r31w
+\s*[a-f0-9]+:\s*62 fc 3c 18 d2 d0    	rcl    %cl,%r16b,%r8b
+\s*[a-f0-9]+:\s*62 fc 05 10 d3 14 83 	rclw   %cl,\(%r19,%rax,4\),%r31w
+\s*[a-f0-9]+:\s*62 f4 04 10 d0 20    	shlb   \(%rax\),%r31b
+\s*[a-f0-9]+:\s*62 d4 04 10 c0 e4 02 	shl    \$0x2,%r12b,%r31b
+\s*[a-f0-9]+:\s*62 f4 04 10 c1 20 02 	shll   \$0x2,\(%rax\),%r31d
+\s*[a-f0-9]+:\s*62 f4 05 10 d1 20    	shlw   \(%rax\),%r31w
+\s*[a-f0-9]+:\s*62 fc 3c 18 d2 e0    	shl    %cl,%r16b,%r8b
+\s*[a-f0-9]+:\s*62 fc 05 10 d3 24 83 	shlw   %cl,\(%r19,%rax,4\),%r31w
+\s*[a-f0-9]+:\s*62 f4 04 10 d0 38    	sarb   \(%rax\),%r31b
+\s*[a-f0-9]+:\s*62 d4 04 10 c0 fc 02 	sar    \$0x2,%r12b,%r31b
+\s*[a-f0-9]+:\s*62 f4 04 10 c1 38 02 	sarl   \$0x2,\(%rax\),%r31d
+\s*[a-f0-9]+:\s*62 f4 05 10 d1 38    	sarw   \(%rax\),%r31w
+\s*[a-f0-9]+:\s*62 fc 3c 18 d2 f8    	sar    %cl,%r16b,%r8b
+\s*[a-f0-9]+:\s*62 fc 05 10 d3 3c 83 	sarw   %cl,\(%r19,%rax,4\),%r31w
+\s*[a-f0-9]+:\s*62 f4 04 10 d0 20    	shlb   \(%rax\),%r31b
+\s*[a-f0-9]+:\s*62 d4 04 10 c0 e4 02 	shl    \$0x2,%r12b,%r31b
+\s*[a-f0-9]+:\s*62 f4 04 10 c1 20 02 	shll   \$0x2,\(%rax\),%r31d
+\s*[a-f0-9]+:\s*62 f4 05 10 d1 20    	shlw   \(%rax\),%r31w
+\s*[a-f0-9]+:\s*62 fc 3c 18 d2 e0    	shl    %cl,%r16b,%r8b
+\s*[a-f0-9]+:\s*62 fc 05 10 d3 24 83 	shlw   %cl,\(%r19,%rax,4\),%r31w
+\s*[a-f0-9]+:\s*62 f4 04 10 d0 28    	shrb   \(%rax\),%r31b
+\s*[a-f0-9]+:\s*62 d4 04 10 c0 ec 02 	shr    \$0x2,%r12b,%r31b
+\s*[a-f0-9]+:\s*62 f4 04 10 c1 28 02 	shrl   \$0x2,\(%rax\),%r31d
+\s*[a-f0-9]+:\s*62 f4 05 10 d1 28    	shrw   \(%rax\),%r31w
+\s*[a-f0-9]+:\s*62 fc 3c 18 d2 e8    	shr    %cl,%r16b,%r8b
+\s*[a-f0-9]+:\s*62 fc 05 10 d3 2c 83 	shrw   %cl,\(%r19,%rax,4\),%r31w
+\s*[a-f0-9]+:\s*62 74 84 10 24 20 01 	shld   \$0x1,%r12,\(%rax\),%r31
+\s*[a-f0-9]+:\s*62 54 05 10 24 c4 02 	shld   \$0x2,%r8w,%r12w,%r31w
+\s*[a-f0-9]+:\s*62 74 04 10 24 38 02 	shld   \$0x2,%r15d,\(%rax\),%r31d
+\s*[a-f0-9]+:\s*62 74 05 10 a5 08    	shld   %cl,%r9w,\(%rax\),%r31w
+\s*[a-f0-9]+:\s*62 7c bc 18 a5 e0\s+shld   %cl,%r12,%r16,%r8
+\s*[a-f0-9]+:\s*62 7c 05 10 a5 2c 83\s+shld   %cl,%r13w,\(%r19,%rax,4\),%r31w
+\s*[a-f0-9]+:\s*62 74 84 10 2c 20 01 	shrd   \$0x1,%r12,\(%rax\),%r31
+\s*[a-f0-9]+:\s*62 54 05 10 2c c4 02 	shrd   \$0x2,%r8w,%r12w,%r31w
+\s*[a-f0-9]+:\s*62 74 04 10 2c 38 02 	shrd   \$0x2,%r15d,\(%rax\),%r31d
+\s*[a-f0-9]+:\s*62 74 05 10 ad 08\s+shrd   %cl,%r9w,\(%rax\),%r31w
+\s*[a-f0-9]+:\s*62 7c bc 18 ad e0\s+shrd   %cl,%r12,%r16,%r8
+\s*[a-f0-9]+:\s*62 7c 05 10 ad 2c 83\s+shrd   %cl,%r13w,\(%r19,%rax,4\),%r31w
+\s*[a-f0-9]+:\s*62 54 6d 10 66 c7    	adcx   %r15d,%r8d,%r18d
+\s*[a-f0-9]+:\s*62 14 69 10 66 04 3f 	adcx   \(%r15,%r31,1\),%r8d,%r18d
+\s*[a-f0-9]+:\s*62 14 f9 08 66 04 3f 	adcx   \(%r15,%r31,1\),%r8
+\s*[a-f0-9]+:\s*62 54 6e 10 66 c7    	adox   %r15d,%r8d,%r18d
+\s*[a-f0-9]+:\s*62 14 6a 10 66 04 3f 	adox   \(%r15,%r31,1\),%r8d,%r18d
+\s*[a-f0-9]+:\s*62 14 fa 08 66 04 3f 	adox   \(%r15,%r31,1\),%r8
+\s*[a-f0-9]+:\s*67 62 f4 3c 18 40 90 90 90 90 90 	cmovo  -0x6f6f6f70\(%eax\),%edx,%r8d
+\s*[a-f0-9]+:\s*67 62 f4 3c 18 41 90 90 90 90 90 	cmovno -0x6f6f6f70\(%eax\),%edx,%r8d
+\s*[a-f0-9]+:\s*67 62 f4 3c 18 42 90 90 90 90 90 	cmovb  -0x6f6f6f70\(%eax\),%edx,%r8d
+\s*[a-f0-9]+:\s*67 62 f4 3c 18 43 90 90 90 90 90 	cmovae -0x6f6f6f70\(%eax\),%edx,%r8d
+\s*[a-f0-9]+:\s*67 62 f4 3c 18 44 90 90 90 90 90 	cmove  -0x6f6f6f70\(%eax\),%edx,%r8d
+\s*[a-f0-9]+:\s*67 62 f4 3c 18 45 90 90 90 90 90 	cmovne -0x6f6f6f70\(%eax\),%edx,%r8d
+\s*[a-f0-9]+:\s*67 62 f4 3c 18 46 90 90 90 90 90 	cmovbe -0x6f6f6f70\(%eax\),%edx,%r8d
+\s*[a-f0-9]+:\s*67 62 f4 3c 18 47 90 90 90 90 90 	cmova  -0x6f6f6f70\(%eax\),%edx,%r8d
+\s*[a-f0-9]+:\s*67 62 f4 3c 18 48 90 90 90 90 90 	cmovs  -0x6f6f6f70\(%eax\),%edx,%r8d
+\s*[a-f0-9]+:\s*67 62 f4 3c 18 49 90 90 90 90 90 	cmovns -0x6f6f6f70\(%eax\),%edx,%r8d
+\s*[a-f0-9]+:\s*67 62 f4 3c 18 4a 90 90 90 90 90 	cmovp  -0x6f6f6f70\(%eax\),%edx,%r8d
+\s*[a-f0-9]+:\s*67 62 f4 3c 18 4b 90 90 90 90 90 	cmovnp -0x6f6f6f70\(%eax\),%edx,%r8d
+\s*[a-f0-9]+:\s*67 62 f4 3c 18 4c 90 90 90 90 90 	cmovl  -0x6f6f6f70\(%eax\),%edx,%r8d
+\s*[a-f0-9]+:\s*67 62 f4 3c 18 4d 90 90 90 90 90 	cmovge -0x6f6f6f70\(%eax\),%edx,%r8d
+\s*[a-f0-9]+:\s*67 62 f4 3c 18 4e 90 90 90 90 90 	cmovle -0x6f6f6f70\(%eax\),%edx,%r8d
+\s*[a-f0-9]+:\s*67 62 f4 3c 18 4f 90 90 90 90 90 	cmovg  -0x6f6f6f70\(%eax\),%edx,%r8d
+\s*[a-f0-9]+:\s*67 62 f4 3c 18 af 90 09 09 09 00 	imul   0x90909\(%eax\),%edx,%r8d
+\s*[a-f0-9]+:\s*62 b4 b0 10 af 94 f8 09 09 00 00 	imul   0x909\(%rax,%r31,8\),%rdx,%r25
+\s*[a-f0-9]+:\s*62 f4 fc 08 ff c0\s+inc    %rax
+\s*[a-f0-9]+:\s*62 f4 ec\s+\(bad\)
+\s*[a-f0-9]+:\s*08 ff\s+or     %bh,%bh
+\s*[a-f0-9]+:\s*c0\s+\.byte 0xc0
+#pass
diff --git a/gas/testsuite/gas/i386/x86-64-apx-ndd.s b/gas/testsuite/gas/i386/x86-64-apx-ndd.s
new file mode 100644
index 00000000000..8c86989ffe5
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-ndd.s
@@ -0,0 +1,156 @@ 
+# Check 64bit APX NDD instructions with evex prefix encoding
+
+	.allow_index_reg
+	.text
+_start:
+inc    %rax,%rbx
+inc    %r31,%r8
+inc    %r31,%r16
+add    %r31b,%r8b,%r16b
+addb    %r31b,%r8b,%r16b
+add    %r31,%r8,%r16
+addq    %r31,%r8,%r16
+add    %r31d,%r8d,%r16d
+addl    %r31d,%r8d,%r16d
+add    %r31w,%r8w,%r16w
+addw    %r31w,%r8w,%r16w
+{store} add    %r31,%r8,%r16
+{load}  add    %r31,%r8,%r16
+add    %r31,(%r8),%r16
+add    (%r31),%r8,%r16
+add    0x9090(%r31,%r16,1),%r8,%r16
+add    %r31,(%r8,%r16,8),%r16
+add    $0x34,%r13b,%r17b
+addl   $0x11,(%r19,%rax,4),%r20d
+add    $0x1234,%ax,%r30w
+add    $0x12344433,%r15,%r16
+addq   $0x12344433,(%r15,%rcx,4),%r16
+add    $0xfffffffff4332211,%rax,%r8
+dec    %rax,%r17
+decb   (%r31,%r12,1),%r8b
+not    %rax,%r17
+notb   (%r31,%r12,1),%r8b
+neg    %rax,%r17
+negb   (%r31,%r12,1),%r8b
+sub    %r15b,%r17b,%r18b
+sub    %r15d,(%r8),%r18d
+sub    (%r15,%rax,1),%r16b,%r8b
+sub    (%r15,%rax,1),%r16w,%r8w
+subl   $0x11,(%r19,%rax,4),%r20d
+sub    $0x1234,%ax,%r30w
+sbb    %r15b,%r17b,%r18b
+sbb    %r15d,(%r8),%r18d
+sbb    (%r15,%rax,1),%r16b,%r8b
+sbb    (%r15,%rax,1),%r16w,%r8w
+sbbl   $0x11,(%r19,%rax,4),%r20d
+sbb    $0x1234,%ax,%r30w
+adc    %r15b,%r17b,%r18b
+adc    %r15d,(%r8),%r18d
+adc    (%r15,%rax,1),%r16b,%r8b
+adc    (%r15,%rax,1),%r16w,%r8w
+adcl   $0x11,(%r19,%rax,4),%r20d
+adc    $0x1234,%ax,%r30w
+or     %r15b,%r17b,%r18b
+or     %r15d,(%r8),%r18d
+or     (%r15,%rax,1),%r16b,%r8b
+or     (%r15,%rax,1),%r16w,%r8w
+orl    $0x11,(%r19,%rax,4),%r20d
+or     $0x1234,%ax,%r30w
+xor    %r15b,%r17b,%r18b
+xor    %r15d,(%r8),%r18d
+xor    (%r15,%rax,1),%r16b,%r8b
+xor    (%r15,%rax,1),%r16w,%r8w
+xorl   $0x11,(%r19,%rax,4),%r20d
+xor    $0x1234,%ax,%r30w
+and    %r15b,%r17b,%r18b
+and    %r15d,(%r8),%r18d
+and    (%r15,%rax,1),%r16b,%r8b
+and    (%r15,%rax,1),%r16w,%r8w
+andl   $0x11,(%r19,%rax,4),%r20d
+and    $0x1234,%ax,%r30w
+rorb   (%rax),%r31b
+ror    $0x2,%r12b,%r31b
+rorl   $0x2,(%rax),%r31d
+rorw   (%rax),%r31w
+ror    %cl,%r16b,%r8b
+rorw   %cl,(%r19,%rax,4),%r31w
+rolb   (%rax),%r31b
+rol    $0x2,%r12b,%r31b
+roll   $0x2,(%rax),%r31d
+rolw   (%rax),%r31w
+rol    %cl,%r16b,%r8b
+rolw   %cl,(%r19,%rax,4),%r31w
+rcrb   (%rax),%r31b
+rcr    $0x2,%r12b,%r31b
+rcrl   $0x2,(%rax),%r31d
+rcrw   (%rax),%r31w
+rcr    %cl,%r16b,%r8b
+rcrw   %cl,(%r19,%rax,4),%r31w
+rclb   (%rax),%r31b
+rcl    $0x2,%r12b,%r31b
+rcll   $0x2,(%rax),%r31d
+rclw   (%rax),%r31w
+rcl    %cl,%r16b,%r8b
+rclw   %cl,(%r19,%rax,4),%r31w
+shlb   (%rax),%r31b
+shl    $0x2,%r12b,%r31b
+shll   $0x2,(%rax),%r31d
+shlw   (%rax),%r31w
+shl    %cl,%r16b,%r8b
+shlw   %cl,(%r19,%rax,4),%r31w
+sarb   (%rax),%r31b
+sar    $0x2,%r12b,%r31b
+sarl   $0x2,(%rax),%r31d
+sarw   (%rax),%r31w
+sar    %cl,%r16b,%r8b
+sarw   %cl,(%r19,%rax,4),%r31w
+shlb   (%rax),%r31b
+shl    $0x2,%r12b,%r31b
+shll   $0x2,(%rax),%r31d
+shlw   (%rax),%r31w
+shl    %cl,%r16b,%r8b
+shlw   %cl,(%r19,%rax,4),%r31w
+shrb   (%rax),%r31b
+shr    $0x2,%r12b,%r31b
+shrl   $0x2,(%rax),%r31d
+shrw   (%rax),%r31w
+shr    %cl,%r16b,%r8b
+shrw   %cl,(%r19,%rax,4),%r31w
+shld   $0x1,%r12,(%rax),%r31
+shld   $0x2,%r8w,%r12w,%r31w
+shld   $0x2,%r15d,(%rax),%r31d
+shld   %cl,%r9w,(%rax),%r31w
+shld   %cl,%r12,%r16,%r8
+shld   %cl,%r13w,(%r19,%rax,4),%r31w
+shrd   $0x1,%r12,(%rax),%r31
+shrd   $0x2,%r8w,%r12w,%r31w
+shrd   $0x2,%r15d,(%rax),%r31d
+shrd   %cl,%r9w,(%rax),%r31w
+shrd   %cl,%r12,%r16,%r8
+shrd   %cl,%r13w,(%r19,%rax,4),%r31w
+adcx   %r15d,%r8d,%r18d
+adcx   (%r15,%r31,1),%r8d,%r18d
+adcx   (%r15,%r31,1),%r8
+adox   %r15d,%r8d,%r18d
+adox   (%r15,%r31,1),%r8d,%r18d
+adox   (%r15,%r31,1),%r8
+cmovo  0x90909090(%eax),%edx,%r8d
+cmovno 0x90909090(%eax),%edx,%r8d
+cmovb  0x90909090(%eax),%edx,%r8d
+cmovae 0x90909090(%eax),%edx,%r8d
+cmove  0x90909090(%eax),%edx,%r8d
+cmovne 0x90909090(%eax),%edx,%r8d
+cmovbe 0x90909090(%eax),%edx,%r8d
+cmova  0x90909090(%eax),%edx,%r8d
+cmovs  0x90909090(%eax),%edx,%r8d
+cmovns 0x90909090(%eax),%edx,%r8d
+cmovp  0x90909090(%eax),%edx,%r8d
+cmovnp 0x90909090(%eax),%edx,%r8d
+cmovl  0x90909090(%eax),%edx,%r8d
+cmovge 0x90909090(%eax),%edx,%r8d
+cmovle 0x90909090(%eax),%edx,%r8d
+cmovg  0x90909090(%eax),%edx,%r8d
+imul   0x90909(%eax),%edx,%r8d
+imul   0x909(%rax,%r31,8),%rdx,%r25
+.byte 0x62,0xf4,0xfc,0x08,0xff,0xc0  #inc %rax
+.byte 0x62,0xf4,0xec,0x08,0xff,0xc0  #bad
diff --git a/gas/testsuite/gas/i386/x86-64-pseudos.d b/gas/testsuite/gas/i386/x86-64-pseudos.d
index 8cc4040cb77..f6257c01792 100644
--- a/gas/testsuite/gas/i386/x86-64-pseudos.d
+++ b/gas/testsuite/gas/i386/x86-64-pseudos.d
@@ -137,6 +137,48 @@  Disassembly of section .text:
  +[a-f0-9]+:	33 07                	xor    \(%rdi\),%eax
  +[a-f0-9]+:	31 07                	xor    %eax,\(%rdi\)
  +[a-f0-9]+:	33 07                	xor    \(%rdi\),%eax
+ +[a-f0-9]+:	62 44 fc 10 01 38    	add    %r31,\(%r8\),%r16
+ +[a-f0-9]+:	62 44 fc 10 03 38    	add    \(%r8\),%r31,%r16
+ +[a-f0-9]+:	62 44 fc 10 01 38    	add    %r31,\(%r8\),%r16
+ +[a-f0-9]+:	62 44 fc 10 03 38    	add    \(%r8\),%r31,%r16
+ +[a-f0-9]+:	62 54 6c 10 29 38    	sub    %r15d,\(%r8\),%r18d
+ +[a-f0-9]+:	62 54 6c 10 2b 38    	sub    \(%r8\),%r15d,%r18d
+ +[a-f0-9]+:	62 54 6c 10 29 38    	sub    %r15d,\(%r8\),%r18d
+ +[a-f0-9]+:	62 54 6c 10 2b 38    	sub    \(%r8\),%r15d,%r18d
+ +[a-f0-9]+:	62 54 6c 10 19 38    	sbb    %r15d,\(%r8\),%r18d
+ +[a-f0-9]+:	62 54 6c 10 1b 38    	sbb    \(%r8\),%r15d,%r18d
+ +[a-f0-9]+:	62 54 6c 10 19 38    	sbb    %r15d,\(%r8\),%r18d
+ +[a-f0-9]+:	62 54 6c 10 1b 38    	sbb    \(%r8\),%r15d,%r18d
+ +[a-f0-9]+:	62 54 6c 10 21 38    	and    %r15d,\(%r8\),%r18d
+ +[a-f0-9]+:	62 54 6c 10 23 38    	and    \(%r8\),%r15d,%r18d
+ +[a-f0-9]+:	62 54 6c 10 21 38    	and    %r15d,\(%r8\),%r18d
+ +[a-f0-9]+:	62 54 6c 10 23 38    	and    \(%r8\),%r15d,%r18d
+ +[a-f0-9]+:	62 54 6c 10 09 38    	or     %r15d,\(%r8\),%r18d
+ +[a-f0-9]+:	62 54 6c 10 0b 38    	or     \(%r8\),%r15d,%r18d
+ +[a-f0-9]+:	62 54 6c 10 09 38    	or     %r15d,\(%r8\),%r18d
+ +[a-f0-9]+:	62 54 6c 10 0b 38    	or     \(%r8\),%r15d,%r18d
+ +[a-f0-9]+:	62 54 6c 10 31 38    	xor    %r15d,\(%r8\),%r18d
+ +[a-f0-9]+:	62 54 6c 10 33 38    	xor    \(%r8\),%r15d,%r18d
+ +[a-f0-9]+:	62 54 6c 10 31 38    	xor    %r15d,\(%r8\),%r18d
+ +[a-f0-9]+:	62 54 6c 10 33 38    	xor    \(%r8\),%r15d,%r18d
+ +[a-f0-9]+:	62 54 6c 10 11 38    	adc    %r15d,\(%r8\),%r18d
+ +[a-f0-9]+:	62 54 6c 10 13 38    	adc    \(%r8\),%r15d,%r18d
+ +[a-f0-9]+:	62 54 6c 10 11 38    	adc    %r15d,\(%r8\),%r18d
+ +[a-f0-9]+:	62 54 6c 10 13 38    	adc    \(%r8\),%r15d,%r18d
+ +[a-f0-9]+:	62 44 fc 10 01 f8    	add    %r31,%r8,%r16
+ +[a-f0-9]+:	62 5c fc 10 03 c7    	add    %r31,%r8,%r16
+ +[a-f0-9]+:	62 7c 6c 10 28 f9    	sub    %r15b,%r17b,%r18b
+ +[a-f0-9]+:	62 c4 6c 10 2a cf    	sub    %r15b,%r17b,%r18b
+ +[a-f0-9]+:	62 7c 6c 10 18 f9    	sbb    %r15b,%r17b,%r18b
+ +[a-f0-9]+:	62 c4 6c 10 1a cf    	sbb    %r15b,%r17b,%r18b
+ +[a-f0-9]+:	62 7c 6c 10 20 f9    	and    %r15b,%r17b,%r18b
+ +[a-f0-9]+:	62 c4 6c 10 22 cf    	and    %r15b,%r17b,%r18b
+ +[a-f0-9]+:	62 7c 6c 10 08 f9    	or     %r15b,%r17b,%r18b
+ +[a-f0-9]+:	62 c4 6c 10 0a cf    	or     %r15b,%r17b,%r18b
+ +[a-f0-9]+:	62 7c 6c 10 30 f9    	xor    %r15b,%r17b,%r18b
+ +[a-f0-9]+:	62 c4 6c 10 32 cf    	xor    %r15b,%r17b,%r18b
+ +[a-f0-9]+:	62 7c 6c 10 10 f9    	adc    %r15b,%r17b,%r18b
+ +[a-f0-9]+:	62 c4 6c 10 12 cf    	adc    %r15b,%r17b,%r18b
  +[a-f0-9]+:	b0 12                	mov    \$0x12,%al
  +[a-f0-9]+:	b8 45 03 00 00       	mov    \$0x345,%eax
  +[a-f0-9]+:	b0 12                	mov    \$0x12,%al
diff --git a/gas/testsuite/gas/i386/x86-64-pseudos.s b/gas/testsuite/gas/i386/x86-64-pseudos.s
index eb25f2a8fbf..9f756e5ba04 100644
--- a/gas/testsuite/gas/i386/x86-64-pseudos.s
+++ b/gas/testsuite/gas/i386/x86-64-pseudos.s
@@ -134,6 +134,49 @@  _start:
 	{load} xor (%rdi), %eax
 	{store} xor %eax, (%rdi)
 	{store} xor (%rdi), %eax
+	{load}  add    %r31,(%r8),%r16
+	{load}	add    (%r8),%r31,%r16
+	{store} add    %r31,(%r8),%r16
+	{store}	add    (%r8),%r31,%r16
+	{load} 	sub    %r15d,(%r8),%r18d
+	{load}	sub    (%r8),%r15d,%r18d
+	{store} sub    %r15d,(%r8),%r18d
+	{store} sub    (%r8),%r15d,%r18d
+	{load} 	sbb    %r15d,(%r8),%r18d
+	{load}	sbb    (%r8),%r15d,%r18d
+	{store} sbb    %r15d,(%r8),%r18d
+	{store} sbb    (%r8),%r15d,%r18d
+	{load} 	and    %r15d,(%r8),%r18d
+	{load}	and    (%r8),%r15d,%r18d
+	{store} and    %r15d,(%r8),%r18d
+	{store} and    (%r8),%r15d,%r18d
+	{load} 	or     %r15d,(%r8),%r18d
+	{load}	or     (%r8),%r15d,%r18d
+	{store} or     %r15d,(%r8),%r18d
+	{store} or     (%r8),%r15d,%r18d
+	{load} 	xor    %r15d,(%r8),%r18d
+	{load}	xor    (%r8),%r15d,%r18d
+	{store} xor    %r15d,(%r8),%r18d
+	{store} xor    (%r8),%r15d,%r18d
+	{load} 	adc    %r15d,(%r8),%r18d
+	{load}	adc    (%r8),%r15d,%r18d
+	{store} adc    %r15d,(%r8),%r18d
+	{store} adc    (%r8),%r15d,%r18d
+
+	{store} add    %r31,%r8,%r16
+	{load}  add    %r31,%r8,%r16
+	{store} sub    %r15b,%r17b,%r18b
+	{load}	sub    %r15b,%r17b,%r18b
+	{store}	sbb    %r15b,%r17b,%r18b
+	{load}	sbb    %r15b,%r17b,%r18b
+	{store}	and    %r15b,%r17b,%r18b
+	{load}	and    %r15b,%r17b,%r18b
+	{store}	or     %r15b,%r17b,%r18b
+	{load}	or     %r15b,%r17b,%r18b
+	{store}	xor    %r15b,%r17b,%r18b
+	{load}	xor    %r15b,%r17b,%r18b
+	{store}	adc    %r15b,%r17b,%r18b
+	{load}	adc    %r15b,%r17b,%r18b
 
 	.irp m, mov, adc, add, and, cmp, or, sbb, sub, test, xor
 	\m	$0x12, %al
diff --git a/gas/testsuite/gas/i386/x86-64.exp b/gas/testsuite/gas/i386/x86-64.exp
index 46fb3681528..ca1583c6f88 100644
--- a/gas/testsuite/gas/i386/x86-64.exp
+++ b/gas/testsuite/gas/i386/x86-64.exp
@@ -367,6 +367,7 @@  run_dump_test "x86-64-apx-rex2-inval"
 run_dump_test "x86-64-apx-evex-promoted"
 run_dump_test "x86-64-apx-evex-promoted-intel"
 run_dump_test "x86-64-apx-evex-egpr"
+run_dump_test "x86-64-apx-ndd"
 run_dump_test "x86-64-avx512f-rcigrz-intel"
 run_dump_test "x86-64-avx512f-rcigrz"
 run_dump_test "x86-64-clwb"
diff --git a/opcodes/i386-dis-evex-prefix.h b/opcodes/i386-dis-evex-prefix.h
index f6f02de6c47..210783d7e88 100644
--- a/opcodes/i386-dis-evex-prefix.h
+++ b/opcodes/i386-dis-evex-prefix.h
@@ -353,8 +353,8 @@ 
   /* PREFIX_EVEX_MAP4_66 */
   {
     { MOD_TABLE (MOD_EVEX_MAP4_66_PREFIX_0) },
-    { "adoxS",	{ Gdq, Edq }, 0 },
-    { "adcxS",	{ Gdq, Edq }, 0 },
+    { "adoxS",	{ VexGdq, Gdq, Edq }, 0 },
+    { "adcxS",	{ VexGdq, Gdq, Edq }, 0 },
   },
   /* PREFIX_EVEX_MAP4_D8 */
   {
diff --git a/opcodes/i386-dis-evex-reg.h b/opcodes/i386-dis-evex-reg.h
index c3b4f083346..d00c2843e12 100644
--- a/opcodes/i386-dis-evex-reg.h
+++ b/opcodes/i386-dis-evex-reg.h
@@ -56,6 +56,105 @@ 
     { "blsmskS",	{ VexGdq, Edq }, 0 },
     { "blsiS",		{ VexGdq, Edq }, 0 },
   },
+  /* REG_EVEX_MAP4_80 */
+  {
+    { "addA",	{ VexGb, Eb, Ib }, 0 },
+    { "orA",	{ VexGb, Eb, Ib }, 0 },
+    { "adcA",	{ VexGb, Eb, Ib }, 0 },
+    { "sbbA",	{ VexGb, Eb, Ib }, 0 },
+    { "andA",	{ VexGb, Eb, Ib }, 0 },
+    { "subA",	{ VexGb, Eb, Ib }, 0 },
+    { "xorA",	{ VexGb, Eb, Ib }, 0 },
+    { Bad_Opcode },
+  },
+  /* REG_EVEX_MAP4_81 */
+  {
+    { "addQ",	{ VexGv, Ev, Iv }, 0 },
+    { "orQ",	{ VexGv, Ev, Iv }, 0 },
+    { "adcQ",	{ VexGv, Ev, Iv }, 0 },
+    { "sbbQ",	{ VexGv, Ev, Iv }, 0 },
+    { "andQ",	{ VexGv, Ev, Iv }, 0 },
+    { "subQ",	{ VexGv, Ev, Iv }, 0 },
+    { "xorQ",	{ VexGv, Ev, Iv }, 0 },
+    { Bad_Opcode },
+  },
+  /* REG_EVEX_MAP4_83 */
+  {
+    { "addQ",	{ VexGv, Ev, sIb }, 0 },
+    { "orQ",	{ VexGv, Ev, sIb }, 0 },
+    { "adcQ",	{ VexGv, Ev, sIb }, 0 },
+    { "sbbQ",	{ VexGv, Ev, sIb }, 0 },
+    { "andQ",	{ VexGv, Ev, sIb }, 0 },
+    { "subQ",	{ VexGv, Ev, sIb }, 0 },
+    { "xorQ",	{ VexGv, Ev, sIb }, 0 },
+    { Bad_Opcode },
+  },
+  /* REG_EVEX_MAP4_C0 */
+  {
+    { "rolA",	{ VexGb, Eb, Ib }, 0 },
+    { "rorA",	{ VexGb, Eb, Ib }, 0 },
+    { "rclA",	{ VexGb, Eb, Ib }, 0 },
+    { "rcrA",	{ VexGb, Eb, Ib }, 0 },
+    { "shlA",	{ VexGb, Eb, Ib }, 0 },
+    { "shrA",	{ VexGb, Eb, Ib }, 0 },
+    { "shlA",	{ VexGb, Eb, Ib }, 0 },
+    { "sarA",	{ VexGb, Eb, Ib }, 0 },
+  },
+  /* REG_EVEX_MAP4_C1 */
+  {
+    { "rolQ",	{ VexGv, Ev, Ib }, 0 },
+    { "rorQ",	{ VexGv, Ev, Ib }, 0 },
+    { "rclQ",	{ VexGv, Ev, Ib }, 0 },
+    { "rcrQ",	{ VexGv, Ev, Ib }, 0 },
+    { "shlQ",	{ VexGv, Ev, Ib }, 0 },
+    { "shrQ",	{ VexGv, Ev, Ib }, 0 },
+    { "shlQ",	{ VexGv, Ev, Ib }, 0 },
+    { "sarQ",	{ VexGv, Ev, Ib }, 0 },
+  },
+  /* REG_EVEX_MAP4_D0 */
+  {
+    { "rolA",	{ VexGb, Eb, I1 }, 0 },
+    { "rorA",	{ VexGb, Eb, I1 }, 0 },
+    { "rclA",	{ VexGb, Eb, I1 }, 0 },
+    { "rcrA",	{ VexGb, Eb, I1 }, 0 },
+    { "shlA",	{ VexGb, Eb, I1 }, 0 },
+    { "shrA",	{ VexGb, Eb, I1 }, 0 },
+    { "shlA",	{ VexGb, Eb, I1 }, 0 },
+    { "sarA",	{ VexGb, Eb, I1 }, 0 },
+  },
+  /* REG_EVEX_MAP4_D1 */
+  {
+    { "rolQ",	{ VexGv, Ev, I1 }, 0 },
+    { "rorQ",	{ VexGv, Ev, I1 }, 0 },
+    { "rclQ",	{ VexGv, Ev, I1 }, 0 },
+    { "rcrQ",	{ VexGv, Ev, I1 }, 0 },
+    { "shlQ",	{ VexGv, Ev, I1 }, 0 },
+    { "shrQ",	{ VexGv, Ev, I1 }, 0 },
+    { "shlQ",	{ VexGv, Ev, I1 }, 0 },
+    { "sarQ",	{ VexGv, Ev, I1 }, 0 },
+  },
+  /* REG_EVEX_MAP4_D2 */
+  {
+    { "rolA",	{ VexGb, Eb, CL }, 0 },
+    { "rorA",	{ VexGb, Eb, CL }, 0 },
+    { "rclA",	{ VexGb, Eb, CL }, 0 },
+    { "rcrA",	{ VexGb, Eb, CL }, 0 },
+    { "shlA",	{ VexGb, Eb, CL }, 0 },
+    { "shrA",	{ VexGb, Eb, CL }, 0 },
+    { "shlA",	{ VexGb, Eb, CL }, 0 },
+    { "sarA",	{ VexGb, Eb, CL }, 0 },
+  },
+  /* REG_EVEX_MAP4_D3 */
+  {
+    { "rolQ",	{ VexGv, Ev, CL }, 0 },
+    { "rorQ",	{ VexGv, Ev, CL }, 0 },
+    { "rclQ",	{ VexGv, Ev, CL }, 0 },
+    { "rcrQ",	{ VexGv, Ev, CL }, 0 },
+    { "shlQ",	{ VexGv, Ev, CL }, 0 },
+    { "shrQ",	{ VexGv, Ev, CL }, 0 },
+    { "shlQ",	{ VexGv, Ev, CL }, 0 },
+    { "sarQ",	{ VexGv, Ev, CL }, 0 },
+  },
   /* REG_EVEX_MAP4_D8_PREFIX_1 */
   {
     { "aesencwide128kl",	{ M }, 0 },
@@ -63,3 +162,27 @@ 
     { "aesencwide256kl",	{ M }, 0 },
     { "aesdecwide256kl",	{ M }, 0 },
   },
+  /* REG_EVEX_MAP4_F6 */
+  {
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { "notA",	{ VexGb, Eb }, 0 },
+    { "negA",	{ VexGb, Eb }, 0 },
+  },
+  /* REG_EVEX_MAP4_F7 */
+  {
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { "notQ",	{ VexGv, Ev }, 0 },
+    { "negQ",	{ VexGv, Ev }, 0 },
+  },
+  /* REG_EVEX_MAP4_FE */
+  {
+    { "incA",   { VexGb ,Eb }, 0 },
+    { "decA",   { VexGb ,Eb }, 0 },
+  },
+  /* REG_EVEX_MAP4_FF */
+  {
+    { "incQ",   { VexGv ,Ev }, 0 },
+    { "decQ",   { VexGv ,Ev }, 0 },
+  },
diff --git a/opcodes/i386-dis-evex.h b/opcodes/i386-dis-evex.h
index 2a8c80c5200..1787be6dbf0 100644
--- a/opcodes/i386-dis-evex.h
+++ b/opcodes/i386-dis-evex.h
@@ -875,64 +875,64 @@  static const struct dis386 evex_table[][256] = {
   /* EVEX_MAP4_ */
   {
     /* 00 */
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { "addB",             { VexGb, Eb, Gb }, 0  },
+    { "addS",             { VexGv, Ev, Gv }, 0 },
+    { "addB",             { VexGb, Gb, EbS }, 0 },
+    { "addS",             { VexGv, Gv, EvS }, 0 },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
     /* 08 */
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { "orB",		{ VexGb, Eb, Gb }, 0 },
+    { "orS",		{ VexGv, Ev, Gv }, 0 },
+    { "orB",		{ VexGb, Gb, EbS }, 0 },
+    { "orS",		{ VexGv, Gv, EvS }, 0 },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
     /* 10 */
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { "adcB",		{ VexGb, Eb, Gb }, 0 },
+    { "adcS",		{ VexGv, Ev, Gv }, 0 },
+    { "adcB",		{ VexGb, Gb, EbS }, 0 },
+    { "adcS",		{ VexGv, Gv, EvS }, 0 },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
     /* 18 */
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { "sbbB",		{ VexGb, Eb, Gb }, 0 },
+    { "sbbS",		{ VexGv, Ev, Gv }, 0 },
+    { "sbbB",		{ VexGb, Gb, EbS }, 0 },
+    { "sbbS",		{ VexGv, Gv, EvS }, 0 },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
     /* 20 */
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { "andB",		{ VexGb, Eb, Gb }, 0 },
+    { "andS",		{ VexGv, Ev, Gv }, 0 },
+    { "andB",		{ VexGb, Gb, EbS }, 0 },
+    { "andS",		{ VexGv, Gv, EvS }, 0 },
+    { "shldS",		{ VexGv, Ev, Gv, Ib }, 0 },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
     /* 28 */
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { "subB",		{ VexGb, Eb, Gb }, 0 },
+    { "subS",		{ VexGv, Ev, Gv }, 0 },
+    { "subB",		{ VexGb, Gb, EbS }, 0 },
+    { "subS",		{ VexGv, Gv, EvS }, 0 },
+    { "shrdS",		{ VexGv, Ev, Gv, Ib }, 0 },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
     /* 30 */
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { "xorB",		{ VexGb, Eb, Gb }, 0 },
+    { "xorS",		{ VexGv, Ev, Gv }, 0 },
+    { "xorB",		{ VexGb, Gb, EbS }, 0 },
+    { "xorS",		{ VexGv, Gv, EvS }, 0 },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
@@ -947,23 +947,23 @@  static const struct dis386 evex_table[][256] = {
     { Bad_Opcode },
     { Bad_Opcode },
     /* 40 */
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { "cmovoS",		{ VexGv, Gv, Ev }, 0 },
+    { "cmovnoS",	{ VexGv, Gv, Ev }, 0 },
+    { "cmovbS",		{ VexGv, Gv, Ev }, 0 },
+    { "cmovaeS",	{ VexGv, Gv, Ev }, 0 },
+    { "cmoveS",		{ VexGv, Gv, Ev }, 0 },
+    { "cmovneS",	{ VexGv, Gv, Ev }, 0 },
+    { "cmovbeS",	{ VexGv, Gv, Ev }, 0 },
+    { "cmovaS",		{ VexGv, Gv, Ev }, 0 },
     /* 48 */
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { "cmovsS",		{ VexGv, Gv, Ev }, 0 },
+    { "cmovnsS",	{ VexGv, Gv, Ev }, 0 },
+    { "cmovpS",		{ VexGv, Gv, Ev }, 0 },
+    { "cmovnpS",	{ VexGv, Gv, Ev }, 0 },
+    { "cmovlS",		{ VexGv, Gv, Ev }, 0 },
+    { "cmovgeS",	{ VexGv, Gv, Ev }, 0 },
+    { "cmovleS",	{ VexGv, Gv, Ev }, 0 },
+    { "cmovgS",		{ VexGv, Gv, Ev }, 0 },
     /* 50 */
     { Bad_Opcode },
     { Bad_Opcode },
@@ -1019,10 +1019,10 @@  static const struct dis386 evex_table[][256] = {
     { Bad_Opcode },
     { Bad_Opcode },
     /* 80 */
+    { REG_TABLE (REG_EVEX_MAP4_80) },
+    { REG_TABLE (REG_EVEX_MAP4_81) },
     { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { REG_TABLE (REG_EVEX_MAP4_83) },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
@@ -1060,7 +1060,7 @@  static const struct dis386 evex_table[][256] = {
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
-    { Bad_Opcode },
+    { "shldS",		{ VexGv, Ev, Gv, CL }, 0 },
     { Bad_Opcode },
     { Bad_Opcode },
     /* A8 */
@@ -1069,9 +1069,9 @@  static const struct dis386 evex_table[][256] = {
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
+    { "shrdS",		{ VexGv, Ev, Gv, CL }, 0 },
     { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { "imulS",		{ VexGv, Gv, Ev }, 0 },
     /* B0 */
     { Bad_Opcode },
     { Bad_Opcode },
@@ -1091,8 +1091,8 @@  static const struct dis386 evex_table[][256] = {
     { Bad_Opcode },
     { Bad_Opcode },
     /* C0 */
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { REG_TABLE (REG_EVEX_MAP4_C0) },
+    { REG_TABLE (REG_EVEX_MAP4_C1) },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
@@ -1109,10 +1109,10 @@  static const struct dis386 evex_table[][256] = {
     { Bad_Opcode },
     { Bad_Opcode },
     /* D0 */
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { REG_TABLE (REG_EVEX_MAP4_D0) },
+    { REG_TABLE (REG_EVEX_MAP4_D1) },
+    { REG_TABLE (REG_EVEX_MAP4_D2) },
+    { REG_TABLE (REG_EVEX_MAP4_D3) },
     { "sha1rnds4", { XM, EXxmm, Ib }, 0 },
     { Bad_Opcode },
     { Bad_Opcode },
@@ -1151,8 +1151,8 @@  static const struct dis386 evex_table[][256] = {
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { REG_TABLE (REG_EVEX_MAP4_F6) },
+    { REG_TABLE (REG_EVEX_MAP4_F7) },
     /* F8 */
     { PREFIX_TABLE (PREFIX_EVEX_MAP4_F8) },
     { MOD_TABLE (MOD_EVEX_MAP4_F9) },
@@ -1160,8 +1160,8 @@  static const struct dis386 evex_table[][256] = {
     { Bad_Opcode },
     { PREFIX_TABLE (PREFIX_EVEX_MAP4_FC) },
     { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { REG_TABLE (REG_EVEX_MAP4_FE) },
+    { REG_TABLE (REG_EVEX_MAP4_FF) },
   },
   /* EVEX_MAP5_ */
   {
diff --git a/opcodes/i386-dis.c b/opcodes/i386-dis.c
index c8f3cfb8149..c702fd9e756 100644
--- a/opcodes/i386-dis.c
+++ b/opcodes/i386-dis.c
@@ -571,6 +571,8 @@  fetch_error (const instr_info *ins)
 #define VexGatherD { OP_VEX, vex_vsib_d_w_dq_mode }
 #define VexGatherQ { OP_VEX, vex_vsib_q_w_dq_mode }
 #define VexGdq { OP_VEX, dq_mode }
+#define VexGb { OP_VEX, b_mode }
+#define VexGv { OP_VEX, v_mode }
 #define VexTmm { OP_VEX, tmm_mode }
 #define XMVexI4 { OP_REG_VexI4, x_mode }
 #define XMVexScalarI4 { OP_REG_VexI4, scalar_mode }
@@ -883,7 +885,20 @@  enum
   REG_EVEX_0F38C6_L_2,
   REG_EVEX_0F38C7_L_2,
   REG_EVEX_0F38F3_L_0,
-  REG_EVEX_MAP4_D8_PREFIX_1
+  REG_EVEX_MAP4_80,
+  REG_EVEX_MAP4_81,
+  REG_EVEX_MAP4_83,
+  REG_EVEX_MAP4_C0,
+  REG_EVEX_MAP4_C1,
+  REG_EVEX_MAP4_D0,
+  REG_EVEX_MAP4_D1,
+  REG_EVEX_MAP4_D2,
+  REG_EVEX_MAP4_D3,
+  REG_EVEX_MAP4_D8_PREFIX_1,
+  REG_EVEX_MAP4_F6,
+  REG_EVEX_MAP4_F7,
+  REG_EVEX_MAP4_FE,
+  REG_EVEX_MAP4_FF
 };
 
 enum
@@ -9070,6 +9085,14 @@  get_valid_dis386 (const struct dis386 *dp, instr_info *ins)
 	  ins->rex &= ~REX_B;
 	  ins->rex2 &= ~REX_R;
 	}
+      if (ins->evex_type == evex_from_legacy)
+	{
+	  if (ins->vex.ll || ins->vex.zeroing
+	      || (!ins->vex.b && (ins->vex.register_specifier
+				  || !ins->vex.v)))
+	    return &bad_opcode;
+	  ins->rex |= REX_OPCODE;
+	}
 
       ins->need_vex = 4;
       ins->codep++;
@@ -9080,7 +9103,7 @@  get_valid_dis386 (const struct dis386 *dp, instr_info *ins)
 	return &err_opcode;
 
       /* Set vector length.  */
-      if (ins->modrm.mod == 3 && ins->vex.b)
+      if (ins->modrm.mod == 3 && ins->vex.b && ins->evex_type == evex_default)
 	ins->vex.length = 512;
       else
 	{
@@ -10994,7 +11017,7 @@  print_displacement (instr_info *ins, bfd_signed_vma val)
 static void
 intel_operand_size (instr_info *ins, int bytemode, int sizeflag)
 {
-  if (ins->vex.b)
+  if (ins->vex.b && ins->evex_type != evex_from_legacy)
     {
       if (!ins->vex.no_broadcast)
 	switch (bytemode)
@@ -11928,7 +11951,8 @@  OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
 	ins->vex.no_broadcast = true;
 
       if (!ins->vex.no_broadcast
-	  && (!ins->intel_syntax || !(ins->evex_used & EVEX_len_used)))
+	  && (!ins->intel_syntax || !(ins->evex_used & EVEX_len_used))
+	  && ins->evex_type == evex_default)
 	{
 	  if (bytemode == xh_mode)
 	    {
@@ -13280,6 +13304,14 @@  OP_VEX (instr_info *ins, int bytemode, int sizeflag ATTRIBUTE_UNUSED)
   if (!ins->need_vex)
     return true;
 
+  if (ins->evex_type == evex_from_legacy)
+    {
+      if (ins->vex.b)
+	ins->evex_used |= EVEX_b_used;
+      else
+	return true;
+    }
+
   reg = ins->vex.register_specifier;
   ins->vex.register_specifier = 0;
   if (ins->address_mode != mode_64bit)
@@ -13371,12 +13403,19 @@  OP_VEX (instr_info *ins, int bytemode, int sizeflag ATTRIBUTE_UNUSED)
 	  names = att_names_xmm;
 	  ins->evex_used |= EVEX_len_used;
 	  break;
+	case v_mode:
 	case dq_mode:
 	  if (ins->rex & REX_W)
 	    names = att_names64;
+	  else if (bytemode == v_mode
+		   && !(sizeflag & DFLAG))
+	    names = att_names16;
 	  else
 	    names = att_names32;
 	  break;
+	case b_mode:
+	  names = att_names8rex;
+	  break;
 	case mask_bd_mode:
 	case mask_mode:
 	  if (reg > 0x7)
diff --git a/opcodes/i386-opc.h b/opcodes/i386-opc.h
index 9dd5625f54d..f36a8da5cbe 100644
--- a/opcodes/i386-opc.h
+++ b/opcodes/i386-opc.h
@@ -960,6 +960,7 @@  typedef struct insn_template
 /* The next value is arbitrary, as long as it's non-zero and distinct
    from all other values above.  */
 #define Opcode_VexW	0xf /* Operand order controlled by VEX.W. */
+#define Opcode_APX_NDDD	0x11 /* Direction bit for APX NDD insns. */
 
   /* how many operands */
   unsigned int operands:3;
diff --git a/opcodes/i386-opc.tbl b/opcodes/i386-opc.tbl
index 791a9fe0177..4bb0c9f4906 100644
--- a/opcodes/i386-opc.tbl
+++ b/opcodes/i386-opc.tbl
@@ -286,17 +286,25 @@  add, 0x0, 0, D|W|CheckOperandSize|Modrm|No_sSuf|HLEPrefixLock, { Reg8|Reg16|Reg3
 add, 0x83/0, 0, Modrm|No_bSuf|No_sSuf|HLEPrefixLock, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
 add, 0x4, 0, W|No_sSuf, { Imm8|Imm16|Imm32|Imm32S, Acc|Byte|Word|Dword|Qword }
 add, 0x80/0, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+add, 0x0, APX_F|x64, D|W|CheckOperandSize|Modrm|No_sSuf|VexVVVV|EVex128|EVexMap4, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+add, 0x83/0, APX_F|x64, Modrm|CheckOperandSize|No_bSuf|No_sSuf|VexVVVV|EVex128|EVexMap4, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
+add, 0x80/0, APX_F|x64, W|Modrm|CheckOperandSize|No_sSuf|VexVVVV|EVex128|EVexMap4, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 
 inc, 0x40, No64, No_bSuf|No_sSuf|No_qSuf, { Reg16|Reg32 }
 inc, 0xfe/0, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+inc, 0xfe/0, APX_F|x64, W|Modrm|CheckOperandSize|No_sSuf|VexVVVV|EVex128|EVexMap4, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 
 sub, 0x28, 0, D|W|CheckOperandSize|Modrm|No_sSuf|HLEPrefixLock|Optimize, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 sub, 0x83/5, 0, Modrm|No_bSuf|No_sSuf|HLEPrefixLock, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
 sub, 0x2c, 0, W|No_sSuf, { Imm8|Imm16|Imm32|Imm32S, Acc|Byte|Word|Dword|Qword }
 sub, 0x80/5, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+sub, 0x28, APX_F|x64, D|W|CheckOperandSize|Modrm|No_sSuf|VexVVVV|EVex128|EVexMap4|Optimize, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+sub, 0x83/5, APX_F|x64, Modrm|CheckOperandSize|No_bSuf|No_sSuf|VexVVVV|EVex128|EVexMap4, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
+sub, 0x80/5, APX_F|x64, W|Modrm|CheckOperandSize|No_sSuf|VexVVVV|EVex128|EVexMap4, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 
 dec, 0x48, No64, No_bSuf|No_sSuf|No_qSuf, { Reg16|Reg32 }
 dec, 0xfe/1, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+dec, 0xfe/1, APX_F|x64, W|Modrm|CheckOperandSize|No_sSuf|VexVVVV|EVex128|EVexMap4, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 
 sbb, 0x18, 0, D|W|CheckOperandSize|Modrm|No_sSuf|HLEPrefixLock, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 sbb, 0x83/3, 0, Modrm|No_bSuf|No_sSuf|HLEPrefixLock, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
@@ -305,6 +313,9 @@  sbb, 0x80/3, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Imm8|Imm16|Imm32|Imm32S, Reg8|R
 sbb, 0x18, APX_F|x64, D|W|CheckOperandSize|Modrm|EVex128|EVexMap4|No_sSuf, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 sbb, 0x83/3, APX_F|x64, Modrm|EVex128|EVexMap4|No_bSuf|No_sSuf, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
 sbb, 0x80/3, APX_F|x64, W|Modrm|EVex128|EVexMap4|No_sSuf, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+sbb, 0x18, APX_F|x64, D|W|CheckOperandSize|Modrm|No_sSuf|VexVVVV|EVex128|EVexMap4, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+sbb, 0x83/3, APX_F|x64, Modrm|CheckOperandSize|No_bSuf|No_sSuf|VexVVVV|EVex128|EVexMap4, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
+sbb, 0x80/3, APX_F|x64, W|Modrm|CheckOperandSize|No_sSuf|VexVVVV|EVex128|EVexMap4, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 
 cmp, 0x38, 0, D|W|CheckOperandSize|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 cmp, 0x83/7, 0, Modrm|No_bSuf|No_sSuf, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
@@ -319,16 +330,25 @@  and, 0x20, 0, D|W|CheckOperandSize|Modrm|No_sSuf|HLEPrefixLock|Optimize, { Reg8|
 and, 0x83/4, 0, Modrm|No_bSuf|No_sSuf|HLEPrefixLock|Optimize, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
 and, 0x24, 0, W|No_sSuf|Optimize, { Imm8|Imm16|Imm32|Imm32S, Acc|Byte|Word|Dword|Qword }
 and, 0x80/4, 0, W|Modrm|No_sSuf|HLEPrefixLock|Optimize, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+and, 0x20, APX_F|x64, D|W|CheckOperandSize|Modrm|No_sSuf|VexVVVV|EVex128|EVexMap4|Optimize, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+and, 0x83/4, APX_F|x64, Modrm|CheckOperandSize|No_bSuf|No_sSuf|VexVVVV|EVex128|EVexMap4|Optimize, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
+and, 0x80/4, APX_F|x64, W|Modrm|CheckOperandSize|No_sSuf|VexVVVV|EVex128|EVexMap4|Optimize, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 
 or, 0x8, 0, D|W|CheckOperandSize|Modrm|No_sSuf|HLEPrefixLock|Optimize, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 or, 0x83/1, 0, Modrm|No_bSuf|No_sSuf|HLEPrefixLock, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
 or, 0xc, 0, W|No_sSuf, { Imm8|Imm16|Imm32|Imm32S, Acc|Byte|Word|Dword|Qword }
 or, 0x80/1, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+or, 0x8, APX_F|x64, D|W|CheckOperandSize|Modrm|No_sSuf|VexVVVV|EVex128|EVexMap4|Optimize, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+or, 0x83/1, APX_F|x64, Modrm|CheckOperandSize|No_bSuf|No_sSuf|VexVVVV|EVex128|EVexMap4, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
+or, 0x80/1, APX_F|x64, W|Modrm|CheckOperandSize|No_sSuf|VexVVVV|EVex128|EVexMap4, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 
 xor, 0x30, 0, D|W|CheckOperandSize|Modrm|No_sSuf|HLEPrefixLock|Optimize, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 xor, 0x83/6, 0, Modrm|No_bSuf|No_sSuf|HLEPrefixLock, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
 xor, 0x34, 0, W|No_sSuf, { Imm8|Imm16|Imm32|Imm32S, Acc|Byte|Word|Dword|Qword }
 xor, 0x80/6, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+xor, 0x30, APX_F|x64, D|W|CheckOperandSize|Modrm|No_sSuf|VexVVVV|EVex128|EVexMap4|Optimize, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+xor, 0x83/6, APX_F|x64, Modrm|CheckOperandSize|No_bSuf|No_sSuf|VexVVVV|EVex128|EVexMap4, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
+xor, 0x80/6, APX_F|x64, W|Modrm|CheckOperandSize|No_sSuf|VexVVVV|EVex128|EVexMap4, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 
 // clr with 1 operand is really xor with 2 operands.
 clr, 0x30, 0, W|Modrm|No_sSuf|RegKludge|Optimize, { Reg8|Reg16|Reg32|Reg64 }
@@ -340,11 +360,16 @@  adc, 0x80/2, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Imm8|Imm16|Imm32|Imm32S, Reg8|R
 adc, 0x10, APX_F|x64, D|W|CheckOperandSize|Modrm|EVex128|EVexMap4|No_sSuf, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 adc, 0x83/2, APX_F|x64, Modrm|EVex128|EVexMap4|No_bSuf|No_sSuf, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
 adc, 0x80/2, APX_F|x64, W|Modrm|EVex128|EVexMap4|No_sSuf, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+adc, 0x10, APX_F|x64, D|W|CheckOperandSize|Modrm|No_sSuf|VexVVVV|EVex128|EVexMap4, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+adc, 0x83/2, APX_F|x64, Modrm|CheckOperandSize|No_bSuf|No_sSuf|VexVVVV|EVex128|EVexMap4, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
+adc, 0x80/2, APX_F|x64, W|Modrm|CheckOperandSize|No_sSuf|VexVVVV|EVex128|EVexMap4, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 
 neg, 0xf6/3, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+neg, 0xf6/3, APX_F|x64, W|Modrm|CheckOperandSize|No_sSuf|VexVVVV|EVex128|EVexMap4, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 
 not, 0xf6/2, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 not, 0xf6/2, APX_F|x64, W|Modrm|No_sSuf|EVex128|EVexMap4, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+not, 0xf6/2, APX_F|x64, W|Modrm|CheckOperandSize|No_sSuf|VexVVVV|EVex128|EVexMap4, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 
 aaa, 0x37, No64, NoSuf, {}
 aas, 0x3f, No64, NoSuf, {}
@@ -378,6 +403,7 @@  cqto, 0x99, x64, Size64|NoSuf, {}
 mul, 0xf6/4, 0, W|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 imul, 0xf6/5, 0, W|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 imul, 0xfaf, i386, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Reg16|Reg32|Reg64|Unspecified|Word|Dword|Qword|BaseIndex, Reg16|Reg32|Reg64 }
+imul, 0xaf, APX_F|x64, Modrm|CheckOperandSize|No_bSuf|No_sSuf|VexVVVV|EVex128|EVexMap4, { Reg16|Reg32|Reg64|Unspecified|Word|Dword|Qword|BaseIndex, Reg16|Reg32|Reg64, Reg16|Reg32|Reg64 }
 imul, 0x6b, i186, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 imul, 0x69, i186, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Imm16|Imm32|Imm32S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 // imul with 2 operands mimics imul with 3 by putting the register in
@@ -395,11 +421,19 @@  rol, 0xd0/0, 0, W|Modrm|No_sSuf, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|
 rol, 0xc0/0, i186, W|Modrm|No_sSuf, { Imm8|Imm8S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 rol, 0xd2/0, 0, W|Modrm|No_sSuf, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 rol, 0xd0/0, 0, W|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+rol, 0xd0/0, APX_F|x64, W|Modrm|No_sSuf|CheckOperandSize|VexVVVV|EVex128|EVexMap4, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+rol, 0xc0/0, APX_F|x64, W|Modrm|No_sSuf|CheckOperandSize|VexVVVV|EVex128|EVexMap4, { Imm8|Imm8S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+rol, 0xd2/0, APX_F|x64, W|Modrm|No_sSuf|CheckOperandSize|VexVVVV|EVex128|EVexMap4, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+rol, 0xd0/0, APX_F|x64, W|Modrm|No_sSuf|CheckOperandSize|VexVVVV|EVex128|EVexMap4, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 
 ror, 0xd0/1, 0, W|Modrm|No_sSuf, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 ror, 0xc0/1, i186, W|Modrm|No_sSuf, { Imm8|Imm8S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 ror, 0xd2/1, 0, W|Modrm|No_sSuf, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 ror, 0xd0/1, 0, W|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+ror, 0xd0/1, APX_F|x64, W|Modrm|No_sSuf|CheckOperandSize|VexVVVV|EVex128|EVexMap4, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+ror, 0xc0/1, APX_F|x64, W|Modrm|No_sSuf|CheckOperandSize|VexVVVV|EVex128|EVexMap4, { Imm8|Imm8S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+ror, 0xd2/1, APX_F|x64, W|Modrm|No_sSuf|CheckOperandSize|VexVVVV|EVex128|EVexMap4, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+ror, 0xd0/1, APX_F|x64, W|Modrm|No_sSuf|CheckOperandSize|VexVVVV|EVex128|EVexMap4, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 
 rcl, 0xd0/2, 0, W|Modrm|No_sSuf, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 rcl, 0xc0/2, i186, W|Modrm|No_sSuf, { Imm8, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
@@ -409,6 +443,10 @@  rcl, 0xd0/2, APX_F|x64, W|Modrm|No_sSuf|EVex128|EVexMap4, { Imm1, Reg8|Reg16|Reg
 rcl, 0xc0/2, APX_F|x64, W|Modrm|No_sSuf|EVex128|EVexMap4, { Imm8, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 rcl, 0xd2/2, APX_F|x64, W|Modrm|No_sSuf|EVex128|EVexMap4, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 rcl, 0xd0/2, APX_F|x64, W|Modrm|No_sSuf|EVex128|EVexMap4, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+rcl, 0xd0/2, APX_F|x64, W|Modrm|No_sSuf|CheckOperandSize|VexVVVV|EVex128|EVexMap4, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+rcl, 0xc0/2, APX_F|x64, W|Modrm|No_sSuf|CheckOperandSize|VexVVVV|EVex128|EVexMap4, { Imm8, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+rcl, 0xd2/2, APX_F|x64, W|Modrm|No_sSuf|CheckOperandSize|VexVVVV|EVex128|EVexMap4, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+rcl, 0xd0/2, APX_F|x64, W|Modrm|No_sSuf|CheckOperandSize|VexVVVV|EVex128|EVexMap4, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 
 rcr, 0xd0/3, 0, W|Modrm|No_sSuf, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 rcr, 0xc0/3, i186, W|Modrm|No_sSuf, { Imm8, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
@@ -418,34 +456,60 @@  rcr, 0xd0/3, APX_F|x64, W|Modrm|No_sSuf|EVex128|EVexMap4, { Imm1, Reg8|Reg16|Reg
 rcr, 0xc0/3, APX_F|x64, W|Modrm|No_sSuf|EVex128|EVexMap4, { Imm8, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 rcr, 0xd2/3, APX_F|x64, W|Modrm|No_sSuf|EVex128|EVexMap4, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 rcr, 0xd0/3, APX_F|x64, W|Modrm|No_sSuf|EVex128|EVexMap4, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+rcr, 0xd0/3, APX_F|x64, W|Modrm|No_sSuf|CheckOperandSize|VexVVVV|EVex128|EVexMap4, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+rcr, 0xc0/3, APX_F|x64, W|Modrm|No_sSuf|CheckOperandSize|VexVVVV|EVex128|EVexMap4, { Imm8, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+rcr, 0xd2/3, APX_F|x64, W|Modrm|No_sSuf|CheckOperandSize|VexVVVV|EVex128|EVexMap4, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+rcr, 0xd0/3, APX_F|x64, W|Modrm|No_sSuf|CheckOperandSize|VexVVVV|EVex128|EVexMap4, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 
 sal, 0xd0/4, 0, W|Modrm|No_sSuf, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 sal, 0xc0/4, i186, W|Modrm|No_sSuf, { Imm8, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 sal, 0xd2/4, 0, W|Modrm|No_sSuf, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 sal, 0xd0/4, 0, W|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+sal, 0xd0/4, APX_F|x64, W|Modrm|No_sSuf|CheckOperandSize|VexVVVV|EVex128|EVexMap4, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+sal, 0xc0/4, APX_F|x64, W|Modrm|No_sSuf|CheckOperandSize|VexVVVV|EVex128|EVexMap4, { Imm8, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+sal, 0xd2/4, APX_F|x64, W|Modrm|No_sSuf|CheckOperandSize|VexVVVV|EVex128|EVexMap4, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+sal, 0xd0/4, APX_F|x64, W|Modrm|No_sSuf|CheckOperandSize|VexVVVV|EVex128|EVexMap4, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 
 shl, 0xd0/4, 0, W|Modrm|No_sSuf, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 shl, 0xc0/4, i186, W|Modrm|No_sSuf, { Imm8, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 shl, 0xd2/4, 0, W|Modrm|No_sSuf, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 shl, 0xd0/4, 0, W|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+shl, 0xd0/4, APX_F|x64, W|Modrm|No_sSuf|CheckOperandSize|VexVVVV|EVex128|EVexMap4, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+shl, 0xc0/4, APX_F|x64, W|Modrm|No_sSuf|CheckOperandSize|VexVVVV|EVex128|EVexMap4, { Imm8, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+shl, 0xd2/4, APX_F|x64, W|Modrm|No_sSuf|CheckOperandSize|VexVVVV|EVex128|EVexMap4, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+shl, 0xd0/4, APX_F|x64, W|Modrm|No_sSuf|CheckOperandSize|VexVVVV|EVex128|EVexMap4, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 
 shr, 0xd0/5, 0, W|Modrm|No_sSuf, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 shr, 0xc0/5, i186, W|Modrm|No_sSuf, { Imm8, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 shr, 0xd2/5, 0, W|Modrm|No_sSuf, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 shr, 0xd0/5, 0, W|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+shr, 0xd0/5, APX_F|x64, W|Modrm|No_sSuf|CheckOperandSize|VexVVVV|EVex128|EVexMap4, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+shr, 0xc0/5, APX_F|x64, W|Modrm|No_sSuf|CheckOperandSize|VexVVVV|EVex128|EVexMap4, { Imm8, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+shr, 0xd2/5, APX_F|x64, W|Modrm|No_sSuf|CheckOperandSize|VexVVVV|EVex128|EVexMap4, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+shr, 0xd0/5, APX_F|x64, W|Modrm|No_sSuf|CheckOperandSize|VexVVVV|EVex128|EVexMap4, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 
 sar, 0xd0/7, 0, W|Modrm|No_sSuf, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 sar, 0xc0/7, i186, W|Modrm|No_sSuf, { Imm8, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 sar, 0xd2/7, 0, W|Modrm|No_sSuf, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 sar, 0xd0/7, 0, W|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+sar, 0xd0/7, APX_F|x64, W|Modrm|No_sSuf|CheckOperandSize|VexVVVV|EVex128|EVexMap4, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+sar, 0xc0/7, APX_F|x64, W|Modrm|No_sSuf|CheckOperandSize|VexVVVV|EVex128|EVexMap4, { Imm8, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+sar, 0xd2/7, APX_F|x64, W|Modrm|No_sSuf|CheckOperandSize|VexVVVV|EVex128|EVexMap4, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
+sar, 0xd0/7, APX_F|x64, W|Modrm|No_sSuf|CheckOperandSize|VexVVVV|EVex128|EVexMap4, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 
 shld, 0xfa4, i386, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Imm8, Reg16|Reg32|Reg64, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
 shld, 0xfa5, i386, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { ShiftCount, Reg16|Reg32|Reg64, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
 shld, 0xfa5, i386, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Reg16|Reg32|Reg64, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
+shld, 0x24, APX_F|x64, Modrm|CheckOperandSize|No_bSuf|No_sSuf|VexVVVV|EVex128|EVexMap4, { Imm8, Reg16|Reg32|Reg64, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
+shld, 0xa5, APX_F|x64, Modrm|CheckOperandSize|No_bSuf|No_sSuf|VexVVVV|EVex128|EVexMap4, { ShiftCount, Reg16|Reg32|Reg64, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
+shld, 0xa5, APX_F|x64, Modrm|CheckOperandSize|No_bSuf|No_sSuf|VexVVVV|EVex128|EVexMap4, { Reg16|Reg32|Reg64, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 
 shrd, 0xfac, i386, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Imm8, Reg16|Reg32|Reg64, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
 shrd, 0xfad, i386, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { ShiftCount, Reg16|Reg32|Reg64, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
 shrd, 0xfad, i386, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Reg16|Reg32|Reg64, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
+shrd, 0x2c, APX_F|x64, Modrm|CheckOperandSize|No_bSuf|No_sSuf|VexVVVV|EVex128|EVexMap4, { Imm8, Reg16|Reg32|Reg64, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
+shrd, 0xad, APX_F|x64, Modrm|CheckOperandSize|No_bSuf|No_sSuf|VexVVVV|EVex128|EVexMap4, { ShiftCount, Reg16|Reg32|Reg64, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
+shrd, 0xad, APX_F|x64, Modrm|CheckOperandSize|No_bSuf|No_sSuf|VexVVVV|EVex128|EVexMap4, { Reg16|Reg32|Reg64, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 
 // Control transfer instructions.
 call, 0xe8, No64, JumpDword|DefaultSize|No_bSuf|No_sSuf|No_qSuf|BNDPrefixOk, { Disp16|Disp32 }
@@ -951,6 +1015,7 @@  ud2b, 0xfb9, i186, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Reg16|Reg32|Reg64|U
 ud0, 0xfff, i186, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 
 cmov<cc>, 0xf4<cc:opc>, CMOV, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
+cmov<cc>, 0x4<cc:opc>, CMOV|APX_F|x64, Modrm|CheckOperandSize|No_bSuf|No_sSuf|VexVVVV|EVex128|EVexMap4, { Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64, Reg16|Reg32|Reg64 }
 
 fcmovb, 0xda/0, i687, Modrm|NoSuf, { FloatReg, FloatAcc }
 fcmovnae, 0xda/0, i687, Modrm|NoSuf, { FloatReg, FloatAcc }
@@ -2044,8 +2109,10 @@  xstore, 0xfa7c0, PadLock, NoSuf|RepPrefixOk, {}
 // Multy-precision Add Carry, rdseed instructions.
 adcx, 0x660f38f6, ADX, Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
 adcx, 0x6666, ADX|APX_F|x64, Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|EVex128|EVexMap4, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+adcx, 0x6666, ADX|APX_F|x64, Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|VexVVVV|EVex128|EVexMap4, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
 adox, 0xf30f38f6, ADX, Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
 adox, 0xf366, ADX|APX_F|x64, Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|EVex128|EVexMap4, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+adox, 0xf366, ADX|APX_F|x64, Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|VexVVVV|EVex128|EVexMap4, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
 rdseed, 0xfc7/7, RdSeed, Modrm|NoSuf, { Reg16|Reg32|Reg64 }
 
 // SMAP instructions.