[v3,2/2] Support Intel AMX-AVX512

Message ID 20250109060052.1941533-3-haochen.jiang@intel.com
State New
Series Support Diamond Rapids remaining AMX ISAs

Checks

Context Check Description
linaro-tcwg-bot/tcwg_binutils_build--master-arm fail Patch failed to apply
linaro-tcwg-bot/tcwg_binutils_build--master-aarch64 fail Patch failed to apply

Commit Message

Jiang, Haochen Jan. 9, 2025, 6 a.m. UTC
  Note: The Imm8 encoding of tcvtrowps2[bf16,ph][h,l] looks increasingly
unlikely to change. It is not a good design and could be considered
something of a mistake (a real mistake would be the insn being implemented
wrongly, with unexpectedly long latency, hence "something of"), but it is
too late to change, which is a real shame.
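
One way to see the irregularity is to decode a few of the byte sequences from the test expectations in this patch. The sketch below is only an illustration: `struct evex_fields` and `decode_evex` are hypothetical names, not part of binutils, and it extracts only the opcode map, the embedded prefix (pp), W, vvvv, the vector length and the opcode byte. Whether this split is exactly the "encoding issue" the note refers to is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of interest from a 4-byte EVEX prefix plus the opcode byte.
   Hypothetical helper type, not taken from binutils.  */
struct evex_fields
{
  unsigned map;     /* P0 bits 2:0: 2 = 0F38, 3 = 0F3A.  */
  unsigned pp;      /* P1 bits 1:0: 0 = none, 1 = 66, 2 = F3, 3 = F2.  */
  unsigned w;       /* P1 bit 7.  */
  unsigned vvvv;    /* P1 bits 6:3, stored inverted; un-inverted here.  */
  unsigned ll;      /* P2 bits 6:5: 2 = 512-bit.  */
  unsigned opcode;  /* Byte following the prefix.  */
};

/* Decode the fixed-position fields of an EVEX-encoded insn.  INSN must
   point at the 0x62 escape byte and hold at least five bytes.  */
static struct evex_fields
decode_evex (const uint8_t *insn)
{
  struct evex_fields f;

  assert (insn[0] == 0x62);
  f.map = insn[1] & 0x7;
  f.pp = insn[2] & 0x3;
  f.w = (insn[2] >> 7) & 0x1;
  f.vvvv = ((insn[2] >> 3) & 0xf) ^ 0xf;
  f.ll = (insn[3] >> 5) & 0x3;
  f.opcode = insn[4];
  return f;
}
```

Applied to the test vectors, the register forms of all four tcvtrowps2* insns share opcode 0x6d in map 0F38 and differ only in pp, while the Imm8 forms are spread over opcodes 0x07 and 0x77 in map 0F3A: `62 63 7f 48 07 f5 7b` (tcvtrowps2bf16h) and `62 63 7f 48 77 f5 7b` (tcvtrowps2phl) both decode to pp = F2 and differ only in the opcode byte.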

---

Changes in v3:

  - Adjust the GPR mode handling in OP_VEX and combine the switch
    with the previous switch, since both switch on bytemode.

---

This patch adds support for AMX-AVX512. In the disassembler, all GPR
modes are pulled out of the VEX length switch to make the handling more
general.
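
The effect of that restructuring can be sketched with a simplified model. This is not the code from i386-dis.c: `classify_vvvv`, `enum operand_mode` and `enum reg_class` are hypothetical stand-ins, and the model returns a register class instead of appending a register name. It assumes, going by the hunk context, that the GPR cases previously sat only under the 128-bit vector length case.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the disassembler's bytemode values.  */
enum operand_mode { MODE_B, MODE_V, MODE_DQ, MODE_Q, MODE_X };

/* Register classes a VEX.vvvv operand can resolve to in this model.  */
enum reg_class
{
  RC_GPR8, RC_GPR16, RC_GPR32, RC_GPR64,
  RC_XMM, RC_YMM, RC_ZMM, RC_BAD
};

static enum reg_class
classify_vvvv (enum operand_mode mode, bool rex_w, bool data32,
	       unsigned vec_len)
{
  /* Stage 1: GPR modes are resolved from the bytemode alone, before the
     vector length is looked at.  */
  switch (mode)
    {
    case MODE_V:
    case MODE_DQ:
      if (rex_w)
	return RC_GPR64;
      if (mode == MODE_V && !data32)
	return RC_GPR16;
      return RC_GPR32;
    case MODE_B:
      return RC_GPR8;
    case MODE_Q:
      return RC_GPR64;
    default:
      break;
    }

  /* Stage 2: only vector operands depend on the vector length.  */
  switch (vec_len)
    {
    case 128:
      return RC_XMM;
    case 256:
      return RC_YMM;
    case 512:
      return RC_ZMM;
    default:
      return RC_BAD;
    }
}
```

With this ordering, a 512-bit insn such as `tcvtrowd2ps %edx, %tmm5, %zmm30` gets a 32-bit GPR for its vvvv operand (`classify_vvvv (MODE_DQ, false, true, 512)` yields `RC_GPR32`) without needing a GPR case under every vector length.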

---

Changes in v2:

  - Pull all GPR modes out of the VEX length switch in OP_VEX to make
    it more general.
  - Remove invalid test for 32-bit.
  - Reuse VexGdq for operands.
  - Update the mnemonics from tcvtrowps2pbf16[h,l] to tcvtrowps2bf16[h,l]
    according to ISE056.

---

gas/ChangeLog:

	* NEWS: Mention the full support on DMR AMX ISAs.
	* config/tc-i386.c: Add amx_avx512.
	* doc/c-i386.texi: Document .amx_avx512.
	* testsuite/gas/i386/x86-64.exp: Run AMX-AVX512 tests.
	* testsuite/gas/i386/x86-64-amx-avx512-intel.d: New test.
	* testsuite/gas/i386/x86-64-amx-avx512.d: Ditto.
	* testsuite/gas/i386/x86-64-amx-avx512.s: Ditto.

opcodes/ChangeLog:

	* i386-dis-evex-len.h: Add EVEX_LEN_0F384A_X86_64_W_0,
	EVEX_LEN_0F386D_X86_64_W_0, EVEX_LEN_0F3A07_X86_64_W_0,
	EVEX_LEN_0F3A77_X86_64_W_0.
	* i386-dis-evex-prefix.h: Add PREFIX_EVEX_0F384A_W_0_L_2,
	PREFIX_EVEX_0F386D_W_0_L_2, PREFIX_EVEX_0F3A07_W_0_L_2,
	PREFIX_EVEX_0F3A77_W_0_L_2.
	* i386-dis-evex-w.h: Add EVEX_W_0F384A_X86_64, EVEX_W_0F386D_X86_64,
	EVEX_W_0F3A07_X86_64, EVEX_W_0F3A77_X86_64.
	* i386-dis-evex-x86-64.h: Add X86_64_EVEX_0F384A, X86_64_EVEX_0F386D,
	X86_64_EVEX_0F3A07, X86_64_EVEX_0F3A77.
	* i386-dis-evex.h: Ditto.
	* i386-dis.c (EVEX_LEN_0F384A_X86_64_W_0): New.
	(EVEX_LEN_0F386D_X86_64_W_0): Ditto.
	(EVEX_LEN_0F3A07_X86_64_W_0): Ditto.
	(EVEX_LEN_0F3A77_X86_64_W_0): Ditto.
	(MOD_EVEX_0F384A_X86_64_W_0): Ditto.
	(MOD_EVEX_0F386D_X86_64_W_0): Ditto.
	(MOD_EVEX_0F3A07_X86_64_W_0): Ditto.
	(MOD_EVEX_0F3A77_X86_64_W_0): Ditto.
	(PREFIX_EVEX_0F384A_W_0_L_2): Ditto.
	(PREFIX_EVEX_0F386D_W_0_L_2): Ditto.
	(PREFIX_EVEX_0F3A07_W_0_L_2): Ditto.
	(PREFIX_EVEX_0F3A77_W_0_L_2): Ditto.
	(EVEX_W_0F384A_X86_64): Ditto.
	(EVEX_W_0F386D_X86_64): Ditto.
	(EVEX_W_0F3A07_X86_64): Ditto.
	(EVEX_W_0F3A77_X86_64): Ditto.
	(X86_64_EVEX_0F384A): Ditto.
	(X86_64_EVEX_0F386D): Ditto.
	(X86_64_EVEX_0F3A07): Ditto.
	(X86_64_EVEX_0F3A77): Ditto.
	(OP_VEX): Pull out all GPR mode out of the vector length switch.
	* i386-gen.c (isa_dependencies): Add AMX-AVX512.
	(cpu_flags): Ditto.
	* i386-init.h: Regenerated.
	* i386-mnem.h: Ditto.
	* i386-opc.h (CpuAMX_AVX512): New.
	(i386_cpu_flags): Add cpuamx_avx512.
	* i386-opc.tbl: Add AMX-AVX512 instructions.
	* i386-tbl.h: Regenerated.
---
 gas/NEWS                                      |    3 +
 gas/config/tc-i386.c                          |    1 +
 gas/doc/c-i386.texi                           |    4 +-
 .../gas/i386/x86-64-amx-avx512-intel.d        |   35 +
 gas/testsuite/gas/i386/x86-64-amx-avx512.d    |   34 +
 gas/testsuite/gas/i386/x86-64-amx-avx512.s    |   55 +
 gas/testsuite/gas/i386/x86-64.exp             |    2 +
 opcodes/i386-dis-evex-len.h                   |   23 +
 opcodes/i386-dis-evex-prefix.h                |   27 +
 opcodes/i386-dis-evex-w.h                     |   12 +
 opcodes/i386-dis-evex-x86-64.h                |   15 +
 opcodes/i386-dis-evex.h                       |    6 +-
 opcodes/i386-dis.c                            |   50 +-
 opcodes/i386-gen.c                            |    3 +
 opcodes/i386-init.h                           |  722 ++---
 opcodes/i386-mnem.h                           | 2570 +++++++++--------
 opcodes/i386-opc.h                            |    3 +
 opcodes/i386-opc.tbl                          |   15 +
 opcodes/i386-tbl.h                            |  409 ++-
 19 files changed, 2207 insertions(+), 1782 deletions(-)
 create mode 100644 gas/testsuite/gas/i386/x86-64-amx-avx512-intel.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-amx-avx512.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-amx-avx512.s
  

Comments

Jan Beulich Jan. 9, 2025, 1:19 p.m. UTC | #1
On 09.01.2025 07:00, Haochen Jiang wrote:
> --- a/gas/NEWS
> +++ b/gas/NEWS
> @@ -1,5 +1,8 @@
>  -*- text -*-
>  
> +* Add support for the x86 Intel Dimaond Rapids AMX instructions, including
> +AMX-AVX512, AMX-FP8, AMX-MOVRS, AMX-TF32 and AMX-TRANSPOSE.

Ah, here's the (combined) NEWS entry. That's certainly fine. Two nits though:
For one the continued line wants to start with two blanks. And then it's
"Diamond" aiui.

Apart from this (and within the constraints of the seemingly flawed spec)
this is okay then.

Jan
  

Patch

diff --git a/gas/NEWS b/gas/NEWS
index 2b31fdb1515..920fc40c7de 100644
--- a/gas/NEWS
+++ b/gas/NEWS
@@ -1,5 +1,8 @@ 
 -*- text -*-
 
+* Add support for the x86 Intel Dimaond Rapids AMX instructions, including
+AMX-AVX512, AMX-FP8, AMX-MOVRS, AMX-TF32 and AMX-TRANSPOSE.
+
 * Add support for the x86 Intel AVX10.2 instructions.
 
 * Add support for the x86 Intel SM4 AVX10.2 instructions.
diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
index 81597568443..d274625b0b9 100644
--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -1186,6 +1186,7 @@  static const arch_entry cpu_arch[] =
   SUBARCH (amx_tf32, AMX_TF32, ANY_AMX_TF32, false),
   SUBARCH (amx_fp8, AMX_FP8, ANY_AMX_FP8, false),
   SUBARCH (amx_movrs, AMX_MOVRS, ANY_AMX_MOVRS, false),
+  SUBARCH (amx_avx512, AMX_AVX512, ANY_AMX_AVX512, false),
   SUBARCH (amx_tile, AMX_TILE, ANY_AMX_TILE, false),
   SUBARCH (movdiri, MOVDIRI, MOVDIRI, false),
   SUBARCH (movdir64b, MOVDIR64B, MOVDIR64B, false),
diff --git a/gas/doc/c-i386.texi b/gas/doc/c-i386.texi
index ddf0393ec95..bffc381b96c 100644
--- a/gas/doc/c-i386.texi
+++ b/gas/doc/c-i386.texi
@@ -232,6 +232,7 @@  accept various extension mnemonics.  For example,
 @code{amx_tf32},
 @code{amx_fp8}
 @code{amx_movrs},
+@code{amx_avx512},
 @code{amx_tile},
 @code{vmx},
 @code{vmfunc},
@@ -1705,7 +1706,8 @@  supported on the CPU specified.  The choices for @var{cpu_type} are:
 @item @samp{.movdiri} @tab @samp{.movdir64b} @tab @samp{.enqcmd} @tab @samp{.tsxldtrk}
 @item @samp{.amx_int8} @tab @samp{.amx_bf16} @tab @samp{.amx_fp16}
 @item @samp{.amx_complex} @tab @samp{.amx_transpose} @tab @samp{.amx_tf32}
-@item @samp{.amx_fp8} @tab @samp{.amx_movrs} @tab @samp{.amx_tile}
+@item @samp{.amx_fp8} @tab @samp{.amx_movrs} @tab @samp{.amx_avx512}
+@item @samp{.amx_tile}
 @item @samp{.kl} @tab @samp{.widekl} @tab @samp{.uintr} @tab @samp{.hreset}
 @item @samp{.3dnow} @tab @samp{.3dnowa} @tab @samp{.sse4a} @tab @samp{.sse5}
 @item @samp{.syscall} @tab @samp{.rdtscp} @tab @samp{.svme}
diff --git a/gas/testsuite/gas/i386/x86-64-amx-avx512-intel.d b/gas/testsuite/gas/i386/x86-64-amx-avx512-intel.d
new file mode 100644
index 00000000000..33e6d01a558
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-amx-avx512-intel.d
@@ -0,0 +1,35 @@ 
+#objdump: -dw -Mintel
+#name: x86_64 AMX-AVX512 insns (Intel disassembly)
+#source: x86-64-amx-avx512.s
+
+.*: +file format .*
+
+Disassembly of section \.text:
+
+#...
+[a-f0-9]+ <_intel>:
+\s*[a-f0-9]+:\s*62 62 6e 48 4a f5\s+tcvtrowd2ps zmm30,tmm5,edx
+\s*[a-f0-9]+:\s*62 62 6e 48 4a f2\s+tcvtrowd2ps zmm30,tmm2,edx
+\s*[a-f0-9]+:\s*62 63 7e 48 07 f5 7b\s+tcvtrowd2ps zmm30,tmm5,0x7b
+\s*[a-f0-9]+:\s*62 63 7e 48 07 f2 7b\s+tcvtrowd2ps zmm30,tmm2,0x7b
+\s*[a-f0-9]+:\s*62 62 6f 48 6d f5\s+tcvtrowps2bf16h zmm30,tmm5,edx
+\s*[a-f0-9]+:\s*62 62 6f 48 6d f2\s+tcvtrowps2bf16h zmm30,tmm2,edx
+\s*[a-f0-9]+:\s*62 63 7f 48 07 f5 7b\s+tcvtrowps2bf16h zmm30,tmm5,0x7b
+\s*[a-f0-9]+:\s*62 63 7f 48 07 f2 7b\s+tcvtrowps2bf16h zmm30,tmm2,0x7b
+\s*[a-f0-9]+:\s*62 62 6e 48 6d f5\s+tcvtrowps2bf16l zmm30,tmm5,edx
+\s*[a-f0-9]+:\s*62 62 6e 48 6d f2\s+tcvtrowps2bf16l zmm30,tmm2,edx
+\s*[a-f0-9]+:\s*62 63 7e 48 77 f5 7b\s+tcvtrowps2bf16l zmm30,tmm5,0x7b
+\s*[a-f0-9]+:\s*62 63 7e 48 77 f2 7b\s+tcvtrowps2bf16l zmm30,tmm2,0x7b
+\s*[a-f0-9]+:\s*62 62 6c 48 6d f5\s+tcvtrowps2phh zmm30,tmm5,edx
+\s*[a-f0-9]+:\s*62 62 6c 48 6d f2\s+tcvtrowps2phh zmm30,tmm2,edx
+\s*[a-f0-9]+:\s*62 63 7c 48 07 f5 7b\s+tcvtrowps2phh zmm30,tmm5,0x7b
+\s*[a-f0-9]+:\s*62 63 7c 48 07 f2 7b\s+tcvtrowps2phh zmm30,tmm2,0x7b
+\s*[a-f0-9]+:\s*62 62 6d 48 6d f5\s+tcvtrowps2phl zmm30,tmm5,edx
+\s*[a-f0-9]+:\s*62 62 6d 48 6d f2\s+tcvtrowps2phl zmm30,tmm2,edx
+\s*[a-f0-9]+:\s*62 63 7f 48 77 f5 7b\s+tcvtrowps2phl zmm30,tmm5,0x7b
+\s*[a-f0-9]+:\s*62 63 7f 48 77 f2 7b\s+tcvtrowps2phl zmm30,tmm2,0x7b
+\s*[a-f0-9]+:\s*62 62 6d 48 4a f5\s+tilemovrow zmm30,tmm5,edx
+\s*[a-f0-9]+:\s*62 62 6d 48 4a f2\s+tilemovrow zmm30,tmm2,edx
+\s*[a-f0-9]+:\s*62 63 7d 48 07 f5 7b\s+tilemovrow zmm30,tmm5,0x7b
+\s*[a-f0-9]+:\s*62 63 7d 48 07 f2 7b\s+tilemovrow zmm30,tmm2,0x7b
+#pass
diff --git a/gas/testsuite/gas/i386/x86-64-amx-avx512.d b/gas/testsuite/gas/i386/x86-64-amx-avx512.d
new file mode 100644
index 00000000000..d2f8ac6e51e
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-amx-avx512.d
@@ -0,0 +1,34 @@ 
+#objdump: -dw
+#name: x86_64 AMX-AVX512 insns
+#source: x86-64-amx-avx512.s
+
+.*: +file format .*
+
+Disassembly of section \.text:
+
+0+ <_start>:
+\s*[a-f0-9]+:\s*62 62 6e 48 4a f5\s+tcvtrowd2ps %edx,%tmm5,%zmm30
+\s*[a-f0-9]+:\s*62 62 6e 48 4a f2\s+tcvtrowd2ps %edx,%tmm2,%zmm30
+\s*[a-f0-9]+:\s*62 63 7e 48 07 f5 7b\s+tcvtrowd2ps \$0x7b,%tmm5,%zmm30
+\s*[a-f0-9]+:\s*62 63 7e 48 07 f2 7b\s+tcvtrowd2ps \$0x7b,%tmm2,%zmm30
+\s*[a-f0-9]+:\s*62 62 6f 48 6d f5\s+tcvtrowps2bf16h %edx,%tmm5,%zmm30
+\s*[a-f0-9]+:\s*62 62 6f 48 6d f2\s+tcvtrowps2bf16h %edx,%tmm2,%zmm30
+\s*[a-f0-9]+:\s*62 63 7f 48 07 f5 7b\s+tcvtrowps2bf16h \$0x7b,%tmm5,%zmm30
+\s*[a-f0-9]+:\s*62 63 7f 48 07 f2 7b\s+tcvtrowps2bf16h \$0x7b,%tmm2,%zmm30
+\s*[a-f0-9]+:\s*62 62 6e 48 6d f5\s+tcvtrowps2bf16l %edx,%tmm5,%zmm30
+\s*[a-f0-9]+:\s*62 62 6e 48 6d f2\s+tcvtrowps2bf16l %edx,%tmm2,%zmm30
+\s*[a-f0-9]+:\s*62 63 7e 48 77 f5 7b\s+tcvtrowps2bf16l \$0x7b,%tmm5,%zmm30
+\s*[a-f0-9]+:\s*62 63 7e 48 77 f2 7b\s+tcvtrowps2bf16l \$0x7b,%tmm2,%zmm30
+\s*[a-f0-9]+:\s*62 62 6c 48 6d f5\s+tcvtrowps2phh %edx,%tmm5,%zmm30
+\s*[a-f0-9]+:\s*62 62 6c 48 6d f2\s+tcvtrowps2phh %edx,%tmm2,%zmm30
+\s*[a-f0-9]+:\s*62 63 7c 48 07 f5 7b\s+tcvtrowps2phh \$0x7b,%tmm5,%zmm30
+\s*[a-f0-9]+:\s*62 63 7c 48 07 f2 7b\s+tcvtrowps2phh \$0x7b,%tmm2,%zmm30
+\s*[a-f0-9]+:\s*62 62 6d 48 6d f5\s+tcvtrowps2phl %edx,%tmm5,%zmm30
+\s*[a-f0-9]+:\s*62 62 6d 48 6d f2\s+tcvtrowps2phl %edx,%tmm2,%zmm30
+\s*[a-f0-9]+:\s*62 63 7f 48 77 f5 7b\s+tcvtrowps2phl \$0x7b,%tmm5,%zmm30
+\s*[a-f0-9]+:\s*62 63 7f 48 77 f2 7b\s+tcvtrowps2phl \$0x7b,%tmm2,%zmm30
+\s*[a-f0-9]+:\s*62 62 6d 48 4a f5\s+tilemovrow %edx,%tmm5,%zmm30
+\s*[a-f0-9]+:\s*62 62 6d 48 4a f2\s+tilemovrow %edx,%tmm2,%zmm30
+\s*[a-f0-9]+:\s*62 63 7d 48 07 f5 7b\s+tilemovrow \$0x7b,%tmm5,%zmm30
+\s*[a-f0-9]+:\s*62 63 7d 48 07 f2 7b\s+tilemovrow \$0x7b,%tmm2,%zmm30
+#pass
diff --git a/gas/testsuite/gas/i386/x86-64-amx-avx512.s b/gas/testsuite/gas/i386/x86-64-amx-avx512.s
new file mode 100644
index 00000000000..6df493430a0
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-amx-avx512.s
@@ -0,0 +1,55 @@ 
+# Check 64bit AMX-AVX512 instructions
+
+	.text
+_start:
+	tcvtrowd2ps	%edx, %tmm5, %zmm30
+	tcvtrowd2ps	%edx, %tmm2, %zmm30
+	tcvtrowd2ps	$123, %tmm5, %zmm30
+	tcvtrowd2ps	$123, %tmm2, %zmm30
+	tcvtrowps2bf16h	%edx, %tmm5, %zmm30
+	tcvtrowps2bf16h	%edx, %tmm2, %zmm30
+	tcvtrowps2bf16h	$123, %tmm5, %zmm30
+	tcvtrowps2bf16h	$123, %tmm2, %zmm30
+	tcvtrowps2bf16l	%edx, %tmm5, %zmm30
+	tcvtrowps2bf16l	%edx, %tmm2, %zmm30
+	tcvtrowps2bf16l	$123, %tmm5, %zmm30
+	tcvtrowps2bf16l	$123, %tmm2, %zmm30
+	tcvtrowps2phh	%edx, %tmm5, %zmm30
+	tcvtrowps2phh	%edx, %tmm2, %zmm30
+	tcvtrowps2phh	$123, %tmm5, %zmm30
+	tcvtrowps2phh	$123, %tmm2, %zmm30
+	tcvtrowps2phl	%edx, %tmm5, %zmm30
+	tcvtrowps2phl	%edx, %tmm2, %zmm30
+	tcvtrowps2phl	$123, %tmm5, %zmm30
+	tcvtrowps2phl	$123, %tmm2, %zmm30
+	tilemovrow	%edx, %tmm5, %zmm30
+	tilemovrow	%edx, %tmm2, %zmm30
+	tilemovrow	$123, %tmm5, %zmm30
+	tilemovrow	$123, %tmm2, %zmm30
+
+_intel:
+	.intel_syntax noprefix
+	tcvtrowd2ps	zmm30, tmm5, edx
+	tcvtrowd2ps	zmm30, tmm2, edx
+	tcvtrowd2ps	zmm30, tmm5, 123
+	tcvtrowd2ps	zmm30, tmm2, 123
+	tcvtrowps2bf16h	zmm30, tmm5, edx
+	tcvtrowps2bf16h	zmm30, tmm2, edx
+	tcvtrowps2bf16h	zmm30, tmm5, 123
+	tcvtrowps2bf16h	zmm30, tmm2, 123
+	tcvtrowps2bf16l	zmm30, tmm5, edx
+	tcvtrowps2bf16l	zmm30, tmm2, edx
+	tcvtrowps2bf16l	zmm30, tmm5, 123
+	tcvtrowps2bf16l	zmm30, tmm2, 123
+	tcvtrowps2phh	zmm30, tmm5, edx
+	tcvtrowps2phh	zmm30, tmm2, edx
+	tcvtrowps2phh	zmm30, tmm5, 123
+	tcvtrowps2phh	zmm30, tmm2, 123
+	tcvtrowps2phl	zmm30, tmm5, edx
+	tcvtrowps2phl	zmm30, tmm2, edx
+	tcvtrowps2phl	zmm30, tmm5, 123
+	tcvtrowps2phl	zmm30, tmm2, 123
+	tilemovrow	zmm30, tmm5, edx
+	tilemovrow	zmm30, tmm2, edx
+	tilemovrow	zmm30, tmm5, 123
+	tilemovrow	zmm30, tmm2, 123
diff --git a/gas/testsuite/gas/i386/x86-64.exp b/gas/testsuite/gas/i386/x86-64.exp
index c99488935f5..95c78f4ed7a 100644
--- a/gas/testsuite/gas/i386/x86-64.exp
+++ b/gas/testsuite/gas/i386/x86-64.exp
@@ -538,6 +538,8 @@  run_dump_test "x86-64-amx-fp8-bad"
 run_dump_test "x86-64-amx-movrs"
 run_dump_test "x86-64-amx-movrs-intel"
 run_list_test "x86-64-amx-movrs-inval"
+run_dump_test "x86-64-amx-avx512"
+run_dump_test "x86-64-amx-avx512-intel"
 run_dump_test "x86-64-clzero"
 run_dump_test "x86-64-mwaitx-bdver4"
 run_list_test "x86-64-mwaitx-reg"
diff --git a/opcodes/i386-dis-evex-len.h b/opcodes/i386-dis-evex-len.h
index 2b4361f7ae6..434e051bb63 100644
--- a/opcodes/i386-dis-evex-len.h
+++ b/opcodes/i386-dis-evex-len.h
@@ -47,6 +47,8 @@  static const struct dis386 evex_len_table[][3] = {
   /* EVEX_LEN_0F384A_X86_64_W_0 */
   {
     { X86_64_EVEX_PFX_TABLE (PREFIX_VEX_0F384A_X86_64_W_0_L_0) },
+    { Bad_Opcode },
+    { PREFIX_TABLE (PREFIX_EVEX_0F384A_X86_64_W_0_L_2) },
   },
 
   /* EVEX_LEN_0F385A */
@@ -63,6 +65,13 @@  static const struct dis386 evex_len_table[][3] = {
     { VEX_W_TABLE (EVEX_W_0F385B_L_2) },
   },
 
+  /* EVEX_LEN_0F386D_X86_64_W_0_M_1 */
+  {
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { PREFIX_TABLE (PREFIX_EVEX_0F386D_X86_64_W_0_L_2) },
+  },
+
   /* EVEX_LEN_0F38C6 */
   {
     { Bad_Opcode },
@@ -91,6 +100,13 @@  static const struct dis386 evex_len_table[][3] = {
     { VEX_W_TABLE (VEX_W_0F3A01_L_1) },
   },
 
+  /* EVEX_LEN_0F3A07_X86_64_W_0 */
+  {
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { PREFIX_TABLE (PREFIX_EVEX_0F3A07_X86_64_W_0_L_2) },
+  },
+
   /* EVEX_LEN_0F3A18 */
   {
     { Bad_Opcode },
@@ -161,6 +177,13 @@  static const struct dis386 evex_len_table[][3] = {
     { VEX_W_TABLE (EVEX_W_0F3A43_L_n) },
   },
 
+  /* EVEX_LEN_0F3A77_X86_64_W_0 */
+  {
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { PREFIX_TABLE (PREFIX_EVEX_0F3A77_X86_64_W_0_L_2) },
+  },
+
   /* EVEX_LEN_MAP5_6E */
   {
     { PREFIX_TABLE (PREFIX_EVEX_MAP5_6E_L_0) },
diff --git a/opcodes/i386-dis-evex-prefix.h b/opcodes/i386-dis-evex-prefix.h
index 16fb2698390..a5b652a3420 100644
--- a/opcodes/i386-dis-evex-prefix.h
+++ b/opcodes/i386-dis-evex-prefix.h
@@ -243,6 +243,12 @@ 
     { VEX_W_TABLE (EVEX_W_0F383A_P_1) },
     { "%XEvpminuw",	{ XM, Vex, EXx }, 0 },
   },
+  /* PREFIX_EVEX_0F384A_W_0_L_2 */
+  {
+    { Bad_Opcode },
+    { "tcvtrowd2ps",	{ XM, Rtmm, VexGdq }, 0 },
+    { "tilemovrow",	{ XM, Rtmm, VexGdq }, 0 },
+  },
   /* PREFIX_EVEX_0F3852 */
   {
     { "vdpphp%XS",	{ XM, Vex, EXx }, 0 },
@@ -264,6 +270,13 @@ 
     { Bad_Opcode },
     { "vp2intersectY%DQ", { MaskG, Vex, EXx, EXxEVexS }, 0 },
   },
+  /* PREFIX_EVEX_0F386D_W_0_L_2 */
+  {
+    { "tcvtrowps2phh",	{ XM, Rtmm, VexGdq }, 0 },
+    { "tcvtrowps2bf16l",	{ XM, Rtmm, VexGdq }, 0 },
+    { "tcvtrowps2phl",	{ XM, Rtmm, VexGdq }, 0 },
+    { "tcvtrowps2bf16h",	{ XM, Rtmm, VexGdq }, 0 },
+  },
   /* PREFIX_EVEX_0F3872 */
   {
     { Bad_Opcode },
@@ -306,6 +319,13 @@ 
     { "%XEvfmsub213s%XW",	{ XMScalar, VexScalar, EXdq, EXxEVexR }, 0 },
     { "v4fnmadds%XS",	{ XMScalar, VexScalar, Mxmm }, 0 },
   },
+  /* PREFIX_EVEX_0F3A07_W_0_L_2 */
+  {
+    { "tcvtrowps2phh",	{ XM, Rtmm, Ib }, 0 },
+    { "tcvtrowd2ps",	{ XM, Rtmm, Ib }, 0 },
+    { "tilemovrow",	{ XM, Rtmm, Ib }, 0 },
+    { "tcvtrowps2bf16h",	{ XM, Rtmm, Ib }, 0 },
+  },
   /* PREFIX_EVEX_0F3A08 */
   {
     { "vrndscalep%XH",  { XM, EXxh, EXxEVexS, Ib }, 0 },
@@ -377,6 +397,13 @@ 
     { Bad_Opcode },
     { "vfpclasss%XW",	{ MaskG, EXdq, Ib }, 0 },
   },
+  /* PREFIX_EVEX_0F3A77_W_0_L_2 */
+  {
+    { Bad_Opcode },
+    { "tcvtrowps2bf16l",	{ XM, Rtmm, Ib }, 0 },
+    { Bad_Opcode },
+    { "tcvtrowps2phl",	{ XM, Rtmm, Ib }, 0 },
+  },
   /* PREFIX_EVEX_0F3AC2 */
   {
     { "vcmpp%XH", { MaskG, Vex, EXxh, EXxEVexS, CMP }, 0 },
diff --git a/opcodes/i386-dis-evex-w.h b/opcodes/i386-dis-evex-w.h
index 8c73320d633..c15469210a5 100644
--- a/opcodes/i386-dis-evex-w.h
+++ b/opcodes/i386-dis-evex-w.h
@@ -365,6 +365,10 @@ 
     { "vbroadcasti32x8",	{ XM, Mymm }, PREFIX_DATA },
     { "vbroadcasti64x4",	{ XM, Mymm }, PREFIX_DATA },
   },
+  /* EVEX_W_0F386D_X86_64 */
+  {
+    { EVEX_LEN_TABLE (EVEX_LEN_0F386D_X86_64_W_0) },
+  },
   /* EVEX_W_0F3870 */
   {
     { Bad_Opcode },
@@ -388,6 +392,10 @@ 
     { Bad_Opcode },
     { "vpmultishiftqb",	{ XM, Vex, EXx }, PREFIX_DATA },
   },
+  /* EVEX_W_0F3A07_X86_64 */
+  {
+    { EVEX_LEN_TABLE (EVEX_LEN_0F3A07_X86_64_W_0) },
+  },
   /* EVEX_W_0F3A18_L_n */
   {
     { "vinsertf32x4",	{ XM, Vex, EXxmm, Ib }, PREFIX_DATA },
@@ -456,6 +464,10 @@ 
     { Bad_Opcode },
     { "vpshrdw",   { XM, Vex, EXx, Ib }, 0 },
   },
+  /* EVEX_W_0F3A77_X86_64 */
+  {
+    { EVEX_LEN_TABLE (EVEX_LEN_0F3A77_X86_64_W_0) },
+  },
   /* EVEX_W_MAP4_8F_R_0 */
   {
     { "pop2", { { PUSH2_POP2_Fixup, q_mode}, Eq }, NO_PREFIX },
diff --git a/opcodes/i386-dis-evex-x86-64.h b/opcodes/i386-dis-evex-x86-64.h
index 9b0d7132268..8df4ba07bc5 100644
--- a/opcodes/i386-dis-evex-x86-64.h
+++ b/opcodes/i386-dis-evex-x86-64.h
@@ -3,3 +3,18 @@ 
     { Bad_Opcode },
     { VEX_W_TABLE (EVEX_W_0F384A_X86_64) },
   },
+  /* X86_64_EVEX_0F386D */
+  {
+    { Bad_Opcode },
+    { VEX_W_TABLE (EVEX_W_0F386D_X86_64) },
+  },
+  /* X86_64_EVEX_0F3A07 */
+  {
+    { Bad_Opcode },
+    { VEX_W_TABLE (EVEX_W_0F3A07_X86_64) },
+  },
+  /* X86_64_EVEX_0F3A77 */
+  {
+    { Bad_Opcode },
+    { VEX_W_TABLE (EVEX_W_0F3A77_X86_64) },
+  },
diff --git a/opcodes/i386-dis-evex.h b/opcodes/i386-dis-evex.h
index fdb4e07f2e1..895d617fed2 100644
--- a/opcodes/i386-dis-evex.h
+++ b/opcodes/i386-dis-evex.h
@@ -415,7 +415,7 @@  static const struct dis386 evex_table[][256] = {
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
-    { Bad_Opcode },
+    { X86_64_TABLE (X86_64_EVEX_0F386D) },
     { X86_64_EVEX_FROM_VEX_TABLE (X86_64_VEX_0F386E) },
     { X86_64_EVEX_FROM_VEX_TABLE (X86_64_VEX_0F386F) },
     /* 70 */
@@ -591,7 +591,7 @@  static const struct dis386 evex_table[][256] = {
     { VEX_W_TABLE (VEX_W_0F3A04) },
     { "%XEvpermilp%XD", { XM, EXx, Ib }, PREFIX_DATA },
     { Bad_Opcode },
-    { Bad_Opcode },
+    { X86_64_TABLE (X86_64_EVEX_0F3A07) },
     /* 08 */
     { PREFIX_TABLE (PREFIX_EVEX_0F3A08) },
     { "vrndscalep%XD", { XM, EXx, EXxEVexS, Ib }, PREFIX_DATA },
@@ -717,7 +717,7 @@  static const struct dis386 evex_table[][256] = {
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
-    { Bad_Opcode },
+    { X86_64_TABLE (X86_64_EVEX_0F3A77) },
     /* 78 */
     { Bad_Opcode },
     { Bad_Opcode },
diff --git a/opcodes/i386-dis.c b/opcodes/i386-dis.c
index adbd93c0489..bd00897c88f 100644
--- a/opcodes/i386-dis.c
+++ b/opcodes/i386-dis.c
@@ -1207,9 +1207,11 @@  enum
   PREFIX_EVEX_0F3838,
   PREFIX_EVEX_0F3839,
   PREFIX_EVEX_0F383A,
+  PREFIX_EVEX_0F384A_X86_64_W_0_L_2,
   PREFIX_EVEX_0F3852,
   PREFIX_EVEX_0F3853,
   PREFIX_EVEX_0F3868,
+  PREFIX_EVEX_0F386D_X86_64_W_0_L_2,
   PREFIX_EVEX_0F3872,
   PREFIX_EVEX_0F3874,
   PREFIX_EVEX_0F389A,
@@ -1217,6 +1219,7 @@  enum
   PREFIX_EVEX_0F38AA,
   PREFIX_EVEX_0F38AB,
 
+  PREFIX_EVEX_0F3A07_X86_64_W_0_L_2,
   PREFIX_EVEX_0F3A08,
   PREFIX_EVEX_0F3A0A,
   PREFIX_EVEX_0F3A26,
@@ -1228,6 +1231,7 @@  enum
   PREFIX_EVEX_0F3A57,
   PREFIX_EVEX_0F3A66,
   PREFIX_EVEX_0F3A67,
+  PREFIX_EVEX_0F3A77_X86_64_W_0_L_2,
   PREFIX_EVEX_0F3AC2,
 
   PREFIX_EVEX_MAP4_4x,
@@ -1374,6 +1378,11 @@  enum
   X86_64_VEX_MAP7_F8_L_0_W_0_R_0,
 
   X86_64_EVEX_0F384A,
+  X86_64_EVEX_0F386D,
+  X86_64_EVEX_0F3A07,
+  X86_64_EVEX_0F3A77,
+
+  X86_64_EVEX_MAP5_6F_M_0,
 };
 
 enum
@@ -1571,10 +1580,12 @@  enum
   EVEX_LEN_0F384A_X86_64_W_0,
   EVEX_LEN_0F385A,
   EVEX_LEN_0F385B,
+  EVEX_LEN_0F386D_X86_64_W_0,
   EVEX_LEN_0F38C6,
   EVEX_LEN_0F38C7,
   EVEX_LEN_0F3A00,
   EVEX_LEN_0F3A01,
+  EVEX_LEN_0F3A07_X86_64_W_0,
   EVEX_LEN_0F3A18,
   EVEX_LEN_0F3A19,
   EVEX_LEN_0F3A1A,
@@ -1585,6 +1596,7 @@  enum
   EVEX_LEN_0F3A3A,
   EVEX_LEN_0F3A3B,
   EVEX_LEN_0F3A43,
+  EVEX_LEN_0F3A77_X86_64_W_0,
 
   EVEX_LEN_MAP5_6E,
   EVEX_LEN_MAP5_7E,
@@ -1804,12 +1816,14 @@  enum
   EVEX_W_0F3859,
   EVEX_W_0F385A_L_n,
   EVEX_W_0F385B_L_2,
+  EVEX_W_0F386D_X86_64,
   EVEX_W_0F3870,
   EVEX_W_0F3872_P_2,
   EVEX_W_0F387A,
   EVEX_W_0F387B,
   EVEX_W_0F3883,
 
+  EVEX_W_0F3A07_X86_64,
   EVEX_W_0F3A18_L_n,
   EVEX_W_0F3A19_L_n,
   EVEX_W_0F3A1A_L_2,
@@ -1824,6 +1838,7 @@  enum
   EVEX_W_0F3A43_L_n,
   EVEX_W_0F3A70,
   EVEX_W_0F3A72,
+  EVEX_W_0F3A77_X86_64,
 
   EVEX_W_MAP4_8F_R_0,
   EVEX_W_MAP4_F8_P1_M_1,
@@ -14030,6 +14045,25 @@  OP_VEX (instr_info *ins, int bytemode, int sizeflag ATTRIBUTE_UNUSED)
 	}
 
       return true;
+
+    case v_mode:
+    case dq_mode:
+      if (ins->rex & REX_W)
+	oappend_register (ins, att_names64[reg]);
+      else if (bytemode == v_mode
+	       && !(sizeflag & DFLAG))
+	oappend_register (ins, att_names16[reg]);
+      else
+	oappend_register (ins, att_names32[reg]);
+      return true;
+
+    case b_mode:
+      oappend_register (ins, att_names8rex[reg]);
+      return true;
+
+    case q_mode:
+      oappend_register (ins, att_names64[reg]);
+      return true;
     }
 
   switch (ins->vex.length)
@@ -14041,22 +14075,6 @@  OP_VEX (instr_info *ins, int bytemode, int sizeflag ATTRIBUTE_UNUSED)
 	  names = att_names_xmm;
 	  ins->evex_used |= EVEX_len_used;
 	  break;
-	case v_mode:
-	case dq_mode:
-	  if (ins->rex & REX_W)
-	    names = att_names64;
-	  else if (bytemode == v_mode
-		   && !(sizeflag & DFLAG))
-	    names = att_names16;
-	  else
-	    names = att_names32;
-	  break;
-	case b_mode:
-	  names = att_names8rex;
-	  break;
-	case q_mode:
-	  names = att_names64;
-	  break;
 	case mask_bd_mode:
 	case mask_mode:
 	  if (reg > 0x7)
diff --git a/opcodes/i386-gen.c b/opcodes/i386-gen.c
index d2ecea2ecd5..d7e0b29bfbd 100644
--- a/opcodes/i386-gen.c
+++ b/opcodes/i386-gen.c
@@ -273,6 +273,8 @@  static const dependency isa_dependencies[] =
     "AMX_TILE" },
   { "AMX_MOVRS",
     "AMX_TILE" },
+  { "AMX_AVX512",
+    "AMX_TILE|AVX10_2" },
   { "KL",
     "SSE2" },
   { "WIDEKL",
@@ -443,6 +445,7 @@  static bitfield cpu_flags[] =
   BITFIELD (AMX_TF32),
   BITFIELD (AMX_FP8),
   BITFIELD (AMX_MOVRS),
+  BITFIELD (AMX_AVX512),
   BITFIELD (AMX_TILE),
   BITFIELD (MOVDIRI),
   BITFIELD (MOVDIR64B),
diff --git a/opcodes/i386-opc.h b/opcodes/i386-opc.h
index 77dad3159dd..cc303e0e818 100644
--- a/opcodes/i386-opc.h
+++ b/opcodes/i386-opc.h
@@ -256,6 +256,8 @@  enum i386_cpu
   CpuAMX_FP8,
   /* AMX-MOVRS Instructions support required.  */
   CpuAMX_MOVRS,
+  /* AMX-AVX512 Instructions support required.  */
+  CpuAMX_AVX512,
   /* AMX-TILE instructions required */
   CpuAMX_TILE,
   /* GFNI instructions required */
@@ -509,6 +511,7 @@  typedef union i386_cpu_flags
       unsigned int cpuamx_tf32:1;
       unsigned int cpuamx_fp8:1;
       unsigned int cpuamx_movrs:1;
+      unsigned int cpuamx_avx512:1;
       unsigned int cpuamx_tile:1;
       unsigned int cpugfni:1;
       unsigned int cpuvaes:1;
diff --git a/opcodes/i386-opc.tbl b/opcodes/i386-opc.tbl
index b28ceb5be19..731d94bc128 100644
--- a/opcodes/i386-opc.tbl
+++ b/opcodes/i386-opc.tbl
@@ -3243,6 +3243,21 @@  t2rpntlvw<z>rs<loc>, 0x<z:pfx>f8 | <loc:opc>, AMX_TRANSPOSE&APX_F(AMX_MOVRS), Si
 tileloaddrs, 0xf24a, APX_F(AMX_MOVRS), Sibmem|Vex128|EVex128|Space0F38|VexW0|NoSuf, { Unspecified|BaseIndex, RegTMM }
 tileloaddrst1, 0x664a, APX_F(AMX_MOVRS), Sibmem|Vex128|EVex128|Space0F38|VexW0|NoSuf, { Unspecified|BaseIndex, RegTMM }
 
+tcvtrowd2ps, 0xf34a, AMX_AVX512, Modrm|EVex512|Space0F38|Src2VVVV|VexW0|NoSuf, { Reg32, RegTMM, RegZMM }
+tcvtrowd2ps, 0xf307, AMX_AVX512, Modrm|EVex512|Space0F3A|VexW0|NoSuf, { Imm8, RegTMM, RegZMM }
+
+tcvtrowps2bf16h, 0xf26d, AMX_AVX512, Modrm|EVex512|Space0F38|Src2VVVV|VexW0|NoSuf, { Reg32, RegTMM, RegZMM }
+tcvtrowps2bf16h, 0xf207, AMX_AVX512, Modrm|EVex512|Space0F3A|VexW0|NoSuf, { Imm8, RegTMM, RegZMM }
+tcvtrowps2bf16l, 0xf36d, AMX_AVX512, Modrm|EVex512|Space0F38|Src2VVVV|VexW0|NoSuf, { Reg32, RegTMM, RegZMM }
+tcvtrowps2bf16l, 0xf377, AMX_AVX512, Modrm|EVex512|Space0F3A|VexW0|NoSuf, { Imm8, RegTMM, RegZMM }
+tcvtrowps2phh, 0x6d, AMX_AVX512, Modrm|EVex512|Space0F38|Src2VVVV|VexW0|NoSuf, { Reg32, RegTMM, RegZMM }
+tcvtrowps2phh, 0x07, AMX_AVX512, Modrm|EVex512|Space0F3A|VexW0|NoSuf, { Imm8, RegTMM, RegZMM }
+tcvtrowps2phl, 0x666d, AMX_AVX512, Modrm|EVex512|Space0F38|Src2VVVV|VexW0|NoSuf, { Reg32, RegTMM, RegZMM }
+tcvtrowps2phl, 0xf277, AMX_AVX512, Modrm|EVex512|Space0F3A|VexW0|NoSuf, { Imm8, RegTMM, RegZMM }
+
+tilemovrow, 0x664a, AMX_AVX512, Modrm|EVex512|Space0F38|Src2VVVV|VexW0|NoSuf, { Reg32, RegTMM, RegZMM }
+tilemovrow, 0x6607, AMX_AVX512, Modrm|EVex512|Space0F3A|VexW0|NoSuf, { Imm8, RegTMM, RegZMM }
+
 // AMX instructions end.
 
 // KEYLOCKER instructions.