[v4] Support Intel AMX-MOVRS

Message ID 20250113064852.1073417-1-haochen.jiang@intel.com
State New
Series [v4] Support Intel AMX-MOVRS

Checks

Context Check Description
linaro-tcwg-bot/tcwg_binutils_build--master-arm success Build passed
linaro-tcwg-bot/tcwg_binutils_build--master-aarch64 success Build passed
linaro-tcwg-bot/tcwg_binutils_check--master-aarch64 success Test passed
linaro-tcwg-bot/tcwg_binutils_check--master-arm success Test passed

Commit Message

Haochen Jiang Jan. 13, 2025, 6:48 a.m. UTC
  From: "Hu, Lin1" <lin1.hu@intel.com>

Hi all,

This is the v4 patch for AMX-MOVRS, with the patch description and changes
embedded below.

Ok for trunk?

Thx,
Haochen

---

Changes in v4:

  - Skip MOD_TABLE pass for M* operands since OP_M would check that.
  - Set AMX_TRANSPOSE for "any" instead of skipping the assert.
  - Put insn entry near the earlier ones in i386-opc.tbl since they
    are quite similar.
  - Remove NEWS in Changelog since it would be added in AMX-AVX512,
    not here.

---

This patch adds support for the AMX-MOVRS feature. Unlike all the other
AMX insns in VEX space, where we pass vex_len_table before
vex_w_table, we pass vex_w_table first for tileloaddrs[,t1] to
align with the order in EVEX space. The reason we pass
vex_w_table first in EVEX space is AMX-AVX512, where tcvtrowd2ps
and tilemovrow with r32 share the same opcode with tileloaddrs[,t1].
All of them have evex.w = 0 but different evex.length. Doing it the
other way now would mean redoing the order shortly afterwards, which
is not ideal.
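
As a rough illustration of the W-before-length lookup order described
above, the following C sketch models the 0F38 4A slot. The helper
lookup_0f384a and the leaf table are hypothetical and are not binutils
code; the real disassembler walks nested dis386 tables. Only the two
mnemonics and their F2/66 prefixes are taken from this patch, and
returning NULL for every other combination is an assumption made to
keep the sketch small.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* VEX/EVEX "pp" field values: 0 = no prefix, 1 = 66, 2 = F3, 3 = F2.  */
enum { PP_NONE = 0, PP_66 = 1, PP_F3 = 2, PP_F2 = 3 };

/* Hypothetical leaf table for opcode 0F38 4A with W = 0 and L = 0.
   Unlisted slots are NULL, standing in for "not decoded here".  */
static const char *const leaf_w0_l0[4] = {
  [PP_66] = "tileloaddrst1",
  [PP_F2] = "tileloaddrs",
};

/* Look up a mnemonic in the order the patch describes:
   W first, then vector length, then the mandatory prefix.  */
const char *
lookup_0f384a (unsigned int w, unsigned int len, unsigned int pp)
{
  assert (pp < 4);
  if (w != 0)		/* Step 1: W table.  */
    return NULL;
  if (len != 0)		/* Step 2: length table.  */
    return NULL;
  return leaf_w0_l0[pp];	/* Step 3: prefix table.  */
}
```

With W checked first, insns that share W = 0 but use another length can
later be added by extending only step 2, which appears to be the
motivation given for AMX-AVX512.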

The APX_F extension is also implemented in this patch. The encodings
are:
  - EVEX.128.NP/66.MAP5.W0 F8/F9 !(11):rrr:100 for
    T2RPNTLVW[Z0,Z1]RS[,T1] with NF=0.
  - EVEX.128.F2/66.0F38.W0 4A !(11):rrr:100 for TILELOADDRS[,T1] with
    NF=0.
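
To make the encoding notation above concrete, here is a minimal C
sketch that extracts the map, W, pp, length and NF fields from the
first four bytes of an EVEX-encoded insn. The struct and function
names are hypothetical and not part of binutils. The bit positions
follow the usual EVEX payload layout, with NF taken as bit 2 of the
third payload byte as in APX; treat this as an illustration, not as
the disassembler's actual logic.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the EVEX fields named in the encodings.  */
struct evex_fields
{
  unsigned int map;	/* Opcode map: 2 = 0F38, 5 = MAP5.  */
  unsigned int w;	/* EVEX.W.  */
  unsigned int pp;	/* 0 = NP, 1 = 66, 2 = F3, 3 = F2.  */
  unsigned int ll;	/* Vector length: 0 = 128.  */
  unsigned int nf;	/* APX NF bit.  */
};

/* Decode the 0x62 escape byte plus the three payload bytes.  */
struct evex_fields
decode_evex (const uint8_t insn[4])
{
  struct evex_fields f;

  assert (insn[0] == 0x62);
  f.map = insn[1] & 0x7;
  f.w = insn[2] >> 7;
  f.pp = insn[2] & 0x3;
  f.ll = (insn[3] >> 5) & 0x3;
  f.nf = (insn[3] >> 2) & 0x1;
  return f;
}
```

Fed the bytes 62 dd 7c 08 from the t2rpntlvwz0rs line in the APX
testcase, this gives map 5, W 0, no prefix, length 0 and NF 0, which
matches EVEX.128.NP.MAP5.W0. The bytes 62 b2 7b 08 from the
tileloaddrs line give map 2 (0F38) with pp 3 (F2).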

For the APX_F extension, we cannot use APX_F(AMX_TRANSPOSE&AMX_MOVRS)
since that transformation cannot be done. Instead, we use
AMX_TRANSPOSE & APX_F(AMX_MOVRS). As a result, AMX_TRANSPOSE is not
set in "any", so we set it there in the assembler's cpu_flags. Since
this only affects cpu_flags_match, it is handled there.
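
The effect of that fix-up can be sketched with plain bit masks. This
is a simplified model, not the tc-i386.c code: the flag constants, the
function name and the apply_fixup switch are hypothetical, and the
"all" and "any" values used in the note below are an assumption about
how AMX_TRANSPOSE & APX_F(AMX_MOVRS) expands.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical one-bit stand-ins for the real i386_cpu_flags fields.  */
enum
{
  CPU_AMX_TRANSPOSE = 1,
  CPU_AMX_MOVRS = 2,
  CPU_APX_F = 4
};

/* Model of the consistency check in cpu_flags_match: every feature
   required in ALL must also be present in ANY.  When APPLY_FIXUP is
   set, AMX_TRANSPOSE is first copied into ANY, mirroring the patch.  */
bool
all_covered_by_any (unsigned int all, unsigned int any, bool apply_fixup)
{
  if (apply_fixup
      && (all & CPU_AMX_TRANSPOSE) != 0
      && (any & CPU_AMX_MOVRS) != 0)
    any |= CPU_AMX_TRANSPOSE;

  return (all & any) == all;
}
```

Under the assumed expansion, "all" holds AMX_TRANSPOSE and AMX_MOVRS
while "any" holds AMX_MOVRS and APX_F. The check fails without the
fix-up and passes with it, which is why the assert no longer needs to
be skipped.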

---

Changes in v3:

  - Sort the APX_F extension testcases in ASCII order.
  - Adjust the 0F384B disassembler table pass to be the same as 0F384A.
  - Combine the APX_F assembler entry with a workaround.

Changes in v2:

  - Add APX_F extension for AMX-MOVRS.
  - Add invalid testcase for sibmem.
  - Templatize the table for T2RPNTLVW[Z0,Z1]RS[,T1].

---

gas/ChangeLog:

	* config/tc-i386.c (cpu_arch): Add amx_movrs.
	(cpu_flags_match): Set any bitfield for multiple cpuid
	enabled insns.
	* doc/c-i386.texi: Document .amx_movrs.
	* testsuite/gas/i386/x86-64.exp: Run AMX-MOVRS tests.
	* testsuite/gas/i386/x86-64-amx-movrs-intel.d: New test.
	* testsuite/gas/i386/x86-64-amx-movrs-inval.l: Ditto.
	* testsuite/gas/i386/x86-64-amx-movrs-inval.s: Ditto.
	* testsuite/gas/i386/x86-64-amx-movrs.d: Ditto.
	* testsuite/gas/i386/x86-64-amx-movrs.s: Ditto.

opcodes/ChangeLog:

	* i386-dis-evex-len.h (EVEX_LEN_0F384A_X86_64_W_0): New.
	* i386-dis-evex-w.h (EVEX_W_0F384A_X86_64): Ditto.
	* i386-dis-evex-x86-64.h (X86_64_EVEX_0F384A): Ditto.
	* i386-dis-evex.h: New entry for AMX-MOVRS.
	* i386-dis.c:
	(PREFIX_VEX_0F384A_X86_64_W_0_L_0): New.
	(PREFIX_VEX_MAP5_F8_X86_64_L_0_W_0): Ditto.
	(PREFIX_VEX_MAP5_F9_X86_64_L_0_W_0): Ditto.
	(X86_64_VEX_0F384A): Ditto.
	(X86_64_VEX_MAP5_F8): Ditto.
	(X86_64_VEX_MAP5_F9): Ditto.
	(X86_64_EVEX_0F384A): Ditto.
	(VEX_LEN_0F384A_X86_64_W_0): Ditto.
	(VEX_LEN_MAP5_F8_X86_64): Ditto.
	(VEX_LEN_MAP5_F9_X86_64): Ditto.
	(EVEX_LEN_0F384A_X86_64_W_0): Ditto.
	(VEX_W_0F384A_X86_64): Ditto.
	(VEX_W_MAP5_F8_X86_64_L_0): Ditto.
	(VEX_W_MAP5_F9_X86_64_L_0): Ditto.
	(EVEX_W_0F384A_X86_64): Ditto.
	(prefix_table): New entry for AMX-MOVRS.
	(x86_64_table): Ditto.
	(vex_len_table): Ditto.
	(vex_w_table): Ditto.
	(map5_f8_opcode): New.
	(map5_f9_opcode): Ditto.
	(get_valid_dis386): Handle VEX_MAP5 opcode for AMX-MOVRS.
	* i386-gen.c (isa_dependencies): Add AMX_MOVRS.
	(cpu_flags): Ditto.
	* i386-init.h: Regenerated.
	* i386-mnem.h: Ditto.
	* i386-opc.h (CpuAMX_MOVRS): New.
	(i386_cpu_flags): Add cpuamx_movrs.
	* i386-opc.tbl: Add AMX-MOVRS instructions.
	* i386-tbl.h: Regenerated.

Co-authored-by: Haochen Jiang <haochen.jiang@intel.com>
---
 gas/config/tc-i386.c                          |   16 +-
 gas/doc/c-i386.texi                           |    3 +-
 .../gas/i386/x86-64-amx-movrs-intel.d         |   23 +
 .../gas/i386/x86-64-amx-movrs-inval.l         |   13 +
 .../gas/i386/x86-64-amx-movrs-inval.s         |   19 +
 gas/testsuite/gas/i386/x86-64-amx-movrs.d     |   21 +
 gas/testsuite/gas/i386/x86-64-amx-movrs.s     |   31 +
 .../gas/i386/x86-64-apx-evex-promoted-intel.d |   16 +
 .../gas/i386/x86-64-apx-evex-promoted-wig.d   |   16 +
 .../gas/i386/x86-64-apx-evex-promoted.d       |   16 +
 .../gas/i386/x86-64-apx-evex-promoted.s       |   16 +
 gas/testsuite/gas/i386/x86-64.exp             |    3 +
 opcodes/i386-dis-evex-len.h                   |    5 +
 opcodes/i386-dis-evex-w.h                     |    4 +
 opcodes/i386-dis-evex-x86-64.h                |    5 +
 opcodes/i386-dis-evex.h                       |    6 +-
 opcodes/i386-dis.c                            |   98 +-
 opcodes/i386-gen.c                            |    3 +
 opcodes/i386-init.h                           |  694 +--
 opcodes/i386-mnem.h                           | 4352 +++++++++--------
 opcodes/i386-opc.h                            |    3 +
 opcodes/i386-opc.tbl                          |    3 +
 opcodes/i386-tbl.h                            |  272 +-
 23 files changed, 3017 insertions(+), 2621 deletions(-)
 create mode 100644 gas/testsuite/gas/i386/x86-64-amx-movrs-intel.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-amx-movrs-inval.l
 create mode 100644 gas/testsuite/gas/i386/x86-64-amx-movrs-inval.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-amx-movrs.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-amx-movrs.s
 create mode 100644 opcodes/i386-dis-evex-x86-64.h
  

Comments

Jan Beulich Jan. 13, 2025, 4:29 p.m. UTC | #1
On 13.01.2025 07:48, Haochen Jiang wrote:
> From: "Hu, Lin1" <lin1.hu@intel.com>
> 
> Hi all,
> 
> This is the v4 patch for AMX-MOVRS, with patch description and changes
> embedded below.
> 
> Ok for trunk?

Yes, and again thanks. I've taken note to see about moving the workaround
to i386-gen subsequently.

Jan
  

Patch

diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
index af1b9988dd8..3452824246e 100644
--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -1185,6 +1185,7 @@  static const arch_entry cpu_arch[] =
   SUBARCH (amx_transpose, AMX_TRANSPOSE, ANY_AMX_TRANSPOSE, false),
   SUBARCH (amx_tf32, AMX_TF32, ANY_AMX_TF32, false),
   SUBARCH (amx_fp8, AMX_FP8, ANY_AMX_FP8, false),
+  SUBARCH (amx_movrs, AMX_MOVRS, ANY_AMX_MOVRS, false),
   SUBARCH (amx_tile, AMX_TILE, ANY_AMX_TILE, false),
   SUBARCH (movdiri, MOVDIRI, MOVDIRI, false),
   SUBARCH (movdir64b, MOVDIR64B, MOVDIR64B, false),
@@ -2250,12 +2251,22 @@  cpu_flags_match (const insn_template *t)
 	      || any.bitfield.cpuavx512f || any.bitfield.cpuavx512bw
 	      || any.bitfield.cpuavx512dq || any.bitfield.cpuamx_tile
 	      || any.bitfield.cpucmpccxadd || any.bitfield.cpuuser_msr
-	      || any.bitfield.cpumsr_imm || any.bitfield.cpuamx_transpose))
+	      || any.bitfield.cpumsr_imm || any.bitfield.cpuamx_transpose
+	      || any.bitfield.cpuamx_movrs))
 	{
 	  /* These checks (verifying that APX_F() was properly used in the
 	     opcode table entry) make sure there's no need for an "else" to
 	     the "if()" below.  */
 	  gas_assert (!cpu_flags_all_zero (&all));
+
+	  /* For APX_F extension of multiple cpuid enabled insns, we could not
+	     use APX_F(cpuid_A&cpuid_B) since the transformation could not be
+	     done. Instead, we will use cpuid_A & APX_F(cpuid_B), then the
+	     "any" bitfield would not be set for cpuid_A. Set cpuid_A for "any"
+	     here since it is its original meaning.  */
+	  if (all.bitfield.cpuamx_transpose && any.bitfield.cpuamx_movrs)
+	    any.bitfield.cpuamx_transpose = 1;
+
 	  cpu = cpu_flags_and (all, any);
 	  gas_assert (cpu_flags_equal (&cpu, &all));
 
@@ -4055,7 +4066,8 @@  install_template (const insn_template *t)
 	   || maybe_cpu (t, CpuAVX512F) || maybe_cpu (t, CpuAVX512DQ)
 	   || maybe_cpu (t, CpuAVX512BW) || maybe_cpu (t, CpuBMI)
 	   || maybe_cpu (t, CpuBMI2) || maybe_cpu (t, CpuUSER_MSR)
-	   || maybe_cpu (t, CpuMSR_IMM) || maybe_cpu (t, CpuAMX_TRANSPOSE))
+	   || maybe_cpu (t, CpuMSR_IMM) || maybe_cpu (t, CpuAMX_TRANSPOSE)
+	   || maybe_cpu (t, CpuAMX_MOVRS))
 	  && maybe_cpu (t, CpuAPX_F))
 	{
 	  if (need_evex_encoding (t))
diff --git a/gas/doc/c-i386.texi b/gas/doc/c-i386.texi
index 20e21405053..6970d370474 100644
--- a/gas/doc/c-i386.texi
+++ b/gas/doc/c-i386.texi
@@ -231,6 +231,7 @@  accept various extension mnemonics.  For example,
 @code{amx_transpose},
 @code{amx_tf32},
 @code{amx_fp8}
+@code{amx_movrs},
 @code{amx_tile},
 @code{vmx},
 @code{vmfunc},
@@ -1705,7 +1706,7 @@  supported on the CPU specified.  The choices for @var{cpu_type} are:
 @item @samp{.movdiri} @tab @samp{.movdir64b} @tab @samp{.enqcmd} @tab @samp{.tsxldtrk}
 @item @samp{.amx_int8} @tab @samp{.amx_bf16} @tab @samp{.amx_fp16}
 @item @samp{.amx_complex} @tab @samp{.amx_transpose} @tab @samp{.amx_tf32}
-@item @samp{.amx_fp8} @tab @samp{.amx_tile}
+@item @samp{.amx_fp8} @tab @samp{.amx_movrs} @tab @samp{.amx_tile}
 @item @samp{.kl} @tab @samp{.widekl} @tab @samp{.uintr} @tab @samp{.hreset}
 @item @samp{.3dnow} @tab @samp{.3dnowa} @tab @samp{.sse4a} @tab @samp{.sse5}
 @item @samp{.syscall} @tab @samp{.rdtscp} @tab @samp{.svme}
diff --git a/gas/testsuite/gas/i386/x86-64-amx-movrs-intel.d b/gas/testsuite/gas/i386/x86-64-amx-movrs-intel.d
new file mode 100644
index 00000000000..f4cd0bd0911
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-amx-movrs-intel.d
@@ -0,0 +1,23 @@ 
+#objdump: -dw -Mintel
+#name: x86_64 AMX-MOVRS insns (Intel disassembly)
+#source: x86-64-amx-movrs.s
+
+.*: +file format .*
+
+Disassembly of section \.text:
+
+#...
+[a-f0-9]+ <_intel>:
+\s*[a-f0-9]+:\s*c4 a5 78 f8 b4 f5 00 00 00 10\s+t2rpntlvwz0rs tmm6,\[rbp\+r14\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 c5 78 f8 14 21\s+t2rpntlvwz0rs tmm2,\[r9\+riz\*1\]
+\s*[a-f0-9]+:\s*c4 a5 78 f9 b4 f5 00 00 00 10\s+t2rpntlvwz0rst1 tmm6,\[rbp\+r14\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 c5 78 f9 14 21\s+t2rpntlvwz0rst1 tmm2,\[r9\+riz\*1\]
+\s*[a-f0-9]+:\s*c4 a5 79 f8 b4 f5 00 00 00 10\s+t2rpntlvwz1rs tmm6,\[rbp\+r14\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 c5 79 f8 14 21\s+t2rpntlvwz1rs tmm2,\[r9\+riz\*1\]
+\s*[a-f0-9]+:\s*c4 a5 79 f9 b4 f5 00 00 00 10\s+t2rpntlvwz1rst1 tmm6,\[rbp\+r14\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 c5 79 f9 14 21\s+t2rpntlvwz1rst1 tmm2,\[r9\+riz\*1\]
+\s*[a-f0-9]+:\s*c4 a2 7b 4a b4 f5 00 00 00 10\s+tileloaddrs tmm6,\[rbp\+r14\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 c2 7b 4a 1c 21\s+tileloaddrs tmm3,\[r9\+riz\*1\]
+\s*[a-f0-9]+:\s*c4 a2 79 4a b4 f5 00 00 00 10\s+tileloaddrst1 tmm6,\[rbp\+r14\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 c2 79 4a 1c 21\s+tileloaddrst1 tmm3,\[r9\+riz\*1\]
+#pass
diff --git a/gas/testsuite/gas/i386/x86-64-amx-movrs-inval.l b/gas/testsuite/gas/i386/x86-64-amx-movrs-inval.l
new file mode 100644
index 00000000000..aa49c0d3533
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-amx-movrs-inval.l
@@ -0,0 +1,13 @@ 
+.* Assembler messages:
+.*:5: Error: `\(%rip\)' cannot be used here
+.*:6: Error: `\(%rip\)' cannot be used here
+.*:7: Error: `\(%rip\)' cannot be used here
+.*:8: Error: `\(%rip\)' cannot be used here
+.*:9: Warning: operand 2 `%tmm1' implicitly denotes `%tmm0' to `%tmm1' group in `t2rpntlvwz0rs'
+.*:10: Warning: operand 2 `%tmm3' implicitly denotes `%tmm2' to `%tmm3' group in `t2rpntlvwz0rst1'
+.*:11: Warning: operand 2 `%tmm5' implicitly denotes `%tmm4' to `%tmm5' group in `t2rpntlvwz1rs'
+.*:12: Warning: operand 2 `%tmm7' implicitly denotes `%tmm6' to `%tmm7' group in `t2rpntlvwz1rst1'
+.*:16: Error: `t2rpntlvwz0rs' is not supported on `x86_64.noamx_transpose'
+.*:17: Error: `t2rpntlvwz0rst1' is not supported on `x86_64.noamx_transpose'
+.*:18: Error: `t2rpntlvwz1rs' is not supported on `x86_64.noamx_transpose'
+.*:19: Error: `t2rpntlvwz1rst1' is not supported on `x86_64.noamx_transpose'
diff --git a/gas/testsuite/gas/i386/x86-64-amx-movrs-inval.s b/gas/testsuite/gas/i386/x86-64-amx-movrs-inval.s
new file mode 100644
index 00000000000..98b54f38ece
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-amx-movrs-inval.s
@@ -0,0 +1,19 @@ 
+# Check Invalid 64bit AMX-MOVRS instructions
+
+	.text
+_start:
+	t2rpntlvwz0rs	(%rip), %tmm2
+	t2rpntlvwz0rst1	(%rip), %tmm2
+	t2rpntlvwz1rs	(%rip), %tmm2
+	t2rpntlvwz1rst1	(%rip), %tmm2
+	t2rpntlvwz0rs	(%r9), %tmm1
+	t2rpntlvwz0rst1	(%r9), %tmm3
+	t2rpntlvwz1rs	(%r9), %tmm5
+	t2rpntlvwz1rst1	(%r9), %tmm7
+
+	.arch .noamx_transpose
+_transpose:
+	t2rpntlvwz0rs	(%r9), %tmm2
+	t2rpntlvwz0rst1	(%r9), %tmm2
+	t2rpntlvwz1rs	(%r9), %tmm2
+	t2rpntlvwz1rst1	(%r9), %tmm2
diff --git a/gas/testsuite/gas/i386/x86-64-amx-movrs.d b/gas/testsuite/gas/i386/x86-64-amx-movrs.d
new file mode 100644
index 00000000000..b0bc77e8f15
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-amx-movrs.d
@@ -0,0 +1,21 @@ 
+#objdump: -dw
+#name: x86_64 AMX-MOVRS insns
+
+.*: +file format .*
+
+Disassembly of section \.text:
+
+0+ <_start>:
+\s*[a-f0-9]+:\s*c4 a5 78 f8 b4 f5 00 00 00 10\s+t2rpntlvwz0rs 0x10000000\(%rbp,%r14,8\),%tmm6
+\s*[a-f0-9]+:\s*c4 c5 78 f8 14 21\s+t2rpntlvwz0rs \(%r9,%riz,1\),%tmm2
+\s*[a-f0-9]+:\s*c4 a5 78 f9 b4 f5 00 00 00 10\s+t2rpntlvwz0rst1 0x10000000\(%rbp,%r14,8\),%tmm6
+\s*[a-f0-9]+:\s*c4 c5 78 f9 14 21\s+t2rpntlvwz0rst1 \(%r9,%riz,1\),%tmm2
+\s*[a-f0-9]+:\s*c4 a5 79 f8 b4 f5 00 00 00 10\s+t2rpntlvwz1rs 0x10000000\(%rbp,%r14,8\),%tmm6
+\s*[a-f0-9]+:\s*c4 c5 79 f8 14 21\s+t2rpntlvwz1rs \(%r9,%riz,1\),%tmm2
+\s*[a-f0-9]+:\s*c4 a5 79 f9 b4 f5 00 00 00 10\s+t2rpntlvwz1rst1 0x10000000\(%rbp,%r14,8\),%tmm6
+\s*[a-f0-9]+:\s*c4 c5 79 f9 14 21\s+t2rpntlvwz1rst1 \(%r9,%riz,1\),%tmm2
+\s*[a-f0-9]+:\s*c4 a2 7b 4a b4 f5 00 00 00 10\s+tileloaddrs 0x10000000\(%rbp,%r14,8\),%tmm6
+\s*[a-f0-9]+:\s*c4 c2 7b 4a 1c 21\s+tileloaddrs \(%r9,%riz,1\),%tmm3
+\s*[a-f0-9]+:\s*c4 a2 79 4a b4 f5 00 00 00 10\s+tileloaddrst1 0x10000000\(%rbp,%r14,8\),%tmm6
+\s*[a-f0-9]+:\s*c4 c2 79 4a 1c 21\s+tileloaddrst1 \(%r9,%riz,1\),%tmm3
+#pass
diff --git a/gas/testsuite/gas/i386/x86-64-amx-movrs.s b/gas/testsuite/gas/i386/x86-64-amx-movrs.s
new file mode 100644
index 00000000000..07b6aba2a57
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-amx-movrs.s
@@ -0,0 +1,31 @@ 
+# Check 64bit AMX-MOVRS instructions
+
+	.text
+_start:
+	t2rpntlvwz0rs	0x10000000(%rbp, %r14, 8), %tmm6
+	t2rpntlvwz0rs	(%r9), %tmm2
+	t2rpntlvwz0rst1	0x10000000(%rbp, %r14, 8), %tmm6
+	t2rpntlvwz0rst1	(%r9), %tmm2
+	t2rpntlvwz1rs	0x10000000(%rbp, %r14, 8), %tmm6
+	t2rpntlvwz1rs	(%r9), %tmm2
+	t2rpntlvwz1rst1	0x10000000(%rbp, %r14, 8), %tmm6
+	t2rpntlvwz1rst1	(%r9), %tmm2
+	tileloaddrs	0x10000000(%rbp, %r14, 8), %tmm6
+	tileloaddrs	(%r9), %tmm3
+	tileloaddrst1	0x10000000(%rbp, %r14, 8), %tmm6
+	tileloaddrst1	(%r9), %tmm3
+
+_intel:
+	.intel_syntax noprefix
+	t2rpntlvwz0rs	tmm6, [rbp+r14*8+0x10000000]
+	t2rpntlvwz0rs	tmm2, [r9]
+	t2rpntlvwz0rst1	tmm6, [rbp+r14*8+0x10000000]
+	t2rpntlvwz0rst1	tmm2, [r9]
+	t2rpntlvwz1rs	tmm6, [rbp+r14*8+0x10000000]
+	t2rpntlvwz1rs	tmm2, [r9]
+	t2rpntlvwz1rst1	tmm6, [rbp+r14*8+0x10000000]
+	t2rpntlvwz1rst1	tmm2, [r9]
+	tileloaddrs	tmm6, [rbp+r14*8+0x10000000]
+	tileloaddrs	tmm3, [r9]
+	tileloaddrst1	tmm6, [rbp+r14*8+0x10000000]
+	tileloaddrst1	tmm3, [r9]
diff --git a/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-intel.d b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-intel.d
index d45167e318c..1cf4f7ee2eb 100644
--- a/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-intel.d
+++ b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-intel.d
@@ -136,10 +136,18 @@  Disassembly of section \.text:
 [	 ]*[a-f0-9]+:[	 ]*62 5a 87 00 f7 bc 87 23 01 00 00[	 ]+shrx[	 ]+r15,QWORD PTR \[r31\+rax\*4\+0x123\],r31
 [	 ]*[a-f0-9]+:[	 ]*62 da 7d 08 49 84 87 23 01 00 00[	 ]+sttilecfg[	 ]+\[r31\+rax\*4\+0x123\]
 [	 ]*[a-f0-9]+:[	 ]*62 da 7c 08 6e b4 c7 23 01 00 00[	 ]+t2rpntlvwz0 tmm6,\[r31\+rax\*8\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 dd 7c 08 f8 b4 c7 23 01 00 00[	 ]+t2rpntlvwz0rs tmm6,\[r31\+rax\*8\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 dd 7c 08 f9 b4 c7 23 01 00 00[	 ]+t2rpntlvwz0rst1 tmm6,\[r31\+rax\*8\+0x123\]
 [	 ]*[a-f0-9]+:[	 ]*62 da 7c 08 6f b4 c7 23 01 00 00[	 ]+t2rpntlvwz0t1 tmm6,\[r31\+rax\*8\+0x123\]
 [	 ]*[a-f0-9]+:[	 ]*62 da 7d 08 6e b4 c7 23 01 00 00[	 ]+t2rpntlvwz1 tmm6,\[r31\+rax\*8\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 dd 7d 08 f8 b4 c7 23 01 00 00[	 ]+t2rpntlvwz1rs tmm6,\[r31\+rax\*8\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 dd 7d 08 f9 b4 c7 23 01 00 00[	 ]+t2rpntlvwz1rst1 tmm6,\[r31\+rax\*8\+0x123\]
 [	 ]*[a-f0-9]+:[	 ]*62 da 7d 08 6f b4 c7 23 01 00 00[	 ]+t2rpntlvwz1t1 tmm6,\[r31\+rax\*8\+0x123\]
 [	 ]*[a-f0-9]+:[	 ]*62 da 7f 08 4b b4 87 23 01 00 00[	 ]+tileloadd tmm6,\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 b2 7b 08 4a b4 fd 00 00 00 10[	 ]+tileloaddrs[	 ]+tmm6,\[rbp\+r31\*8\+0x10000000\]
+[	 ]*[a-f0-9]+:[	 ]*62 fa 7f 08 4a 1c 20[	 ]+tileloaddrs[	 ]+tmm3,\[r16\+riz\*1\]
+[	 ]*[a-f0-9]+:[	 ]*62 9a 7d 08 4a b4 f7 00 00 00 10[	 ]+tileloaddrst1[	 ]+tmm6,\[r31\+r14\*8\+0x10000000\]
+[	 ]*[a-f0-9]+:[	 ]*62 fa 7d 08 4a 1c 20[	 ]+tileloaddrst1[	 ]+tmm3,\[r16\+riz\*1\]
 [	 ]*[a-f0-9]+:[	 ]*62 da 7d 08 4b b4 87 23 01 00 00[	 ]+tileloaddt1 tmm6,\[r31\+rax\*4\+0x123\]
 [	 ]*[a-f0-9]+:[	 ]*62 da 7e 08 4b b4 87 23 01 00 00[	 ]+tilestored[	 ]+\[r31\+rax\*4\+0x123\],tmm6
 [	 ]*[a-f0-9]+:[	 ]*62 4c 7c 08 66 8c 87 23 01 00 00[	 ]+wrssd[	 ]+\[r31\+rax\*4\+0x123\],r25d
@@ -271,10 +279,18 @@  Disassembly of section \.text:
 [	 ]*[a-f0-9]+:[	 ]*62 5a 87 00 f7 bc 87 23 01 00 00[	 ]+shrx[	 ]+r15,QWORD PTR \[r31\+rax\*4\+0x123\],r31
 [	 ]*[a-f0-9]+:[	 ]*62 da 7d 08 49 84 87 23 01 00 00[	 ]+sttilecfg[	 ]+\[r31\+rax\*4\+0x123\]
 [	 ]*[a-f0-9]+:[	 ]*62 da 7c 08 6e b4 c7 23 01 00 00[	 ]+t2rpntlvwz0 tmm6,\[r31\+rax\*8\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 dd 7c 08 f8 b4 c7 23 01 00 00[	 ]+t2rpntlvwz0rs tmm6,\[r31\+rax\*8\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 dd 7c 08 f9 b4 c7 23 01 00 00[	 ]+t2rpntlvwz0rst1 tmm6,\[r31\+rax\*8\+0x123\]
 [	 ]*[a-f0-9]+:[	 ]*62 da 7c 08 6f b4 c7 23 01 00 00[	 ]+t2rpntlvwz0t1 tmm6,\[r31\+rax\*8\+0x123\]
 [	 ]*[a-f0-9]+:[	 ]*62 da 7d 08 6e b4 c7 23 01 00 00[	 ]+t2rpntlvwz1 tmm6,\[r31\+rax\*8\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 dd 7d 08 f8 b4 c7 23 01 00 00[	 ]+t2rpntlvwz1rs tmm6,\[r31\+rax\*8\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 dd 7d 08 f9 b4 c7 23 01 00 00[	 ]+t2rpntlvwz1rst1 tmm6,\[r31\+rax\*8\+0x123\]
 [	 ]*[a-f0-9]+:[	 ]*62 da 7d 08 6f b4 c7 23 01 00 00[	 ]+t2rpntlvwz1t1 tmm6,\[r31\+rax\*8\+0x123\]
 [	 ]*[a-f0-9]+:[	 ]*62 da 7f 08 4b b4 87 23 01 00 00[	 ]+tileloadd tmm6,\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 b2 7b 08 4a b4 fd 00 00 00 10[	 ]+tileloaddrs[	 ]+tmm6,\[rbp\+r31\*8\+0x10000000\]
+[	 ]*[a-f0-9]+:[	 ]*62 fa 7f 08 4a 1c 20[	 ]+tileloaddrs[	 ]+tmm3,\[r16\+riz\*1\]
+[	 ]*[a-f0-9]+:[	 ]*62 9a 7d 08 4a b4 f7 00 00 00 10[	 ]+tileloaddrst1[	 ]+tmm6,\[r31\+r14\*8\+0x10000000\]
+[	 ]*[a-f0-9]+:[	 ]*62 fa 7d 08 4a 1c 20[	 ]+tileloaddrst1[	 ]+tmm3,\[r16\+riz\*1\]
 [	 ]*[a-f0-9]+:[	 ]*62 da 7d 08 4b b4 87 23 01 00 00[	 ]+tileloaddt1 tmm6,\[r31\+rax\*4\+0x123\]
 [	 ]*[a-f0-9]+:[	 ]*62 da 7e 08 4b b4 87 23 01 00 00[	 ]+tilestored[	 ]+\[r31\+rax\*4\+0x123\],tmm6
 [	 ]*[a-f0-9]+:[	 ]*62 4c 7c 08 66 8c 87 23 01 00 00[	 ]+wrssd[	 ]+\[r31\+rax\*4\+0x123\],r25d
diff --git a/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-wig.d b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-wig.d
index fa6d66b5dda..cebbe9d77f0 100644
--- a/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-wig.d
+++ b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-wig.d
@@ -136,10 +136,18 @@  Disassembly of section \.text:
 [	 ]*[a-f0-9]+:[	 ]*62 5a 87 00 f7 bc 87 23 01 00 00[	 ]+shrx[	 ]+%r31,0x123\(%r31,%rax,4\),%r15
 [	 ]*[a-f0-9]+:[	 ]*62 da 7d 08 49 84 87 23 01 00 00[	 ]+sttilecfg[	 ]+0x123\(%r31,%rax,4\)
 [	 ]*[a-f0-9]+:[	 ]*62 da 7c 08 6e b4 c7 23 01 00 00[	 ]+t2rpntlvwz0[	 ]+0x123\(%r31,%rax,8\),%tmm6
+[	 ]*[a-f0-9]+:[	 ]*62 dd 7c 08 f8 b4 c7 23 01 00 00[	 ]+t2rpntlvwz0rs[	 ]+0x123\(%r31,%rax,8\),%tmm6
+[	 ]*[a-f0-9]+:[	 ]*62 dd 7c 08 f9 b4 c7 23 01 00 00[	 ]+t2rpntlvwz0rst1[	 ]+0x123\(%r31,%rax,8\),%tmm6
 [	 ]*[a-f0-9]+:[	 ]*62 da 7c 08 6f b4 c7 23 01 00 00[	 ]+t2rpntlvwz0t1[	 ]+0x123\(%r31,%rax,8\),%tmm6
 [	 ]*[a-f0-9]+:[	 ]*62 da 7d 08 6e b4 c7 23 01 00 00[	 ]+t2rpntlvwz1[	 ]+0x123\(%r31,%rax,8\),%tmm6
+[	 ]*[a-f0-9]+:[	 ]*62 dd 7d 08 f8 b4 c7 23 01 00 00[	 ]+t2rpntlvwz1rs[	 ]+0x123\(%r31,%rax,8\),%tmm6
+[	 ]*[a-f0-9]+:[	 ]*62 dd 7d 08 f9 b4 c7 23 01 00 00[	 ]+t2rpntlvwz1rst1[	 ]+0x123\(%r31,%rax,8\),%tmm6
 [	 ]*[a-f0-9]+:[	 ]*62 da 7d 08 6f b4 c7 23 01 00 00[	 ]+t2rpntlvwz1t1[	 ]+0x123\(%r31,%rax,8\),%tmm6
 [	 ]*[a-f0-9]+:[	 ]*62 da 7f 08 4b b4 87 23 01 00 00[	 ]+tileloadd[	 ]+0x123\(%r31,%rax,4\),%tmm6
+[	 ]*[a-f0-9]+:[	 ]*62 b2 7b 08 4a b4 fd 00 00 00 10[	 ]+tileloaddrs[  ]+0x10000000\(%rbp,%r31,8\),%tmm6
+[	 ]*[a-f0-9]+:[	 ]*62 fa 7f 08 4a 1c 20[	 ]+tileloaddrs[	 ]+\(%r16,%riz,1\),%tmm3
+[	 ]*[a-f0-9]+:[	 ]*62 9a 7d 08 4a b4 f7 00 00 00 10[	 ]+tileloaddrst1[	 ]+0x10000000\(%r31,%r14,8\),%tmm6
+[	 ]*[a-f0-9]+:[	 ]*62 fa 7d 08 4a 1c 20[	 ]+tileloaddrst1[	 ]+\(%r16,%riz,1\),%tmm3
 [	 ]*[a-f0-9]+:[	 ]*62 da 7d 08 4b b4 87 23 01 00 00[	 ]+tileloaddt1[	 ]+0x123\(%r31,%rax,4\),%tmm6
 [	 ]*[a-f0-9]+:[	 ]*62 da 7e 08 4b b4 87 23 01 00 00[	 ]+tilestored[	 ]+%tmm6,0x123\(%r31,%rax,4\)
 [	 ]*[a-f0-9]+:[	 ]*62 4c 7c 08 66 8c 87 23 01 00 00[	 ]+wrssd[	 ]+%r25d,0x123\(%r31,%rax,4\)
@@ -271,10 +279,18 @@  Disassembly of section \.text:
 [	 ]*[a-f0-9]+:[	 ]*62 5a 87 00 f7 bc 87 23 01 00 00[	 ]+shrx[	 ]+%r31,0x123\(%r31,%rax,4\),%r15
 [	 ]*[a-f0-9]+:[	 ]*62 da 7d 08 49 84 87 23 01 00 00[	 ]+sttilecfg[	 ]+0x123\(%r31,%rax,4\)
 [	 ]*[a-f0-9]+:[	 ]*62 da 7c 08 6e b4 c7 23 01 00 00[	 ]+t2rpntlvwz0[	 ]+0x123\(%r31,%rax,8\),%tmm6
+[	 ]*[a-f0-9]+:[	 ]*62 dd 7c 08 f8 b4 c7 23 01 00 00[	 ]+t2rpntlvwz0rs[	 ]+0x123\(%r31,%rax,8\),%tmm6
+[	 ]*[a-f0-9]+:[	 ]*62 dd 7c 08 f9 b4 c7 23 01 00 00[	 ]+t2rpntlvwz0rst1[	 ]+0x123\(%r31,%rax,8\),%tmm6
 [	 ]*[a-f0-9]+:[	 ]*62 da 7c 08 6f b4 c7 23 01 00 00[	 ]+t2rpntlvwz0t1[	 ]+0x123\(%r31,%rax,8\),%tmm6
 [	 ]*[a-f0-9]+:[	 ]*62 da 7d 08 6e b4 c7 23 01 00 00[	 ]+t2rpntlvwz1[	 ]+0x123\(%r31,%rax,8\),%tmm6
+[	 ]*[a-f0-9]+:[	 ]*62 dd 7d 08 f8 b4 c7 23 01 00 00[	 ]+t2rpntlvwz1rs[	 ]+0x123\(%r31,%rax,8\),%tmm6
+[	 ]*[a-f0-9]+:[	 ]*62 dd 7d 08 f9 b4 c7 23 01 00 00[	 ]+t2rpntlvwz1rst1[	 ]+0x123\(%r31,%rax,8\),%tmm6
 [	 ]*[a-f0-9]+:[	 ]*62 da 7d 08 6f b4 c7 23 01 00 00[	 ]+t2rpntlvwz1t1[	 ]+0x123\(%r31,%rax,8\),%tmm6
 [	 ]*[a-f0-9]+:[	 ]*62 da 7f 08 4b b4 87 23 01 00 00[	 ]+tileloadd[	 ]+0x123\(%r31,%rax,4\),%tmm6
+[	 ]*[a-f0-9]+:[	 ]*62 b2 7b 08 4a b4 fd 00 00 00 10[	 ]+tileloaddrs[  ]+0x10000000\(%rbp,%r31,8\),%tmm6
+[	 ]*[a-f0-9]+:[	 ]*62 fa 7f 08 4a 1c 20[	 ]+tileloaddrs[	 ]+\(%r16,%riz,1\),%tmm3
+[	 ]*[a-f0-9]+:[	 ]*62 9a 7d 08 4a b4 f7 00 00 00 10[	 ]+tileloaddrst1[	 ]+0x10000000\(%r31,%r14,8\),%tmm6
+[	 ]*[a-f0-9]+:[	 ]*62 fa 7d 08 4a 1c 20[	 ]+tileloaddrst1[	 ]+\(%r16,%riz,1\),%tmm3
 [	 ]*[a-f0-9]+:[	 ]*62 da 7d 08 4b b4 87 23 01 00 00[	 ]+tileloaddt1[	 ]+0x123\(%r31,%rax,4\),%tmm6
 [	 ]*[a-f0-9]+:[	 ]*62 da 7e 08 4b b4 87 23 01 00 00[	 ]+tilestored[	 ]+%tmm6,0x123\(%r31,%rax,4\)
 [	 ]*[a-f0-9]+:[	 ]*62 4c 7c 08 66 8c 87 23 01 00 00[	 ]+wrssd[	 ]+%r25d,0x123\(%r31,%rax,4\)
diff --git a/gas/testsuite/gas/i386/x86-64-apx-evex-promoted.d b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted.d
index ba684c1b0c6..ac16bbad39e 100644
--- a/gas/testsuite/gas/i386/x86-64-apx-evex-promoted.d
+++ b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted.d
@@ -136,10 +136,18 @@  Disassembly of section \.text:
 [	 ]*[a-f0-9]+:[	 ]*62 5a 87 00 f7 bc 87 23 01 00 00[	 ]+shrx[	 ]+%r31,0x123\(%r31,%rax,4\),%r15
 [	 ]*[a-f0-9]+:[	 ]*62 da 7d 08 49 84 87 23 01 00 00[	 ]+sttilecfg[	 ]+0x123\(%r31,%rax,4\)
 [	 ]*[a-f0-9]+:[	 ]*62 da 7c 08 6e b4 c7 23 01 00 00[	 ]+t2rpntlvwz0[	 ]+0x123\(%r31,%rax,8\),%tmm6
+[	 ]*[a-f0-9]+:[	 ]*62 dd 7c 08 f8 b4 c7 23 01 00 00[	 ]+t2rpntlvwz0rs[	 ]+0x123\(%r31,%rax,8\),%tmm6
+[	 ]*[a-f0-9]+:[	 ]*62 dd 7c 08 f9 b4 c7 23 01 00 00[	 ]+t2rpntlvwz0rst1[	 ]+0x123\(%r31,%rax,8\),%tmm6
 [	 ]*[a-f0-9]+:[	 ]*62 da 7c 08 6f b4 c7 23 01 00 00[	 ]+t2rpntlvwz0t1[	 ]+0x123\(%r31,%rax,8\),%tmm6
 [	 ]*[a-f0-9]+:[	 ]*62 da 7d 08 6e b4 c7 23 01 00 00[	 ]+t2rpntlvwz1[	 ]+0x123\(%r31,%rax,8\),%tmm6
+[	 ]*[a-f0-9]+:[	 ]*62 dd 7d 08 f8 b4 c7 23 01 00 00[	 ]+t2rpntlvwz1rs[	 ]+0x123\(%r31,%rax,8\),%tmm6
+[	 ]*[a-f0-9]+:[	 ]*62 dd 7d 08 f9 b4 c7 23 01 00 00[	 ]+t2rpntlvwz1rst1[	 ]+0x123\(%r31,%rax,8\),%tmm6
 [	 ]*[a-f0-9]+:[	 ]*62 da 7d 08 6f b4 c7 23 01 00 00[	 ]+t2rpntlvwz1t1[	 ]+0x123\(%r31,%rax,8\),%tmm6
 [	 ]*[a-f0-9]+:[	 ]*62 da 7f 08 4b b4 87 23 01 00 00[	 ]+tileloadd[	 ]+0x123\(%r31,%rax,4\),%tmm6
+[	 ]*[a-f0-9]+:[	 ]*62 b2 7b 08 4a b4 fd 00 00 00 10[	 ]+tileloaddrs[	 ]+0x10000000\(%rbp,%r31,8\),%tmm6
+[	 ]*[a-f0-9]+:[	 ]*62 fa 7f 08 4a 1c 20[	 ]+tileloaddrs[	 ]+\(%r16,%riz,1\),%tmm3
+[	 ]*[a-f0-9]+:[	 ]*62 9a 7d 08 4a b4 f7 00 00 00 10[	 ]+tileloaddrst1[	 ]+0x10000000\(%r31,%r14,8\),%tmm6
+[	 ]*[a-f0-9]+:[	 ]*62 fa 7d 08 4a 1c 20[	 ]+tileloaddrst1[	 ]+\(%r16,%riz,1\),%tmm3
 [	 ]*[a-f0-9]+:[	 ]*62 da 7d 08 4b b4 87 23 01 00 00[	 ]+tileloaddt1[	 ]+0x123\(%r31,%rax,4\),%tmm6
 [	 ]*[a-f0-9]+:[	 ]*62 da 7e 08 4b b4 87 23 01 00 00[	 ]+tilestored[	 ]+%tmm6,0x123\(%r31,%rax,4\)
 [	 ]*[a-f0-9]+:[	 ]*62 4c 7c 08 66 8c 87 23 01 00 00[	 ]+wrssd[	 ]+%r25d,0x123\(%r31,%rax,4\)
@@ -271,10 +279,18 @@  Disassembly of section \.text:
 [	 ]*[a-f0-9]+:[	 ]*62 5a 87 00 f7 bc 87 23 01 00 00[	 ]+shrx[	 ]+%r31,0x123\(%r31,%rax,4\),%r15
 [	 ]*[a-f0-9]+:[	 ]*62 da 7d 08 49 84 87 23 01 00 00[	 ]+sttilecfg[	 ]+0x123\(%r31,%rax,4\)
 [	 ]*[a-f0-9]+:[	 ]*62 da 7c 08 6e b4 c7 23 01 00 00[	 ]+t2rpntlvwz0[	 ]+0x123\(%r31,%rax,8\),%tmm6
+[	 ]*[a-f0-9]+:[	 ]*62 dd 7c 08 f8 b4 c7 23 01 00 00[	 ]+t2rpntlvwz0rs[	 ]+0x123\(%r31,%rax,8\),%tmm6
+[	 ]*[a-f0-9]+:[	 ]*62 dd 7c 08 f9 b4 c7 23 01 00 00[	 ]+t2rpntlvwz0rst1[	 ]+0x123\(%r31,%rax,8\),%tmm6
 [	 ]*[a-f0-9]+:[	 ]*62 da 7c 08 6f b4 c7 23 01 00 00[	 ]+t2rpntlvwz0t1[	 ]+0x123\(%r31,%rax,8\),%tmm6
 [	 ]*[a-f0-9]+:[	 ]*62 da 7d 08 6e b4 c7 23 01 00 00[	 ]+t2rpntlvwz1[	 ]+0x123\(%r31,%rax,8\),%tmm6
+[	 ]*[a-f0-9]+:[	 ]*62 dd 7d 08 f8 b4 c7 23 01 00 00[	 ]+t2rpntlvwz1rs[	 ]+0x123\(%r31,%rax,8\),%tmm6
+[	 ]*[a-f0-9]+:[	 ]*62 dd 7d 08 f9 b4 c7 23 01 00 00[	 ]+t2rpntlvwz1rst1[	 ]+0x123\(%r31,%rax,8\),%tmm6
 [	 ]*[a-f0-9]+:[	 ]*62 da 7d 08 6f b4 c7 23 01 00 00[	 ]+t2rpntlvwz1t1[	 ]+0x123\(%r31,%rax,8\),%tmm6
 [	 ]*[a-f0-9]+:[	 ]*62 da 7f 08 4b b4 87 23 01 00 00[	 ]+tileloadd[	 ]+0x123\(%r31,%rax,4\),%tmm6
+[	 ]*[a-f0-9]+:[	 ]*62 b2 7b 08 4a b4 fd 00 00 00 10[	 ]+tileloaddrs[	 ]+0x10000000\(%rbp,%r31,8\),%tmm6
+[	 ]*[a-f0-9]+:[	 ]*62 fa 7f 08 4a 1c 20[	 ]+tileloaddrs[	 ]+\(%r16,%riz,1\),%tmm3
+[	 ]*[a-f0-9]+:[	 ]*62 9a 7d 08 4a b4 f7 00 00 00 10[	 ]+tileloaddrst1[	 ]+0x10000000\(%r31,%r14,8\),%tmm6
+[	 ]*[a-f0-9]+:[	 ]*62 fa 7d 08 4a 1c 20[	 ]+tileloaddrst1[	 ]+\(%r16,%riz,1\),%tmm3
 [	 ]*[a-f0-9]+:[	 ]*62 da 7d 08 4b b4 87 23 01 00 00[	 ]+tileloaddt1[	 ]+0x123\(%r31,%rax,4\),%tmm6
 [	 ]*[a-f0-9]+:[	 ]*62 da 7e 08 4b b4 87 23 01 00 00[	 ]+tilestored[	 ]+%tmm6,0x123\(%r31,%rax,4\)
 [	 ]*[a-f0-9]+:[	 ]*62 4c 7c 08 66 8c 87 23 01 00 00[	 ]+wrssd[	 ]+%r25d,0x123\(%r31,%rax,4\)
diff --git a/gas/testsuite/gas/i386/x86-64-apx-evex-promoted.s b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted.s
index 3fc9832a84b..a16490fa96b 100644
--- a/gas/testsuite/gas/i386/x86-64-apx-evex-promoted.s
+++ b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted.s
@@ -130,10 +130,18 @@  _start:
 	shrx	%r31,0x123(%r31,%rax,4),%r15
 	sttilecfg	0x123(%r31,%rax,4)
 	t2rpntlvwz0	0x123(%r31,%rax,8),%tmm6
+	t2rpntlvwz0rs	0x123(%r31,%rax,8),%tmm6
+	t2rpntlvwz0rst1	0x123(%r31,%rax,8),%tmm6
 	t2rpntlvwz0t1	0x123(%r31,%rax,8),%tmm6
 	t2rpntlvwz1	0x123(%r31,%rax,8),%tmm6
+	t2rpntlvwz1rs	0x123(%r31,%rax,8),%tmm6
+	t2rpntlvwz1rst1	0x123(%r31,%rax,8),%tmm6
 	t2rpntlvwz1t1	0x123(%r31,%rax,8),%tmm6
 	tileloadd	0x123(%r31,%rax,4),%tmm6
+	tileloaddrs     0x10000000(%rbp, %r31, 8), %tmm6
+	tileloaddrs     (%r16), %tmm3
+	tileloaddrst1   0x10000000(%r31, %r14, 8), %tmm6
+	tileloaddrst1   (%r16), %tmm3
 	tileloaddt1	0x123(%r31,%rax,4),%tmm6
 	tilestored	%tmm6,0x123(%r31,%rax,4)
 	wrssd	%r25d,0x123(%r31,%rax,4)
@@ -267,10 +275,18 @@  _start:
 	shrx	r15,QWORD PTR [r31+rax*4+0x123],r31
 	sttilecfg	[r31+rax*4+0x123]
 	t2rpntlvwz0	tmm6,[r31+rax*8+0x123]
+	t2rpntlvwz0rs	tmm6,[r31+rax*8+0x123]
+	t2rpntlvwz0rst1	tmm6,[r31+rax*8+0x123]
 	t2rpntlvwz0t1	tmm6,[r31+rax*8+0x123]
 	t2rpntlvwz1	tmm6,[r31+rax*8+0x123]
+	t2rpntlvwz1rs	tmm6,[r31+rax*8+0x123]
+	t2rpntlvwz1rst1	tmm6,[r31+rax*8+0x123]
 	t2rpntlvwz1t1	tmm6,[r31+rax*8+0x123]
 	tileloadd	tmm6,[r31+rax*4+0x123]
+	tileloaddrs	tmm6, [rbp+r31*8+0x10000000]
+	tileloaddrs	tmm3, [r16]
+	tileloaddrst1	tmm6, [r31+r14*8+0x10000000]
+	tileloaddrst1	tmm3, [r16]
 	tileloaddt1	tmm6,[r31+rax*4+0x123]
 	tilestored	[r31+rax*4+0x123],tmm6
 	wrssd	DWORD PTR [r31+rax*4+0x123],r25d
diff --git a/gas/testsuite/gas/i386/x86-64.exp b/gas/testsuite/gas/i386/x86-64.exp
index fe503671312..c99488935f5 100644
--- a/gas/testsuite/gas/i386/x86-64.exp
+++ b/gas/testsuite/gas/i386/x86-64.exp
@@ -535,6 +535,9 @@  run_dump_test "x86-64-amx-fp8"
 run_dump_test "x86-64-amx-fp8-intel"
 run_list_test "x86-64-amx-fp8-inval"
 run_dump_test "x86-64-amx-fp8-bad"
+run_dump_test "x86-64-amx-movrs"
+run_dump_test "x86-64-amx-movrs-intel"
+run_list_test "x86-64-amx-movrs-inval"
 run_dump_test "x86-64-clzero"
 run_dump_test "x86-64-mwaitx-bdver4"
 run_list_test "x86-64-mwaitx-reg"
diff --git a/opcodes/i386-dis-evex-len.h b/opcodes/i386-dis-evex-len.h
index e931fdd655a..2b4361f7ae6 100644
--- a/opcodes/i386-dis-evex-len.h
+++ b/opcodes/i386-dis-evex-len.h
@@ -44,6 +44,11 @@  static const struct dis386 evex_len_table[][3] = {
     { "vperm%DQ",	{ XM, Vex, EXx }, PREFIX_DATA },
   },
 
+  /* EVEX_LEN_0F384A_X86_64_W_0 */
+  {
+    { X86_64_EVEX_PFX_TABLE (PREFIX_VEX_0F384A_X86_64_W_0_L_0) },
+  },
+
   /* EVEX_LEN_0F385A */
   {
     { Bad_Opcode },
diff --git a/opcodes/i386-dis-evex-w.h b/opcodes/i386-dis-evex-w.h
index 1bb716c0ba7..8c73320d633 100644
--- a/opcodes/i386-dis-evex-w.h
+++ b/opcodes/i386-dis-evex-w.h
@@ -346,6 +346,10 @@ 
   {
     { "vpbroadcastmw2dY",	{ XM, MaskR }, 0 },
   },
+  /* EVEX_W_0F384A_X86_64 */
+  {
+    { EVEX_LEN_TABLE (EVEX_LEN_0F384A_X86_64_W_0) },
+  },
   /* EVEX_W_0F3859 */
   {
     { "vbroadcasti32x2",	{ XM, EXq }, PREFIX_DATA },
diff --git a/opcodes/i386-dis-evex-x86-64.h b/opcodes/i386-dis-evex-x86-64.h
new file mode 100644
index 00000000000..9b0d7132268
--- /dev/null
+++ b/opcodes/i386-dis-evex-x86-64.h
@@ -0,0 +1,5 @@ 
+  /* X86_64_EVEX_0F384A */
+  {
+    { Bad_Opcode },
+    { VEX_W_TABLE (EVEX_W_0F384A_X86_64) },
+  },
diff --git a/opcodes/i386-dis-evex.h b/opcodes/i386-dis-evex.h
index 006d0c4a990..fdb4e07f2e1 100644
--- a/opcodes/i386-dis-evex.h
+++ b/opcodes/i386-dis-evex.h
@@ -376,7 +376,7 @@  static const struct dis386 evex_table[][256] = {
     /* 48 */
     { Bad_Opcode },
     { X86_64_EVEX_MEM_W_TABLE (VEX_W_0F3849_X86_64_L_0) },
-    { Bad_Opcode },
+    { X86_64_TABLE (X86_64_EVEX_0F384A) },
     { X86_64_EVEX_MEM_W_TABLE (VEX_W_0F384B_X86_64_L_0) },
     { "vrcp14p%XW",	{ XM, EXx }, PREFIX_DATA },
     { "vrcp14s%XW",	{ XMScalar, VexScalar, EXdq }, PREFIX_DATA },
@@ -1445,8 +1445,8 @@  static const struct dis386 evex_table[][256] = {
     { Bad_Opcode },
     { Bad_Opcode },
     /* F8 */
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_VEX_MAP5_F8) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_VEX_MAP5_F9) },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
diff --git a/opcodes/i386-dis.c b/opcodes/i386-dis.c
index f9b6cf30fd0..4d3dae74d65 100644
--- a/opcodes/i386-dis.c
+++ b/opcodes/i386-dis.c
@@ -1134,6 +1134,7 @@  enum
   PREFIX_VEX_0F3848_X86_64_L_0_W_0,
   PREFIX_VEX_0F3849_X86_64_L_0_W_0_M_0,
   PREFIX_VEX_0F3849_X86_64_L_0_W_0_M_1,
+  PREFIX_VEX_0F384A_X86_64_W_0_L_0,
   PREFIX_VEX_0F384B_X86_64_L_0_W_0,
   PREFIX_VEX_0F3850_W_0,
   PREFIX_VEX_0F3851_W_0,
@@ -1159,6 +1160,8 @@  enum
   PREFIX_VEX_0F38F6_L_0,
   PREFIX_VEX_0F38F7_L_0,
   PREFIX_VEX_0F3AF0_L_0,
+  PREFIX_VEX_MAP5_F8_X86_64_L_0_W_0,
+  PREFIX_VEX_MAP5_F9_X86_64_L_0_W_0,
   PREFIX_VEX_MAP5_FD_X86_64_L_0_W_0,
   PREFIX_VEX_MAP7_F6_L_0_W_0_R_0_X86_64,
   PREFIX_VEX_MAP7_F8_L_0_W_0_R_0_X86_64,
@@ -1351,6 +1354,7 @@  enum
 
   X86_64_VEX_0F3848,
   X86_64_VEX_0F3849,
+  X86_64_VEX_0F384A,
   X86_64_VEX_0F384B,
   X86_64_VEX_0F385C,
   X86_64_VEX_0F385E,
@@ -1361,9 +1365,13 @@  enum
   X86_64_VEX_0F386F,
   X86_64_VEX_0F38Ex,
 
+  X86_64_VEX_MAP5_F8,
+  X86_64_VEX_MAP5_F9,
   X86_64_VEX_MAP5_FD,
   X86_64_VEX_MAP7_F6_L_0_W_0_R_0,
   X86_64_VEX_MAP7_F8_L_0_W_0_R_0,
+
+  X86_64_EVEX_0F384A,
 };
 
 enum
@@ -1436,6 +1444,7 @@  enum
   VEX_LEN_0F3841,
   VEX_LEN_0F3848_X86_64,
   VEX_LEN_0F3849_X86_64,
+  VEX_LEN_0F384A_X86_64_W_0,
   VEX_LEN_0F384B_X86_64,
   VEX_LEN_0F385A,
   VEX_LEN_0F385C_X86_64,
@@ -1483,6 +1492,8 @@  enum
   VEX_LEN_0F3ADE_W_0,
   VEX_LEN_0F3ADF,
   VEX_LEN_0F3AF0,
+  VEX_LEN_MAP5_F8_X86_64,
+  VEX_LEN_MAP5_F9_X86_64,
   VEX_LEN_MAP5_FD_X86_64,
   VEX_LEN_MAP7_F6,
   VEX_LEN_MAP7_F8,
@@ -1555,6 +1566,7 @@  enum
   EVEX_LEN_0F381A,
   EVEX_LEN_0F381B,
   EVEX_LEN_0F3836,
+  EVEX_LEN_0F384A_X86_64_W_0,
   EVEX_LEN_0F385A,
   EVEX_LEN_0F385B,
   EVEX_LEN_0F38C6,
@@ -1609,6 +1621,7 @@  enum
   VEX_W_0F3846,
   VEX_W_0F3848_X86_64_L_0,
   VEX_W_0F3849_X86_64_L_0,
+  VEX_W_0F384A_X86_64,
   VEX_W_0F384B_X86_64_L_0,
   VEX_W_0F3850,
   VEX_W_0F3851,
@@ -1656,6 +1669,8 @@  enum
   VEX_W_0F3ACE,
   VEX_W_0F3ACF,
   VEX_W_0F3ADE,
+  VEX_W_MAP5_F8_X86_64_L_0,
+  VEX_W_MAP5_F9_X86_64_L_0,
   VEX_W_MAP5_FD_X86_64_L_0,
   VEX_W_MAP7_F6_L_0,
   VEX_W_MAP7_F8_L_0,
@@ -1783,6 +1798,7 @@  enum
   EVEX_W_0F3835_P_2,
   EVEX_W_0F3837,
   EVEX_W_0F383A_P_1,
+  EVEX_W_0F384A_X86_64,
   EVEX_W_0F3859,
   EVEX_W_0F385A_L_n,
   EVEX_W_0F385B_L_2,
@@ -4100,6 +4116,14 @@  static const struct dis386 prefix_table[][4] = {
     { RM_TABLE (RM_VEX_0F3849_X86_64_L_0_W_0_M_1_P_3) },
   },
 
+  /* PREFIX_VEX_0F384A_X86_64_W_0_L_0 */
+  {
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { "tileloaddrst1",	{ TMM, MVexSIBMEM }, 0 },
+    { "tileloaddrs",	{ TMM, MVexSIBMEM }, 0 },
+  },
+
   /* PREFIX_VEX_0F384B_X86_64_L_0_W_0 */
   {
     { Bad_Opcode },
@@ -4284,6 +4308,20 @@  static const struct dis386 prefix_table[][4] = {
     { "%XErorxS",		{ Gdq, Edq, Ib }, 0 },
   },
 
+  /* PREFIX_VEX_MAP5_F8_X86_64_L_0_W_0 */
+  {
+    { "t2rpntlvwz0rs",	{ TMM, MVexSIBMEM }, 0 },
+    { Bad_Opcode },
+    { "t2rpntlvwz1rs",	{ TMM, MVexSIBMEM }, 0 },
+  },
+
+  /* PREFIX_VEX_MAP5_F9_X86_64_L_0_W_0 */
+  {
+    { "t2rpntlvwz0rst1",	{ TMM, MVexSIBMEM }, 0 },
+    { Bad_Opcode },
+    { "t2rpntlvwz1rst1",	{ TMM, MVexSIBMEM }, 0 },
+  },
+
   /* PREFIX_VEX_MAP5_FD_X86_64_L_0_W_0 */
   {
     { "tdpbf8ps",	{ TMM, Rtmm, VexTmm }, 0 },
@@ -4640,6 +4678,12 @@  static const struct dis386 x86_64_table[][2] = {
     { VEX_LEN_TABLE (VEX_LEN_0F3849_X86_64) },
   },
 
+  /* X86_64_VEX_0F384A */
+  {
+    { Bad_Opcode },
+    { VEX_W_TABLE (VEX_W_0F384A_X86_64) },
+  },
+
   /* X86_64_VEX_0F384B */
   {
     { Bad_Opcode },
@@ -4694,6 +4738,18 @@  static const struct dis386 x86_64_table[][2] = {
     { "%XEcmp%CCxadd", { Mdq, Gdq, VexGdq }, PREFIX_DATA },
   },
 
+  /* X86_64_VEX_MAP5_F8 */
+  {
+    { Bad_Opcode },
+    { VEX_LEN_TABLE (VEX_LEN_MAP5_F8_X86_64) },
+  },
+
+  /* X86_64_VEX_MAP5_F9 */
+  {
+    { Bad_Opcode },
+    { VEX_LEN_TABLE (VEX_LEN_MAP5_F9_X86_64) },
+  },
+
   /* X86_64_VEX_MAP5_FD */
   {
     { Bad_Opcode },
@@ -4711,6 +4767,8 @@  static const struct dis386 x86_64_table[][2] = {
     { Bad_Opcode },
     { PREFIX_TABLE (PREFIX_VEX_MAP7_F8_L_0_W_0_R_0_X86_64) },
   },
+
+#include "i386-dis-evex-x86-64.h"
 };
 
 static const struct dis386 three_byte_table[][256] = {
@@ -6553,7 +6611,7 @@  static const struct dis386 vex_table[][256] = {
     /* 48 */
     { X86_64_TABLE (X86_64_VEX_0F3848) },
     { X86_64_TABLE (X86_64_VEX_0F3849) },
-    { Bad_Opcode },
+    { X86_64_TABLE (X86_64_VEX_0F384A) },
     { X86_64_TABLE (X86_64_VEX_0F384B) },
     { Bad_Opcode },
     { Bad_Opcode },
@@ -7241,6 +7299,11 @@  static const struct dis386 vex_len_table[][2] = {
     { VEX_W_TABLE (VEX_W_0F3849_X86_64_L_0) },
   },
 
+  /* VEX_LEN_0F384A_X86_64_W_0 */
+  {
+    { PREFIX_TABLE (PREFIX_VEX_0F384A_X86_64_W_0_L_0) },
+  },
+
   /* VEX_LEN_0F384B_X86_64 */
   {
     { VEX_W_TABLE (VEX_W_0F384B_X86_64_L_0) },
@@ -7488,6 +7551,16 @@  static const struct dis386 vex_len_table[][2] = {
     { PREFIX_TABLE (PREFIX_VEX_0F3AF0_L_0) },
   },
 
+  /* VEX_LEN_MAP5_F8_X86_64 */
+  {
+    { VEX_W_TABLE (VEX_W_MAP5_F8_X86_64_L_0) },
+  },
+
+  /* VEX_LEN_MAP5_F9_X86_64 */
+  {
+    { VEX_W_TABLE (VEX_W_MAP5_F9_X86_64_L_0) },
+  },
+
   /* VEX_LEN_MAP5_FD_X86_64 */
   {
     { VEX_W_TABLE (VEX_W_MAP5_FD_X86_64_L_0) },
@@ -7935,6 +8008,10 @@  static const struct dis386 vex_w_table[][2] = {
     /* VEX_W_0F3849_X86_64_L_0 */
     { MOD_TABLE (MOD_VEX_0F3849_X86_64_L_0_W_0) },
   },
+  {
+    /* VEX_W_0F384A_X86_64 */
+    { VEX_LEN_TABLE (VEX_LEN_0F384A_X86_64_W_0) },
+  },
   {
     /* VEX_W_0F384B_X86_64_L_0 */
     { PREFIX_TABLE (PREFIX_VEX_0F384B_X86_64_L_0_W_0) },
@@ -8129,6 +8206,14 @@  static const struct dis386 vex_w_table[][2] = {
     /* VEX_W_0F3ADE */
     { VEX_LEN_TABLE (VEX_LEN_0F3ADE_W_0) },
   },
+  {
+    /* VEX_W_MAP5_F8_X86_64_L_0 */
+    { PREFIX_TABLE (PREFIX_VEX_MAP5_F8_X86_64_L_0_W_0) },
+  },
+  {
+    /* VEX_W_MAP5_F9_X86_64_L_0 */
+    { PREFIX_TABLE (PREFIX_VEX_MAP5_F9_X86_64_L_0_W_0) },
+  },
   {
     /* VEX_W_MAP5_FD_X86_64 */
     { PREFIX_TABLE (PREFIX_VEX_MAP5_FD_X86_64_L_0_W_0) },
@@ -8890,6 +8975,8 @@  static const struct dis386 bad_opcode = { "(bad)", { XX }, 0 };
 /* Fetch error indicator.  */
 static const struct dis386 err_opcode = { NULL, { XX }, 0 };
 
+static const struct dis386 map5_f8_opcode = { X86_64_TABLE (X86_64_VEX_MAP5_F8) };
+static const struct dis386 map5_f9_opcode = { X86_64_TABLE (X86_64_VEX_MAP5_F9) };
 static const struct dis386 map5_fd_opcode = { X86_64_TABLE (X86_64_VEX_MAP5_FD) };
 static const struct dis386 map7_f6_opcode = { VEX_LEN_TABLE (VEX_LEN_MAP7_F6) };
 static const struct dis386 map7_f8_opcode = { VEX_LEN_TABLE (VEX_LEN_MAP7_F8) };
@@ -9211,7 +9298,14 @@  get_valid_dis386 (const struct dis386 *dp, instr_info *ins)
       else if (vindex == 0xf6)
 	dp = &map7_f6_opcode;
       else if (vindex == 0xf8)
-	dp = &map7_f8_opcode;
+	{
+	  if (vex_table_index == VEX_MAP5)
+	    dp = &map5_f8_opcode;
+	  else
+	    dp = &map7_f8_opcode;
+	}
+      else if (vindex == 0xf9)
+	dp = &map5_f9_opcode;
       else if (vindex == 0xfd)
 	dp = &map5_fd_opcode;
       else
diff --git a/opcodes/i386-gen.c b/opcodes/i386-gen.c
index 60aa8ddbad7..e91894b2483 100644
--- a/opcodes/i386-gen.c
+++ b/opcodes/i386-gen.c
@@ -271,6 +271,8 @@  static const dependency isa_dependencies[] =
     "AMX_TILE" },
   { "AMX_FP8",
     "AMX_TILE" },
+  { "AMX_MOVRS",
+    "AMX_TILE" },
   { "KL",
     "SSE2" },
   { "WIDEKL",
@@ -443,6 +445,7 @@  static bitfield cpu_flags[] =
   BITFIELD (AMX_TRANSPOSE),
   BITFIELD (AMX_TF32),
   BITFIELD (AMX_FP8),
+  BITFIELD (AMX_MOVRS),
   BITFIELD (AMX_TILE),
   BITFIELD (MOVDIRI),
   BITFIELD (MOVDIR64B),
diff --git a/opcodes/i386-opc.h b/opcodes/i386-opc.h
index 5d330550205..ea3cd2b570e 100644
--- a/opcodes/i386-opc.h
+++ b/opcodes/i386-opc.h
@@ -256,6 +256,8 @@  enum i386_cpu
   CpuAMX_TF32,
   /* AMX-FP8 instructions required */
   CpuAMX_FP8,
+  /* AMX-MOVRS instructions required */
+  CpuAMX_MOVRS,
   /* AMX-TILE instructions required */
   CpuAMX_TILE,
   /* GFNI instructions required */
@@ -509,6 +511,7 @@  typedef union i386_cpu_flags
       unsigned int cpuamx_complex:1;
       unsigned int cpuamx_tf32:1;
       unsigned int cpuamx_fp8:1;
+      unsigned int cpuamx_movrs:1;
       unsigned int cpuamx_tile:1;
       unsigned int cpugfni:1;
       unsigned int cpuvaes:1;
diff --git a/opcodes/i386-opc.tbl b/opcodes/i386-opc.tbl
index 32dd883926c..10966b187ae 100644
--- a/opcodes/i386-opc.tbl
+++ b/opcodes/i386-opc.tbl
@@ -3207,6 +3207,8 @@  tdpbsud, 0xf35e, AMX_INT8, Modrm|Vex128|Space0F38|Src2VVVV|VexW0|NoSuf, { RegTMM
 
 tileloadd, 0xf24b, APX_F(AMX_TILE), Sibmem|Vex128|EVex128|Space0F38|VexW0|NoSuf, { Unspecified|BaseIndex, RegTMM }
 tileloaddt1, 0x664b, APX_F(AMX_TILE), Sibmem|Vex128|EVex128|Space0F38|VexW0|NoSuf, { Unspecified|BaseIndex, RegTMM }
+tileloaddrs, 0xf24a, APX_F(AMX_MOVRS), Sibmem|Vex128|EVex128|Space0F38|VexW0|NoSuf, { Unspecified|BaseIndex, RegTMM }
+tileloaddrst1, 0x664a, APX_F(AMX_MOVRS), Sibmem|Vex128|EVex128|Space0F38|VexW0|NoSuf, { Unspecified|BaseIndex, RegTMM }
 tilestored, 0xf34b, APX_F(AMX_TILE), Sibmem|Vex128|EVex128|Space0F38|VexW0|NoSuf, { RegTMM, Unspecified|BaseIndex }
 
 tilerelease, 0x49c0, AMX_TILE, Vex128|Space0F38|VexW0|NoSuf, {}
@@ -3218,6 +3220,7 @@  tilezero, 0xf249, AMX_TILE, Modrm|Vex128|Space0F38|VexW0|NoSuf, { RegTMM }
 <loc:opc, $t:0x0, t1:0x1>
 
 t2rpntlvw<z><loc>, 0x<z:pfx>6e | <loc:opc>, APX_F(AMX_TRANSPOSE), Sibmem|Vex128|EVex128|Space0F38|VexW0|NoSuf|ImplicitGroup, { Unspecified|BaseIndex, RegTMM }
+t2rpntlvw<z>rs<loc>, 0x<z:pfx>f8 | <loc:opc>, AMX_TRANSPOSE&APX_F(AMX_MOVRS), Sibmem|Vex128|EVex128|Map5|VexW0|NoSuf|ImplicitGroup, { Unspecified|BaseIndex, RegTMM }
 
 <z>
 <loc>
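
For reviewers following the disassembler side: the tables added for VEX.0F38 4A are walked in the order X86_64_VEX_0F384A, VEX_W_0F384A_X86_64, VEX_LEN_0F384A_X86_64_W_0, PREFIX_VEX_0F384A_X86_64_W_0_L_0, i.e. W is checked before L. The following is only a simplified sketch of that order and is not the binutils code. The function decode_0f384a, the enum pfx and the array pfx_0f384a are hypothetical names made up for illustration; the prefix slot order (none, F3, 66, F2) is assumed from the prefix_table rows in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified model of the lookup chain for VEX.0F38 4A.
   None of these names exist in binutils.  */

/* Prefix slots in the order assumed for prefix_table rows.  */
enum pfx { PFX_NONE = 0, PFX_F3 = 1, PFX_66 = 2, PFX_F2 = 3 };

/* Mirrors the row PREFIX_VEX_0F384A_X86_64_W_0_L_0; NULL means (bad).  */
static const char *const pfx_0f384a[4] = {
  NULL,             /* no prefix */
  NULL,             /* F3 */
  "tileloaddrst1",  /* 66 */
  "tileloaddrs",    /* F2 */
};

/* Walk 64-bit mode -> VEX.W -> VEX.L -> mandatory prefix.
   Returns "(bad)" as soon as any level has no valid entry.  */
static const char *
decode_0f384a (int is_64bit, int vex_w, int vex_l, unsigned int prefix)
{
  assert (prefix < 4);

  if (!is_64bit)	/* Slot 0 of the x86_64 table is Bad_Opcode.  */
    return "(bad)";
  if (vex_w != 0)	/* Only the W0 slot is populated.  */
    return "(bad)";
  if (vex_l != 0)	/* Only the L0 (128-bit) slot is populated.  */
    return "(bad)";

  const char *name = pfx_0f384a[prefix];
  return name != NULL ? name : "(bad)";
}
```

In this model, decode_0f384a (1, 0, 0, PFX_F2) yields "tileloaddrs" and decode_0f384a (1, 0, 0, PFX_66) yields "tileloaddrst1"; any other combination falls through to "(bad)".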