[v3,1/2] Support Intel AMX-MOVRS

Message ID 20250109060052.1941533-2-haochen.jiang@intel.com
State New
Series Support Diamond Rapids remaining AMX ISAs

Checks

Context Check Description
linaro-tcwg-bot/tcwg_binutils_build--master-arm fail Patch failed to apply
linaro-tcwg-bot/tcwg_binutils_build--master-aarch64 fail Patch failed to apply

Commit Message

Jiang, Haochen Jan. 9, 2025, 6 a.m. UTC
  From: "Hu, Lin1" <lin1.hu@intel.com>

Note: The workaround is admittedly not elegant, but it is the most
straightforward approach for now. If more cases like this show up in
the future, it would be worth replacing the workaround with a cleaner
mechanism.

---

Changes in v3:

  - Sort the APX_F extension testcases in ASCII order.
  - Adjust the 0F384B disassembler table pass to be the same as 0F384A.
  - Combine the APX_F assembler entry by means of a workaround.

---

This patch adds support for the AMX-MOVRS feature. Unlike all the
other AMX insns in VEX space, where we pass vex_len_table before
vex_w_table, we pass vex_w_table first for tileloaddrs[,t1] to
align with the order in EVEX space. The reason we pass vex_w_table
first in EVEX space is AMX-AVX512, where tcvtrowd2ps and tilemovrow
with r32 share the same opcode with tileloaddrs[,t1]. All of them
have EVEX.W = 0 but different EVEX.length. Having to redo that order
shortly afterwards is not ideal.

The APX_F extension is also implemented in this patch. The encodings
are:
  - EVEX.128.NP/66.MAP5.W0 F8/F9 !(11):rrr:100 for
    T2RPNTLVW[Z0,Z1]RS[,T1] with NF=0.
  - EVEX.128.F2/66.0F38.W0 4A !(11):rrr:100 for TILELOADDRS[,T1] with
    NF=0.
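
For readers cross-checking the byte sequences in the new VEX testcases
against the map/prefix/W fields named above, here is a minimal sketch of
pulling those fields out of a 3-byte VEX prefix (0xC4 followed by two
payload bytes). The struct and function names are hypothetical and are
not taken from opcodes/i386-dis.c; only the standard VEX bit layout is
assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type, not part of binutils: selected fields of a
   3-byte VEX prefix (0xC4, b1, b2).  */
struct vex3_fields {
  unsigned map; /* b1[4:0]: opcode map (2 = 0F38, 5 = MAP5) */
  unsigned w;   /* b2[7]:   VEX.W */
  unsigned l;   /* b2[2]:   VEX.L (0 = 128-bit) */
  unsigned pp;  /* b2[1:0]: 0 = no prefix, 1 = 66, 2 = F3, 3 = F2 */
};

/* Extract the fields from the two payload bytes that follow 0xC4.  */
static struct vex3_fields vex3_decode(uint8_t b1, uint8_t b2)
{
  struct vex3_fields f;
  f.map = b1 & 0x1f;
  f.w = (b2 >> 7) & 1;
  f.l = (b2 >> 2) & 1;
  f.pp = b2 & 3;
  return f;
}
```

Applied to the testcase bytes, "c4 a5 78 f8" (t2rpntlvwz0rs) comes out as
MAP5 with no prefix, W0 and L0, while "c4 a2 7b 4a" (tileloaddrs) comes
out as map 0F38 with the F2 prefix.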

For the APX_F extension, since only one CPUID combination involving
APX_F is expected in the foreseeable future, we choose to work around
it by skipping the CPUID assert in the assembler.
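
To illustrate why the assert trips for this one combination, here is a
small stand-alone model using plain bit masks instead of gas's
i386_cpu_flags. It assumes that APX_F(X) leaves "all" = X and
"any" = X | APX_F, and that AMX_TRANSPOSE&APX_F(AMX_MOVRS) therefore
leaves AMX_TRANSPOSE only in "all"; the enum and function names are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the relevant i386_cpu_flags bits.  */
enum {
  CPU_APX_F         = 1u << 0,
  CPU_AMX_TRANSPOSE = 1u << 1,
  CPU_AMX_MOVRS     = 1u << 2
};

/* The invariant behind gas_assert (cpu_flags_equal (&cpu, &all)) with
   cpu = all & any: every flag in ALL must also be present in ANY.  */
static bool all_subset_of_any(unsigned all, unsigned any)
{
  return (all & any) == all;
}

/* Mirrors the condition under which the patch skips the assertion.  */
static bool skip_assert(unsigned all, unsigned any)
{
  return (all & CPU_AMX_TRANSPOSE) && (any & CPU_AMX_MOVRS);
}
```

Under these assumptions a plain APX_F(AMX_MOVRS) template satisfies the
invariant, whereas the combined template does not and is exactly the
case that skip_assert() singles out.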

---

Changes in v2:

  - Add APX_F extension for AMX-MOVRS.
  - Add invalid testcase for sibmem.
  - Templatize the table for T2RPNTLVW[Z0,Z1]RS[,T1].

---

gas/ChangeLog:

	* NEWS: Support Intel AMX-MOVRS.
	* config/tc-i386.c: Add amx_movrs.
	* doc/c-i386.texi: Document .amx_movrs.
	* testsuite/gas/i386/x86-64.exp: Run AMX-MOVRS tests.
	* testsuite/gas/i386/x86-64-amx-movrs-intel.d: New test.
	* testsuite/gas/i386/x86-64-amx-movrs-inval.l: Ditto.
	* testsuite/gas/i386/x86-64-amx-movrs-inval.s: Ditto.
	* testsuite/gas/i386/x86-64-amx-movrs.d: Ditto.
	* testsuite/gas/i386/x86-64-amx-movrs.s: Ditto.

opcodes/ChangeLog:

	* i386-dis-evex-len.h (EVEX_LEN_0F384A_X86_64_W_0): New.
	* i386-dis-evex-w.h (EVEX_W_0F384A_X86_64): Ditto.
	* i386-dis-evex-x86-64.h (X86_64_EVEX_0F384A): Ditto.
	* i386-dis-evex.h: New entry for AMX-MOVRS.
	* i386-dis.c:
	(MOD_VEX_MAP5_F8_X86_64): Ditto.
	(MOD_VEX_MAP5_F9_X86_64): Ditto.
	(PREFIX_VEX_0F384A_X86_64_M_0_L_0_W_0): Ditto.
	(PREFIX_VEX_MAP5_F8_X86_64_M_0_L_0_W_0): Ditto.
	(PREFIX_VEX_MAP5_F9_X86_64_M_0_L_0_W_0): Ditto.
	(X86_64_VEX_0F384A): Ditto.
	(X86_64_VEX_MAP5_F8): Ditto.
	(X86_64_VEX_MAP5_F9): Ditto.
	(X86_64_EVEX_0F384A): Ditto.
	(VEX_LEN_0F384A_X86_64_W_0): Ditto.
	(VEX_LEN_MAP5_F8_X86_64_M_0): Ditto.
	(VEX_LEN_MAP5_F9_X86_64_M_0): Ditto.
	(EVEX_LEN_0F384A_X86_64_W_0): Ditto.
	(VEX_W_0F384A_X86_64): Ditto.
	(VEX_W_MAP5_F8_X86_64): Ditto.
	(VEX_W_MAP5_F9_X86_64): Ditto.
	(EVEX_W_0F384A_X86_64): Ditto.
	(prefix_table): New entry for AMX-MOVRS.
	(x86_64_table): Ditto.
	(vex_len_table): Ditto.
	(vex_w_table): Ditto.
	(map5_f8_opcode): New.
	(map5_f9_opcode): Ditto.
	(get_valid_dis386): Handle VEX_MAP5 opcode for AMX-MOVRS.
	* i386-gen.c (isa_dependencies): Add AMX_MOVRS.
	(cpu_flags): Ditto.
	* i386-init.h: Regenerated.
	* i386-mnem.h: Ditto.
	* i386-opc.h (CpuAMX_MOVRS): New.
	(i386_cpu_flags): Add cpuamx_movrs.
	* i386-opc.tbl: Add AMX-MOVRS instructions.
	* i386-tbl.h: Regenerated.
---
 gas/config/tc-i386.c                          |   16 +-
 gas/doc/c-i386.texi                           |    3 +-
 .../gas/i386/x86-64-amx-movrs-intel.d         |   23 +
 .../gas/i386/x86-64-amx-movrs-inval.l         |   13 +
 .../gas/i386/x86-64-amx-movrs-inval.s         |   19 +
 gas/testsuite/gas/i386/x86-64-amx-movrs.d     |   21 +
 gas/testsuite/gas/i386/x86-64-amx-movrs.s     |   31 +
 .../gas/i386/x86-64-apx-evex-promoted-intel.d |   16 +
 .../gas/i386/x86-64-apx-evex-promoted-wig.d   |   16 +
 .../gas/i386/x86-64-apx-evex-promoted.d       |   16 +
 .../gas/i386/x86-64-apx-evex-promoted.s       |   16 +
 gas/testsuite/gas/i386/x86-64.exp             |    3 +
 opcodes/i386-dis-evex-len.h                   |    5 +
 opcodes/i386-dis-evex-w.h                     |    4 +
 opcodes/i386-dis-evex-x86-64.h                |    5 +
 opcodes/i386-dis-evex.h                       |    6 +-
 opcodes/i386-dis.c                            |  108 +-
 opcodes/i386-gen.c                            |    3 +
 opcodes/i386-init.h                           |  742 +--
 opcodes/i386-mnem.h                           | 4350 +++++++++--------
 opcodes/i386-opc.h                            |    3 +
 opcodes/i386-opc.tbl                          |   11 +-
 opcodes/i386-tbl.h                            |  279 +-
 23 files changed, 3058 insertions(+), 2651 deletions(-)
 create mode 100644 gas/testsuite/gas/i386/x86-64-amx-movrs-intel.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-amx-movrs-inval.l
 create mode 100644 gas/testsuite/gas/i386/x86-64-amx-movrs-inval.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-amx-movrs.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-amx-movrs.s
 create mode 100644 opcodes/i386-dis-evex-x86-64.h
  

Comments

Jan Beulich Jan. 9, 2025, 1:05 p.m. UTC | #1
On 09.01.2025 07:00, Haochen Jiang wrote:
> gas/ChangeLog:
> 
> 	* NEWS: Support Intel AMX-MOVRS.

This ...

> 	* config/tc-i386.c: Add amx_movrs.
> 	* doc/c-i386.texi: Document .amx_movrs.
> 	* testsuite/gas/i386/x86-64.exp: Run AMX-MOVRS tests.
> 	* testsuite/gas/i386/x86-64-amx-movrs-intel.d: New test.
> 	* testsuite/gas/i386/x86-64-amx-movrs-inval.l: Ditto.
> 	* testsuite/gas/i386/x86-64-amx-movrs-inval.s: Ditto.
> 	* testsuite/gas/i386/x86-64-amx-movrs.d: Ditto.
> 	* testsuite/gas/i386/x86-64-amx-movrs.s: Ditto.
> 
> opcodes/ChangeLog:
> 
> 	* i386-dis-evex-len.h (EVEX_LEN_0F384A_X86_64_W_0): New.
> 	* i386-dis-evex-w.h (EVEX_W_0F384A_X86_64): Ditto.
> 	* i386-dis-evex-x86-64.h (X86_64_EVEX_0F384A): Ditto.
> 	* i386-dis-evex.h: New entry for AMX-MOVRS.
> 	* i386-dis.c:
> 	(MOD_VEX_MAP5_F8_X86_64): Ditto.
> 	(MOD_VEX_MAP5_F9_X86_64): Ditto.
> 	(PREFIX_VEX_0F384A_X86_64_M_0_L_0_W_0): Ditto.
> 	(PREFIX_VEX_MAP5_F8_X86_64_M_0_L_0_W_0): Ditto.
> 	(PREFIX_VEX_MAP5_F9_X86_64_M_0_L_0_W_0): Ditto.
> 	(X86_64_VEX_0F384A): Ditto.
> 	(X86_64_VEX_MAP5_F8): Ditto.
> 	(X86_64_VEX_MAP5_F9): Ditto.
> 	(X86_64_EVEX_0F384A): Ditto.
> 	(VEX_LEN_0F384A_X86_64_W_0): Ditto.
> 	(VEX_LEN_MAP5_F8_X86_64_M_0): Ditto.
> 	(VEX_LEN_MAP5_F9_X86_64_M_0): Ditto.
> 	(EVEX_LEN_0F384A_X86_64_W_0): Ditto.
> 	(VEX_W_0F384A_X86_64): Ditto.
> 	(VEX_W_MAP5_F8_X86_64): Ditto.
> 	(VEX_W_MAP5_F9_X86_64): Ditto.
> 	(EVEX_W_0F384A_X86_64): Ditto.
> 	(prefix_table): New entry for AMX-MOVRS.
> 	(x86_64_table): Ditto.
> 	(vex_len_table): Ditto.
> 	(vex_w_table): Ditto.
> 	(map5_f8_opcode): New.
> 	(map5_f9_opcode): Ditto.
> 	(get_valid_dis386): Handle VEX_MAP5 opcode for AMX-MOVRS.
> 	* i386-gen.c (isa_dependencies): Add AMX_MOVRS.
> 	(cpu_flags): Ditto.
> 	* i386-init.h: Regenerated.
> 	* i386-mnem.h: Ditto.
> 	* i386-opc.h (CpuAMX_MOVRS): New.
> 	(i386_cpu_flags): Add cpuamx_movrs.
> 	* i386-opc.tbl: Add AMX-MOVRS instructions.
> 	* i386-tbl.h: Regenerated.
> ---
>  gas/config/tc-i386.c                          |   16 +-
>  gas/doc/c-i386.texi                           |    3 +-
>  .../gas/i386/x86-64-amx-movrs-intel.d         |   23 +
>  .../gas/i386/x86-64-amx-movrs-inval.l         |   13 +
>  .../gas/i386/x86-64-amx-movrs-inval.s         |   19 +
>  gas/testsuite/gas/i386/x86-64-amx-movrs.d     |   21 +
>  gas/testsuite/gas/i386/x86-64-amx-movrs.s     |   31 +
>  .../gas/i386/x86-64-apx-evex-promoted-intel.d |   16 +
>  .../gas/i386/x86-64-apx-evex-promoted-wig.d   |   16 +
>  .../gas/i386/x86-64-apx-evex-promoted.d       |   16 +
>  .../gas/i386/x86-64-apx-evex-promoted.s       |   16 +
>  gas/testsuite/gas/i386/x86-64.exp             |    3 +
>  opcodes/i386-dis-evex-len.h                   |    5 +
>  opcodes/i386-dis-evex-w.h                     |    4 +
>  opcodes/i386-dis-evex-x86-64.h                |    5 +
>  opcodes/i386-dis-evex.h                       |    6 +-
>  opcodes/i386-dis.c                            |  108 +-
>  opcodes/i386-gen.c                            |    3 +
>  opcodes/i386-init.h                           |  742 +--
>  opcodes/i386-mnem.h                           | 4350 +++++++++--------
>  opcodes/i386-opc.h                            |    3 +
>  opcodes/i386-opc.tbl                          |   11 +-
>  opcodes/i386-tbl.h                            |  279 +-
>  23 files changed, 3058 insertions(+), 2651 deletions(-)
>  create mode 100644 gas/testsuite/gas/i386/x86-64-amx-movrs-intel.d
>  create mode 100644 gas/testsuite/gas/i386/x86-64-amx-movrs-inval.l
>  create mode 100644 gas/testsuite/gas/i386/x86-64-amx-movrs-inval.s
>  create mode 100644 gas/testsuite/gas/i386/x86-64-amx-movrs.d
>  create mode 100644 gas/testsuite/gas/i386/x86-64-amx-movrs.s
>  create mode 100644 opcodes/i386-dis-evex-x86-64.h

... must have been lost, which is all the more odd since this - if I'm
not mistaken - not only adds full AMX-MOVRS support, but also completes
AMX-TRANSPOSE.

> --- a/gas/config/tc-i386.c
> +++ b/gas/config/tc-i386.c
> @@ -1185,6 +1185,7 @@ static const arch_entry cpu_arch[] =
>    SUBARCH (amx_transpose, AMX_TRANSPOSE, ANY_AMX_TRANSPOSE, false),
>    SUBARCH (amx_tf32, AMX_TF32, ANY_AMX_TF32, false),
>    SUBARCH (amx_fp8, AMX_FP8, ANY_AMX_FP8, false),
> +  SUBARCH (amx_movrs, AMX_MOVRS, ANY_AMX_MOVRS, false),
>    SUBARCH (amx_tile, AMX_TILE, ANY_AMX_TILE, false),
>    SUBARCH (movdiri, MOVDIRI, MOVDIRI, false),
>    SUBARCH (movdir64b, MOVDIR64B, MOVDIR64B, false),
> @@ -2249,14 +2250,22 @@ cpu_flags_match (const insn_template *t)
>  	      || any.bitfield.cpuavx512f || any.bitfield.cpuavx512bw
>  	      || any.bitfield.cpuavx512dq || any.bitfield.cpuamx_tile
>  	      || any.bitfield.cpucmpccxadd || any.bitfield.cpuuser_msr
> -	      || any.bitfield.cpumsr_imm || any.bitfield.cpuamx_transpose))
> +	      || any.bitfield.cpumsr_imm || any.bitfield.cpuamx_transpose
> +	      || any.bitfield.cpuamx_movrs))
>  	{
>  	  /* These checks (verifying that APX_F() was properly used in the
>  	     opcode table entry) make sure there's no need for an "else" to
>  	     the "if()" below.  */
>  	  gas_assert (!cpu_flags_all_zero (&all));
>  	  cpu = cpu_flags_and (all, any);
> -	  gas_assert (cpu_flags_equal (&cpu, &all));
> +	  /* For APX_F extension of multiple cpuid enabled insns, we could not
> +	     use APX_F(cpuid_A&couid_B) since the transformation could not be
> +	     done. Instead, we will use cpuid_A && APX_F(cpuid_B), then the
> +	     bitfield will not be match for cpu and all. Since there won't
> +	     be much cases, here we work around those CPUIDs and do not
> +	     need this assert check.  */
> +	  if (!(all.bitfield.cpuamx_transpose && any.bitfield.cpuamx_movrs))
> +	    gas_assert (cpu_flags_equal (&cpu, &all));

First: With the extensive comment, this workaround may be okay for now. One
nit though: s/much/many/.

However, slightly better would be if the assertion could remain in effect
even for these insns. Can we massage "all" (overwritten right afterwards
for the EVEX case) and/or "any" (cleared altogether afterwards) such that
the assertion would pass (and the resulting "cpu" still be correct)? Can
we perhaps even arrange for the adjustment to be done in i386-gen.c? (If
the answer is "no" for now, I can live with that. I'd then try to find
time later to clean this up.)

> @@ -4099,6 +4117,14 @@ static const struct dis386 prefix_table[][4] = {
>      { RM_TABLE (RM_VEX_0F3849_X86_64_L_0_W_0_M_1_P_3) },
>    },
>  
> +  /* PREFIX_VEX_0F384A_X86_64_W_0_L_0 */
> +  {
> +    { Bad_Opcode },
> +    { Bad_Opcode },
> +    { "tileloaddrst1",	{ TMM, MVexSIBMEM }, 0 },
> +    { "tileloaddrs",	{ TMM, MVexSIBMEM }, 0 },
> +  },

Seeing how this doesn't need to go through mod_table[], ...

> @@ -4283,6 +4309,20 @@ static const struct dis386 prefix_table[][4] = {
>      { "%XErorxS",		{ Gdq, Edq, Ib }, 0 },
>    },
>  
> +  /* PREFIX_VEX_MAP5_F8_X86_64_M_0_L_0_W_0 */
> +  {
> +    { "t2rpntlvwz0rs",	{ TMM, MVexSIBMEM }, 0 },
> +    { Bad_Opcode },
> +    { "t2rpntlvwz1rs",	{ TMM, MVexSIBMEM }, 0 },
> +  },
> +
> +  /* PREFIX_VEX_MAP5_F9_X86_64_M_0_L_0_W_0 */
> +  {
> +    { "t2rpntlvwz0rst1",	{ TMM, MVexSIBMEM }, 0 },
> +    { Bad_Opcode },
> +    { "t2rpntlvwz1rst1",	{ TMM, MVexSIBMEM }, 0 },
> +  },

... why do these need to? MVexSIBMEM expands to OP_M after all.

> --- a/opcodes/i386-opc.tbl
> +++ b/opcodes/i386-opc.tbl
> @@ -3216,9 +3216,6 @@ tilezero, 0xf249, AMX_TILE, Modrm|Vex128|Space0F38|VexW0|NoSuf, { RegTMM }
>  
>  t2rpntlvw<z><loc>, 0x<z:pfx>6e | <loc:opc>, APX_F(AMX_TRANSPOSE), Sibmem|Vex128|EVex128|Space0F38|VexW0|NoSuf|ImplicitGroup, { Unspecified|BaseIndex, RegTMM }
>  
> -<z>
> -<loc>
> -

Can these stay where they are and instead ...

> @@ -3238,6 +3235,14 @@ tdpbhf8ps, 0xf2fd, AMX_FP8, Modrm|Vex128|Map5|Src2VVVV|VexW0|NoSuf, { RegTMM, Re
>  tdphbf8ps, 0xf3fd, AMX_FP8, Modrm|Vex128|Map5|Src2VVVV|VexW0|NoSuf, { RegTMM, RegTMM, RegTMM }
>  tdphf8ps, 0x66fd, AMX_FP8, Modrm|Vex128|Map5|Src2VVVV|VexW0|NoSuf, { RegTMM, RegTMM, RegTMM }
>  
> +t2rpntlvw<z>rs<loc>, 0x<z:pfx>f8 | <loc:opc>, AMX_TRANSPOSE&APX_F(AMX_MOVRS), Sibmem|Vex128|EVex128|Map5|VexW0|NoSuf|ImplicitGroup, { Unspecified|BaseIndex, RegTMM }

... this one move up? It can, after all, be grouped with the TRANSPOSE ones as
well as with ...

> +<z>
> +<loc>
> +
> +tileloaddrs, 0xf24a, APX_F(AMX_MOVRS), Sibmem|Vex128|EVex128|Space0F38|VexW0|NoSuf, { Unspecified|BaseIndex, RegTMM }
> +tileloaddrst1, 0x664a, APX_F(AMX_MOVRS), Sibmem|Vex128|EVex128|Space0F38|VexW0|NoSuf, { Unspecified|BaseIndex, RegTMM }

... the MOVRS ones, yet it's more closely related to the earlier one (already
mnemonic-wise).

Jan
  
Jiang, Haochen Jan. 13, 2025, 6:19 a.m. UTC | #2
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Thursday, January 9, 2025 9:05 PM
> 
> > --- a/gas/config/tc-i386.c
> > +++ b/gas/config/tc-i386.c
> > @@ -1185,6 +1185,7 @@ static const arch_entry cpu_arch[] =
> >    SUBARCH (amx_transpose, AMX_TRANSPOSE, ANY_AMX_TRANSPOSE, false),
> >    SUBARCH (amx_tf32, AMX_TF32, ANY_AMX_TF32, false),
> >    SUBARCH (amx_fp8, AMX_FP8, ANY_AMX_FP8, false),
> > +  SUBARCH (amx_movrs, AMX_MOVRS, ANY_AMX_MOVRS, false),
> >    SUBARCH (amx_tile, AMX_TILE, ANY_AMX_TILE, false),
> >    SUBARCH (movdiri, MOVDIRI, MOVDIRI, false),
> >    SUBARCH (movdir64b, MOVDIR64B, MOVDIR64B, false),
> > @@ -2249,14 +2250,22 @@ cpu_flags_match (const insn_template *t)
> >  	      || any.bitfield.cpuavx512f || any.bitfield.cpuavx512bw
> >  	      || any.bitfield.cpuavx512dq || any.bitfield.cpuamx_tile
> >  	      || any.bitfield.cpucmpccxadd || any.bitfield.cpuuser_msr
> > -	      || any.bitfield.cpumsr_imm || any.bitfield.cpuamx_transpose))
> > +	      || any.bitfield.cpumsr_imm || any.bitfield.cpuamx_transpose
> > +	      || any.bitfield.cpuamx_movrs))
> >  	{
> >  	  /* These checks (verifying that APX_F() was properly used in the
> >  	     opcode table entry) make sure there's no need for an "else" to
> >  	     the "if()" below.  */
> >  	  gas_assert (!cpu_flags_all_zero (&all));
> >  	  cpu = cpu_flags_and (all, any);
> > -	  gas_assert (cpu_flags_equal (&cpu, &all));
> > +	  /* For APX_F extension of multiple cpuid enabled insns, we could not
> > +	     use APX_F(cpuid_A&couid_B) since the transformation could not be
> > +	     done. Instead, we will use cpuid_A && APX_F(cpuid_B), then the
> > +	     bitfield will not be match for cpu and all. Since there won't
> > +	     be much cases, here we work around those CPUIDs and do not
> > +	     need this assert check.  */
> > +	  if (!(all.bitfield.cpuamx_transpose && any.bitfield.cpuamx_movrs))
> > +	    gas_assert (cpu_flags_equal (&cpu, &all));
> 
> First: With the extensive comment, this workaround may be okay for now. One
> nit though: s/much/many/.
> 
> However, slightly better would be if the assertion could remain in effect
> even for these insns. Can we massage "all" (overwritten right afterwards
> for the EVEX case) and/or "any" (cleared altogether afterwards) such that
> the assertion would pass (and the resulting "cpu" still be correct)? Can
> we perhaps even arrange for the adjustment to be done in i386-gen.c? (If
> the answer is "no" for now, I can live with that. I'd then try to find
> time later to clean this up.)

From my point of view, this should ideally be addressed in i386-gen.c when
setting "any" (i.e., also set AMX-TRANSPOSE in "any"), since it really should
be set there. But doing so would likely need special handling similar to what
is done here, so setting it here should be OK. I will keep the assert as-is,
set any.bitfield.cpuamx_transpose = 1 under that if condition, and move it
before cpu is calculated.
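
A rough model of that adjustment, using plain bit masks rather than the
real i386_cpu_flags type, might look as follows. All names are
hypothetical, and it assumes the combined template arrives with
AMX_TRANSPOSE present in "all" but absent from "any".

```c
#include <assert.h>

/* Hypothetical stand-ins for the relevant i386_cpu_flags bits.  */
enum {
  FLAG_APX_F         = 1u << 0,
  FLAG_AMX_TRANSPOSE = 1u << 1,
  FLAG_AMX_MOVRS     = 1u << 2
};

/* Add AMX_TRANSPOSE to ANY for the one combined template, so that the
   unchanged check (all & any) == all holds afterwards.  Other inputs
   are returned untouched.  */
static unsigned massage_any(unsigned all, unsigned any)
{
  if ((all & FLAG_AMX_TRANSPOSE) && (any & FLAG_AMX_MOVRS))
    any |= FLAG_AMX_TRANSPOSE;
  return any;
}
```

With "any" adjusted this way before cpu is computed, the assertion no
longer needs a special-case guard around it.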

> 
> > @@ -4099,6 +4117,14 @@ static const struct dis386 prefix_table[][4] = {
> >      { RM_TABLE (RM_VEX_0F3849_X86_64_L_0_W_0_M_1_P_3) },
> >    },
> >
> > +  /* PREFIX_VEX_0F384A_X86_64_W_0_L_0 */
> > +  {
> > +    { Bad_Opcode },
> > +    { Bad_Opcode },
> > +    { "tileloaddrst1",	{ TMM, MVexSIBMEM }, 0 },
> > +    { "tileloaddrs",	{ TMM, MVexSIBMEM }, 0 },
> > +  },
> 
> Seeing how this doesn't need to go through mod_table[], ...
> 
> > @@ -4283,6 +4309,20 @@ static const struct dis386 prefix_table[][4] = {
> >      { "%XErorxS",		{ Gdq, Edq, Ib }, 0 },
> >    },
> >
> > +  /* PREFIX_VEX_MAP5_F8_X86_64_M_0_L_0_W_0 */
> > +  {
> > +    { "t2rpntlvwz0rs",	{ TMM, MVexSIBMEM }, 0 },
> > +    { Bad_Opcode },
> > +    { "t2rpntlvwz1rs",	{ TMM, MVexSIBMEM }, 0 },
> > +  },
> > +
> > +  /* PREFIX_VEX_MAP5_F9_X86_64_M_0_L_0_W_0 */
> > +  {
> > +    { "t2rpntlvwz0rst1",	{ TMM, MVexSIBMEM }, 0 },
> > +    { Bad_Opcode },
> > +    { "t2rpntlvwz1rst1",	{ TMM, MVexSIBMEM }, 0 },
> > +  },
> 
> ... why do these need to? MVexSIBMEM expands to OP_M after all.

Yes, and similar entries in AMX-TRANSPOSE have the same issue. I will
send another patch to fix that.

Thx,
Haochen
  

Patch

diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
index 953b60fe1fb..81597568443 100644
--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -1185,6 +1185,7 @@  static const arch_entry cpu_arch[] =
   SUBARCH (amx_transpose, AMX_TRANSPOSE, ANY_AMX_TRANSPOSE, false),
   SUBARCH (amx_tf32, AMX_TF32, ANY_AMX_TF32, false),
   SUBARCH (amx_fp8, AMX_FP8, ANY_AMX_FP8, false),
+  SUBARCH (amx_movrs, AMX_MOVRS, ANY_AMX_MOVRS, false),
   SUBARCH (amx_tile, AMX_TILE, ANY_AMX_TILE, false),
   SUBARCH (movdiri, MOVDIRI, MOVDIRI, false),
   SUBARCH (movdir64b, MOVDIR64B, MOVDIR64B, false),
@@ -2249,14 +2250,22 @@  cpu_flags_match (const insn_template *t)
 	      || any.bitfield.cpuavx512f || any.bitfield.cpuavx512bw
 	      || any.bitfield.cpuavx512dq || any.bitfield.cpuamx_tile
 	      || any.bitfield.cpucmpccxadd || any.bitfield.cpuuser_msr
-	      || any.bitfield.cpumsr_imm || any.bitfield.cpuamx_transpose))
+	      || any.bitfield.cpumsr_imm || any.bitfield.cpuamx_transpose
+	      || any.bitfield.cpuamx_movrs))
 	{
 	  /* These checks (verifying that APX_F() was properly used in the
 	     opcode table entry) make sure there's no need for an "else" to
 	     the "if()" below.  */
 	  gas_assert (!cpu_flags_all_zero (&all));
 	  cpu = cpu_flags_and (all, any);
-	  gas_assert (cpu_flags_equal (&cpu, &all));
+	  /* For APX_F extension of multiple cpuid enabled insns, we could not
+	     use APX_F(cpuid_A&couid_B) since the transformation could not be
+	     done. Instead, we will use cpuid_A && APX_F(cpuid_B), then the
+	     bitfield will not be match for cpu and all. Since there won't
+	     be much cases, here we work around those CPUIDs and do not
+	     need this assert check.  */
+	  if (!(all.bitfield.cpuamx_transpose && any.bitfield.cpuamx_movrs))
+	    gas_assert (cpu_flags_equal (&cpu, &all));
 
 	  if (need_evex_encoding (t))
 	    all = any;
@@ -4054,7 +4063,8 @@  install_template (const insn_template *t)
 	   || maybe_cpu (t, CpuAVX512F) || maybe_cpu (t, CpuAVX512DQ)
 	   || maybe_cpu (t, CpuAVX512BW) || maybe_cpu (t, CpuBMI)
 	   || maybe_cpu (t, CpuBMI2) || maybe_cpu (t, CpuUSER_MSR)
-	   || maybe_cpu (t, CpuMSR_IMM) || maybe_cpu (t, CpuAMX_TRANSPOSE))
+	   || maybe_cpu (t, CpuMSR_IMM) || maybe_cpu (t, CpuAMX_TRANSPOSE)
+	   || maybe_cpu (t, CpuAMX_MOVRS))
 	  && maybe_cpu (t, CpuAPX_F))
 	{
 	  if (need_evex_encoding (t))
diff --git a/gas/doc/c-i386.texi b/gas/doc/c-i386.texi
index 799a9b99af4..ddf0393ec95 100644
--- a/gas/doc/c-i386.texi
+++ b/gas/doc/c-i386.texi
@@ -231,6 +231,7 @@  accept various extension mnemonics.  For example,
 @code{amx_transpose},
 @code{amx_tf32},
 @code{amx_fp8}
+@code{amx_movrs},
 @code{amx_tile},
 @code{vmx},
 @code{vmfunc},
@@ -1704,7 +1705,7 @@  supported on the CPU specified.  The choices for @var{cpu_type} are:
 @item @samp{.movdiri} @tab @samp{.movdir64b} @tab @samp{.enqcmd} @tab @samp{.tsxldtrk}
 @item @samp{.amx_int8} @tab @samp{.amx_bf16} @tab @samp{.amx_fp16}
 @item @samp{.amx_complex} @tab @samp{.amx_transpose} @tab @samp{.amx_tf32}
-@item @samp{.amx_fp8} @tab @samp{.amx_tile}
+@item @samp{.amx_fp8} @tab @samp{.amx_movrs} @tab @samp{.amx_tile}
 @item @samp{.kl} @tab @samp{.widekl} @tab @samp{.uintr} @tab @samp{.hreset}
 @item @samp{.3dnow} @tab @samp{.3dnowa} @tab @samp{.sse4a} @tab @samp{.sse5}
 @item @samp{.syscall} @tab @samp{.rdtscp} @tab @samp{.svme}
diff --git a/gas/testsuite/gas/i386/x86-64-amx-movrs-intel.d b/gas/testsuite/gas/i386/x86-64-amx-movrs-intel.d
new file mode 100644
index 00000000000..f4cd0bd0911
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-amx-movrs-intel.d
@@ -0,0 +1,23 @@ 
+#objdump: -dw -Mintel
+#name: x86_64 AMX-MOVRS insns (Intel disassembly)
+#source: x86-64-amx-movrs.s
+
+.*: +file format .*
+
+Disassembly of section \.text:
+
+#...
+[a-f0-9]+ <_intel>:
+\s*[a-f0-9]+:\s*c4 a5 78 f8 b4 f5 00 00 00 10\s+t2rpntlvwz0rs tmm6,\[rbp\+r14\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 c5 78 f8 14 21\s+t2rpntlvwz0rs tmm2,\[r9\+riz\*1\]
+\s*[a-f0-9]+:\s*c4 a5 78 f9 b4 f5 00 00 00 10\s+t2rpntlvwz0rst1 tmm6,\[rbp\+r14\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 c5 78 f9 14 21\s+t2rpntlvwz0rst1 tmm2,\[r9\+riz\*1\]
+\s*[a-f0-9]+:\s*c4 a5 79 f8 b4 f5 00 00 00 10\s+t2rpntlvwz1rs tmm6,\[rbp\+r14\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 c5 79 f8 14 21\s+t2rpntlvwz1rs tmm2,\[r9\+riz\*1\]
+\s*[a-f0-9]+:\s*c4 a5 79 f9 b4 f5 00 00 00 10\s+t2rpntlvwz1rst1 tmm6,\[rbp\+r14\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 c5 79 f9 14 21\s+t2rpntlvwz1rst1 tmm2,\[r9\+riz\*1\]
+\s*[a-f0-9]+:\s*c4 a2 7b 4a b4 f5 00 00 00 10\s+tileloaddrs tmm6,\[rbp\+r14\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 c2 7b 4a 1c 21\s+tileloaddrs tmm3,\[r9\+riz\*1\]
+\s*[a-f0-9]+:\s*c4 a2 79 4a b4 f5 00 00 00 10\s+tileloaddrst1 tmm6,\[rbp\+r14\*8\+0x10000000\]
+\s*[a-f0-9]+:\s*c4 c2 79 4a 1c 21\s+tileloaddrst1 tmm3,\[r9\+riz\*1\]
+#pass
diff --git a/gas/testsuite/gas/i386/x86-64-amx-movrs-inval.l b/gas/testsuite/gas/i386/x86-64-amx-movrs-inval.l
new file mode 100644
index 00000000000..aa49c0d3533
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-amx-movrs-inval.l
@@ -0,0 +1,13 @@ 
+.* Assembler messages:
+.*:5: Error: `\(%rip\)' cannot be used here
+.*:6: Error: `\(%rip\)' cannot be used here
+.*:7: Error: `\(%rip\)' cannot be used here
+.*:8: Error: `\(%rip\)' cannot be used here
+.*:9: Warning: operand 2 `%tmm1' implicitly denotes `%tmm0' to `%tmm1' group in `t2rpntlvwz0rs'
+.*:10: Warning: operand 2 `%tmm3' implicitly denotes `%tmm2' to `%tmm3' group in `t2rpntlvwz0rst1'
+.*:11: Warning: operand 2 `%tmm5' implicitly denotes `%tmm4' to `%tmm5' group in `t2rpntlvwz1rs'
+.*:12: Warning: operand 2 `%tmm7' implicitly denotes `%tmm6' to `%tmm7' group in `t2rpntlvwz1rst1'
+.*:16: Error: `t2rpntlvwz0rs' is not supported on `x86_64.noamx_transpose'
+.*:17: Error: `t2rpntlvwz0rst1' is not supported on `x86_64.noamx_transpose'
+.*:18: Error: `t2rpntlvwz1rs' is not supported on `x86_64.noamx_transpose'
+.*:19: Error: `t2rpntlvwz1rst1' is not supported on `x86_64.noamx_transpose'
diff --git a/gas/testsuite/gas/i386/x86-64-amx-movrs-inval.s b/gas/testsuite/gas/i386/x86-64-amx-movrs-inval.s
new file mode 100644
index 00000000000..98b54f38ece
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-amx-movrs-inval.s
@@ -0,0 +1,19 @@ 
+# Check Invalid 64bit AMX-MOVRS instructions
+
+	.text
+_start:
+	t2rpntlvwz0rs	(%rip), %tmm2
+	t2rpntlvwz0rst1	(%rip), %tmm2
+	t2rpntlvwz1rs	(%rip), %tmm2
+	t2rpntlvwz1rst1	(%rip), %tmm2
+	t2rpntlvwz0rs	(%r9), %tmm1
+	t2rpntlvwz0rst1	(%r9), %tmm3
+	t2rpntlvwz1rs	(%r9), %tmm5
+	t2rpntlvwz1rst1	(%r9), %tmm7
+
+	.arch .noamx_transpose
+_transpose:
+	t2rpntlvwz0rs	(%r9), %tmm2
+	t2rpntlvwz0rst1	(%r9), %tmm2
+	t2rpntlvwz1rs	(%r9), %tmm2
+	t2rpntlvwz1rst1	(%r9), %tmm2
diff --git a/gas/testsuite/gas/i386/x86-64-amx-movrs.d b/gas/testsuite/gas/i386/x86-64-amx-movrs.d
new file mode 100644
index 00000000000..b0bc77e8f15
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-amx-movrs.d
@@ -0,0 +1,21 @@ 
+#objdump: -dw
+#name: x86_64 AMX-MOVRS insns
+
+.*: +file format .*
+
+Disassembly of section \.text:
+
+0+ <_start>:
+\s*[a-f0-9]+:\s*c4 a5 78 f8 b4 f5 00 00 00 10\s+t2rpntlvwz0rs 0x10000000\(%rbp,%r14,8\),%tmm6
+\s*[a-f0-9]+:\s*c4 c5 78 f8 14 21\s+t2rpntlvwz0rs \(%r9,%riz,1\),%tmm2
+\s*[a-f0-9]+:\s*c4 a5 78 f9 b4 f5 00 00 00 10\s+t2rpntlvwz0rst1 0x10000000\(%rbp,%r14,8\),%tmm6
+\s*[a-f0-9]+:\s*c4 c5 78 f9 14 21\s+t2rpntlvwz0rst1 \(%r9,%riz,1\),%tmm2
+\s*[a-f0-9]+:\s*c4 a5 79 f8 b4 f5 00 00 00 10\s+t2rpntlvwz1rs 0x10000000\(%rbp,%r14,8\),%tmm6
+\s*[a-f0-9]+:\s*c4 c5 79 f8 14 21\s+t2rpntlvwz1rs \(%r9,%riz,1\),%tmm2
+\s*[a-f0-9]+:\s*c4 a5 79 f9 b4 f5 00 00 00 10\s+t2rpntlvwz1rst1 0x10000000\(%rbp,%r14,8\),%tmm6
+\s*[a-f0-9]+:\s*c4 c5 79 f9 14 21\s+t2rpntlvwz1rst1 \(%r9,%riz,1\),%tmm2
+\s*[a-f0-9]+:\s*c4 a2 7b 4a b4 f5 00 00 00 10\s+tileloaddrs 0x10000000\(%rbp,%r14,8\),%tmm6
+\s*[a-f0-9]+:\s*c4 c2 7b 4a 1c 21\s+tileloaddrs \(%r9,%riz,1\),%tmm3
+\s*[a-f0-9]+:\s*c4 a2 79 4a b4 f5 00 00 00 10\s+tileloaddrst1 0x10000000\(%rbp,%r14,8\),%tmm6
+\s*[a-f0-9]+:\s*c4 c2 79 4a 1c 21\s+tileloaddrst1 \(%r9,%riz,1\),%tmm3
+#pass
diff --git a/gas/testsuite/gas/i386/x86-64-amx-movrs.s b/gas/testsuite/gas/i386/x86-64-amx-movrs.s
new file mode 100644
index 00000000000..07b6aba2a57
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-amx-movrs.s
@@ -0,0 +1,31 @@ 
+# Check 64bit AMX-MOVRS instructions
+
+	.text
+_start:
+	t2rpntlvwz0rs	0x10000000(%rbp, %r14, 8), %tmm6
+	t2rpntlvwz0rs	(%r9), %tmm2
+	t2rpntlvwz0rst1	0x10000000(%rbp, %r14, 8), %tmm6
+	t2rpntlvwz0rst1	(%r9), %tmm2
+	t2rpntlvwz1rs	0x10000000(%rbp, %r14, 8), %tmm6
+	t2rpntlvwz1rs	(%r9), %tmm2
+	t2rpntlvwz1rst1	0x10000000(%rbp, %r14, 8), %tmm6
+	t2rpntlvwz1rst1	(%r9), %tmm2
+	tileloaddrs	0x10000000(%rbp, %r14, 8), %tmm6
+	tileloaddrs	(%r9), %tmm3
+	tileloaddrst1	0x10000000(%rbp, %r14, 8), %tmm6
+	tileloaddrst1	(%r9), %tmm3
+
+_intel:
+	.intel_syntax noprefix
+	t2rpntlvwz0rs	tmm6, [rbp+r14*8+0x10000000]
+	t2rpntlvwz0rs	tmm2, [r9]
+	t2rpntlvwz0rst1	tmm6, [rbp+r14*8+0x10000000]
+	t2rpntlvwz0rst1	tmm2, [r9]
+	t2rpntlvwz1rs	tmm6, [rbp+r14*8+0x10000000]
+	t2rpntlvwz1rs	tmm2, [r9]
+	t2rpntlvwz1rst1	tmm6, [rbp+r14*8+0x10000000]
+	t2rpntlvwz1rst1	tmm2, [r9]
+	tileloaddrs	tmm6, [rbp+r14*8+0x10000000]
+	tileloaddrs	tmm3, [r9]
+	tileloaddrst1	tmm6, [rbp+r14*8+0x10000000]
+	tileloaddrst1	tmm3, [r9]
diff --git a/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-intel.d b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-intel.d
index d45167e318c..1cf4f7ee2eb 100644
--- a/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-intel.d
+++ b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-intel.d
@@ -136,10 +136,18 @@  Disassembly of section \.text:
 [	 ]*[a-f0-9]+:[	 ]*62 5a 87 00 f7 bc 87 23 01 00 00[	 ]+shrx[	 ]+r15,QWORD PTR \[r31\+rax\*4\+0x123\],r31
 [	 ]*[a-f0-9]+:[	 ]*62 da 7d 08 49 84 87 23 01 00 00[	 ]+sttilecfg[	 ]+\[r31\+rax\*4\+0x123\]
 [	 ]*[a-f0-9]+:[	 ]*62 da 7c 08 6e b4 c7 23 01 00 00[	 ]+t2rpntlvwz0 tmm6,\[r31\+rax\*8\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 dd 7c 08 f8 b4 c7 23 01 00 00[	 ]+t2rpntlvwz0rs tmm6,\[r31\+rax\*8\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 dd 7c 08 f9 b4 c7 23 01 00 00[	 ]+t2rpntlvwz0rst1 tmm6,\[r31\+rax\*8\+0x123\]
 [	 ]*[a-f0-9]+:[	 ]*62 da 7c 08 6f b4 c7 23 01 00 00[	 ]+t2rpntlvwz0t1 tmm6,\[r31\+rax\*8\+0x123\]
 [	 ]*[a-f0-9]+:[	 ]*62 da 7d 08 6e b4 c7 23 01 00 00[	 ]+t2rpntlvwz1 tmm6,\[r31\+rax\*8\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 dd 7d 08 f8 b4 c7 23 01 00 00[	 ]+t2rpntlvwz1rs tmm6,\[r31\+rax\*8\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 dd 7d 08 f9 b4 c7 23 01 00 00[	 ]+t2rpntlvwz1rst1 tmm6,\[r31\+rax\*8\+0x123\]
 [	 ]*[a-f0-9]+:[	 ]*62 da 7d 08 6f b4 c7 23 01 00 00[	 ]+t2rpntlvwz1t1 tmm6,\[r31\+rax\*8\+0x123\]
 [	 ]*[a-f0-9]+:[	 ]*62 da 7f 08 4b b4 87 23 01 00 00[	 ]+tileloadd tmm6,\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 b2 7b 08 4a b4 fd 00 00 00 10[	 ]+tileloaddrs[	 ]+tmm6,\[rbp\+r31\*8\+0x10000000\]
+[	 ]*[a-f0-9]+:[	 ]*62 fa 7f 08 4a 1c 20[	 ]+tileloaddrs[	 ]+tmm3,\[r16\+riz\*1\]
+[	 ]*[a-f0-9]+:[	 ]*62 9a 7d 08 4a b4 f7 00 00 00 10[	 ]+tileloaddrst1[	 ]+tmm6,\[r31\+r14\*8\+0x10000000\]
+[	 ]*[a-f0-9]+:[	 ]*62 fa 7d 08 4a 1c 20[	 ]+tileloaddrst1[	 ]+tmm3,\[r16\+riz\*1\]
 [	 ]*[a-f0-9]+:[	 ]*62 da 7d 08 4b b4 87 23 01 00 00[	 ]+tileloaddt1 tmm6,\[r31\+rax\*4\+0x123\]
 [	 ]*[a-f0-9]+:[	 ]*62 da 7e 08 4b b4 87 23 01 00 00[	 ]+tilestored[	 ]+\[r31\+rax\*4\+0x123\],tmm6
 [	 ]*[a-f0-9]+:[	 ]*62 4c 7c 08 66 8c 87 23 01 00 00[	 ]+wrssd[	 ]+\[r31\+rax\*4\+0x123\],r25d
@@ -271,10 +279,18 @@  Disassembly of section \.text:
 [	 ]*[a-f0-9]+:[	 ]*62 5a 87 00 f7 bc 87 23 01 00 00[	 ]+shrx[	 ]+r15,QWORD PTR \[r31\+rax\*4\+0x123\],r31
 [	 ]*[a-f0-9]+:[	 ]*62 da 7d 08 49 84 87 23 01 00 00[	 ]+sttilecfg[	 ]+\[r31\+rax\*4\+0x123\]
 [	 ]*[a-f0-9]+:[	 ]*62 da 7c 08 6e b4 c7 23 01 00 00[	 ]+t2rpntlvwz0 tmm6,\[r31\+rax\*8\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 dd 7c 08 f8 b4 c7 23 01 00 00[	 ]+t2rpntlvwz0rs tmm6,\[r31\+rax\*8\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 dd 7c 08 f9 b4 c7 23 01 00 00[	 ]+t2rpntlvwz0rst1 tmm6,\[r31\+rax\*8\+0x123\]
 [	 ]*[a-f0-9]+:[	 ]*62 da 7c 08 6f b4 c7 23 01 00 00[	 ]+t2rpntlvwz0t1 tmm6,\[r31\+rax\*8\+0x123\]
 [	 ]*[a-f0-9]+:[	 ]*62 da 7d 08 6e b4 c7 23 01 00 00[	 ]+t2rpntlvwz1 tmm6,\[r31\+rax\*8\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 dd 7d 08 f8 b4 c7 23 01 00 00[	 ]+t2rpntlvwz1rs tmm6,\[r31\+rax\*8\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 dd 7d 08 f9 b4 c7 23 01 00 00[	 ]+t2rpntlvwz1rst1 tmm6,\[r31\+rax\*8\+0x123\]
 [	 ]*[a-f0-9]+:[	 ]*62 da 7d 08 6f b4 c7 23 01 00 00[	 ]+t2rpntlvwz1t1 tmm6,\[r31\+rax\*8\+0x123\]
 [	 ]*[a-f0-9]+:[	 ]*62 da 7f 08 4b b4 87 23 01 00 00[	 ]+tileloadd tmm6,\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 b2 7b 08 4a b4 fd 00 00 00 10[	 ]+tileloaddrs[	 ]+tmm6,\[rbp\+r31\*8\+0x10000000\]
+[	 ]*[a-f0-9]+:[	 ]*62 fa 7f 08 4a 1c 20[	 ]+tileloaddrs[	 ]+tmm3,\[r16\+riz\*1\]
+[	 ]*[a-f0-9]+:[	 ]*62 9a 7d 08 4a b4 f7 00 00 00 10[	 ]+tileloaddrst1[	 ]+tmm6,\[r31\+r14\*8\+0x10000000\]
+[	 ]*[a-f0-9]+:[	 ]*62 fa 7d 08 4a 1c 20[	 ]+tileloaddrst1[	 ]+tmm3,\[r16\+riz\*1\]
 [	 ]*[a-f0-9]+:[	 ]*62 da 7d 08 4b b4 87 23 01 00 00[	 ]+tileloaddt1 tmm6,\[r31\+rax\*4\+0x123\]
 [	 ]*[a-f0-9]+:[	 ]*62 da 7e 08 4b b4 87 23 01 00 00[	 ]+tilestored[	 ]+\[r31\+rax\*4\+0x123\],tmm6
 [	 ]*[a-f0-9]+:[	 ]*62 4c 7c 08 66 8c 87 23 01 00 00[	 ]+wrssd[	 ]+\[r31\+rax\*4\+0x123\],r25d
diff --git a/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-wig.d b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-wig.d
index fa6d66b5dda..cebbe9d77f0 100644
--- a/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-wig.d
+++ b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-wig.d
@@ -136,10 +136,18 @@  Disassembly of section \.text:
 [	 ]*[a-f0-9]+:[	 ]*62 5a 87 00 f7 bc 87 23 01 00 00[	 ]+shrx[	 ]+%r31,0x123\(%r31,%rax,4\),%r15
 [	 ]*[a-f0-9]+:[	 ]*62 da 7d 08 49 84 87 23 01 00 00[	 ]+sttilecfg[	 ]+0x123\(%r31,%rax,4\)
 [	 ]*[a-f0-9]+:[	 ]*62 da 7c 08 6e b4 c7 23 01 00 00[	 ]+t2rpntlvwz0[	 ]+0x123\(%r31,%rax,8\),%tmm6
+[	 ]*[a-f0-9]+:[	 ]*62 dd 7c 08 f8 b4 c7 23 01 00 00[	 ]+t2rpntlvwz0rs[	 ]+0x123\(%r31,%rax,8\),%tmm6
+[	 ]*[a-f0-9]+:[	 ]*62 dd 7c 08 f9 b4 c7 23 01 00 00[	 ]+t2rpntlvwz0rst1[	 ]+0x123\(%r31,%rax,8\),%tmm6
 [	 ]*[a-f0-9]+:[	 ]*62 da 7c 08 6f b4 c7 23 01 00 00[	 ]+t2rpntlvwz0t1[	 ]+0x123\(%r31,%rax,8\),%tmm6
 [	 ]*[a-f0-9]+:[	 ]*62 da 7d 08 6e b4 c7 23 01 00 00[	 ]+t2rpntlvwz1[	 ]+0x123\(%r31,%rax,8\),%tmm6
+[	 ]*[a-f0-9]+:[	 ]*62 dd 7d 08 f8 b4 c7 23 01 00 00[	 ]+t2rpntlvwz1rs[	 ]+0x123\(%r31,%rax,8\),%tmm6
+[	 ]*[a-f0-9]+:[	 ]*62 dd 7d 08 f9 b4 c7 23 01 00 00[	 ]+t2rpntlvwz1rst1[	 ]+0x123\(%r31,%rax,8\),%tmm6
 [	 ]*[a-f0-9]+:[	 ]*62 da 7d 08 6f b4 c7 23 01 00 00[	 ]+t2rpntlvwz1t1[	 ]+0x123\(%r31,%rax,8\),%tmm6
 [	 ]*[a-f0-9]+:[	 ]*62 da 7f 08 4b b4 87 23 01 00 00[	 ]+tileloadd[	 ]+0x123\(%r31,%rax,4\),%tmm6
+[	 ]*[a-f0-9]+:[	 ]*62 b2 7b 08 4a b4 fd 00 00 00 10[	 ]+tileloaddrs[	 ]+0x10000000\(%rbp,%r31,8\),%tmm6
+[	 ]*[a-f0-9]+:[	 ]*62 fa 7f 08 4a 1c 20[	 ]+tileloaddrs[	 ]+\(%r16,%riz,1\),%tmm3
+[	 ]*[a-f0-9]+:[	 ]*62 9a 7d 08 4a b4 f7 00 00 00 10[	 ]+tileloaddrst1[	 ]+0x10000000\(%r31,%r14,8\),%tmm6
+[	 ]*[a-f0-9]+:[	 ]*62 fa 7d 08 4a 1c 20[	 ]+tileloaddrst1[	 ]+\(%r16,%riz,1\),%tmm3
 [	 ]*[a-f0-9]+:[	 ]*62 da 7d 08 4b b4 87 23 01 00 00[	 ]+tileloaddt1[	 ]+0x123\(%r31,%rax,4\),%tmm6
 [	 ]*[a-f0-9]+:[	 ]*62 da 7e 08 4b b4 87 23 01 00 00[	 ]+tilestored[	 ]+%tmm6,0x123\(%r31,%rax,4\)
 [	 ]*[a-f0-9]+:[	 ]*62 4c 7c 08 66 8c 87 23 01 00 00[	 ]+wrssd[	 ]+%r25d,0x123\(%r31,%rax,4\)
@@ -271,10 +279,18 @@  Disassembly of section \.text:
 [	 ]*[a-f0-9]+:[	 ]*62 5a 87 00 f7 bc 87 23 01 00 00[	 ]+shrx[	 ]+%r31,0x123\(%r31,%rax,4\),%r15
 [	 ]*[a-f0-9]+:[	 ]*62 da 7d 08 49 84 87 23 01 00 00[	 ]+sttilecfg[	 ]+0x123\(%r31,%rax,4\)
 [	 ]*[a-f0-9]+:[	 ]*62 da 7c 08 6e b4 c7 23 01 00 00[	 ]+t2rpntlvwz0[	 ]+0x123\(%r31,%rax,8\),%tmm6
+[	 ]*[a-f0-9]+:[	 ]*62 dd 7c 08 f8 b4 c7 23 01 00 00[	 ]+t2rpntlvwz0rs[	 ]+0x123\(%r31,%rax,8\),%tmm6
+[	 ]*[a-f0-9]+:[	 ]*62 dd 7c 08 f9 b4 c7 23 01 00 00[	 ]+t2rpntlvwz0rst1[	 ]+0x123\(%r31,%rax,8\),%tmm6
 [	 ]*[a-f0-9]+:[	 ]*62 da 7c 08 6f b4 c7 23 01 00 00[	 ]+t2rpntlvwz0t1[	 ]+0x123\(%r31,%rax,8\),%tmm6
 [	 ]*[a-f0-9]+:[	 ]*62 da 7d 08 6e b4 c7 23 01 00 00[	 ]+t2rpntlvwz1[	 ]+0x123\(%r31,%rax,8\),%tmm6
+[	 ]*[a-f0-9]+:[	 ]*62 dd 7d 08 f8 b4 c7 23 01 00 00[	 ]+t2rpntlvwz1rs[	 ]+0x123\(%r31,%rax,8\),%tmm6
+[	 ]*[a-f0-9]+:[	 ]*62 dd 7d 08 f9 b4 c7 23 01 00 00[	 ]+t2rpntlvwz1rst1[	 ]+0x123\(%r31,%rax,8\),%tmm6
 [	 ]*[a-f0-9]+:[	 ]*62 da 7d 08 6f b4 c7 23 01 00 00[	 ]+t2rpntlvwz1t1[	 ]+0x123\(%r31,%rax,8\),%tmm6
 [	 ]*[a-f0-9]+:[	 ]*62 da 7f 08 4b b4 87 23 01 00 00[	 ]+tileloadd[	 ]+0x123\(%r31,%rax,4\),%tmm6
+[	 ]*[a-f0-9]+:[	 ]*62 b2 7b 08 4a b4 fd 00 00 00 10[	 ]+tileloaddrs[	 ]+0x10000000\(%rbp,%r31,8\),%tmm6
+[	 ]*[a-f0-9]+:[	 ]*62 fa 7f 08 4a 1c 20[	 ]+tileloaddrs[	 ]+\(%r16,%riz,1\),%tmm3
+[	 ]*[a-f0-9]+:[	 ]*62 9a 7d 08 4a b4 f7 00 00 00 10[	 ]+tileloaddrst1[	 ]+0x10000000\(%r31,%r14,8\),%tmm6
+[	 ]*[a-f0-9]+:[	 ]*62 fa 7d 08 4a 1c 20[	 ]+tileloaddrst1[	 ]+\(%r16,%riz,1\),%tmm3
 [	 ]*[a-f0-9]+:[	 ]*62 da 7d 08 4b b4 87 23 01 00 00[	 ]+tileloaddt1[	 ]+0x123\(%r31,%rax,4\),%tmm6
 [	 ]*[a-f0-9]+:[	 ]*62 da 7e 08 4b b4 87 23 01 00 00[	 ]+tilestored[	 ]+%tmm6,0x123\(%r31,%rax,4\)
 [	 ]*[a-f0-9]+:[	 ]*62 4c 7c 08 66 8c 87 23 01 00 00[	 ]+wrssd[	 ]+%r25d,0x123\(%r31,%rax,4\)
diff --git a/gas/testsuite/gas/i386/x86-64-apx-evex-promoted.d b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted.d
index ba684c1b0c6..ac16bbad39e 100644
--- a/gas/testsuite/gas/i386/x86-64-apx-evex-promoted.d
+++ b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted.d
@@ -136,10 +136,18 @@  Disassembly of section \.text:
 [	 ]*[a-f0-9]+:[	 ]*62 5a 87 00 f7 bc 87 23 01 00 00[	 ]+shrx[	 ]+%r31,0x123\(%r31,%rax,4\),%r15
 [	 ]*[a-f0-9]+:[	 ]*62 da 7d 08 49 84 87 23 01 00 00[	 ]+sttilecfg[	 ]+0x123\(%r31,%rax,4\)
 [	 ]*[a-f0-9]+:[	 ]*62 da 7c 08 6e b4 c7 23 01 00 00[	 ]+t2rpntlvwz0[	 ]+0x123\(%r31,%rax,8\),%tmm6
+[	 ]*[a-f0-9]+:[	 ]*62 dd 7c 08 f8 b4 c7 23 01 00 00[	 ]+t2rpntlvwz0rs[	 ]+0x123\(%r31,%rax,8\),%tmm6
+[	 ]*[a-f0-9]+:[	 ]*62 dd 7c 08 f9 b4 c7 23 01 00 00[	 ]+t2rpntlvwz0rst1[	 ]+0x123\(%r31,%rax,8\),%tmm6
 [	 ]*[a-f0-9]+:[	 ]*62 da 7c 08 6f b4 c7 23 01 00 00[	 ]+t2rpntlvwz0t1[	 ]+0x123\(%r31,%rax,8\),%tmm6
 [	 ]*[a-f0-9]+:[	 ]*62 da 7d 08 6e b4 c7 23 01 00 00[	 ]+t2rpntlvwz1[	 ]+0x123\(%r31,%rax,8\),%tmm6
+[	 ]*[a-f0-9]+:[	 ]*62 dd 7d 08 f8 b4 c7 23 01 00 00[	 ]+t2rpntlvwz1rs[	 ]+0x123\(%r31,%rax,8\),%tmm6
+[	 ]*[a-f0-9]+:[	 ]*62 dd 7d 08 f9 b4 c7 23 01 00 00[	 ]+t2rpntlvwz1rst1[	 ]+0x123\(%r31,%rax,8\),%tmm6
 [	 ]*[a-f0-9]+:[	 ]*62 da 7d 08 6f b4 c7 23 01 00 00[	 ]+t2rpntlvwz1t1[	 ]+0x123\(%r31,%rax,8\),%tmm6
 [	 ]*[a-f0-9]+:[	 ]*62 da 7f 08 4b b4 87 23 01 00 00[	 ]+tileloadd[	 ]+0x123\(%r31,%rax,4\),%tmm6
+[	 ]*[a-f0-9]+:[	 ]*62 b2 7b 08 4a b4 fd 00 00 00 10[	 ]+tileloaddrs[	 ]+0x10000000\(%rbp,%r31,8\),%tmm6
+[	 ]*[a-f0-9]+:[	 ]*62 fa 7f 08 4a 1c 20[	 ]+tileloaddrs[	 ]+\(%r16,%riz,1\),%tmm3
+[	 ]*[a-f0-9]+:[	 ]*62 9a 7d 08 4a b4 f7 00 00 00 10[	 ]+tileloaddrst1[	 ]+0x10000000\(%r31,%r14,8\),%tmm6
+[	 ]*[a-f0-9]+:[	 ]*62 fa 7d 08 4a 1c 20[	 ]+tileloaddrst1[	 ]+\(%r16,%riz,1\),%tmm3
 [	 ]*[a-f0-9]+:[	 ]*62 da 7d 08 4b b4 87 23 01 00 00[	 ]+tileloaddt1[	 ]+0x123\(%r31,%rax,4\),%tmm6
 [	 ]*[a-f0-9]+:[	 ]*62 da 7e 08 4b b4 87 23 01 00 00[	 ]+tilestored[	 ]+%tmm6,0x123\(%r31,%rax,4\)
 [	 ]*[a-f0-9]+:[	 ]*62 4c 7c 08 66 8c 87 23 01 00 00[	 ]+wrssd[	 ]+%r25d,0x123\(%r31,%rax,4\)
@@ -271,10 +279,18 @@  Disassembly of section \.text:
 [	 ]*[a-f0-9]+:[	 ]*62 5a 87 00 f7 bc 87 23 01 00 00[	 ]+shrx[	 ]+%r31,0x123\(%r31,%rax,4\),%r15
 [	 ]*[a-f0-9]+:[	 ]*62 da 7d 08 49 84 87 23 01 00 00[	 ]+sttilecfg[	 ]+0x123\(%r31,%rax,4\)
 [	 ]*[a-f0-9]+:[	 ]*62 da 7c 08 6e b4 c7 23 01 00 00[	 ]+t2rpntlvwz0[	 ]+0x123\(%r31,%rax,8\),%tmm6
+[	 ]*[a-f0-9]+:[	 ]*62 dd 7c 08 f8 b4 c7 23 01 00 00[	 ]+t2rpntlvwz0rs[	 ]+0x123\(%r31,%rax,8\),%tmm6
+[	 ]*[a-f0-9]+:[	 ]*62 dd 7c 08 f9 b4 c7 23 01 00 00[	 ]+t2rpntlvwz0rst1[	 ]+0x123\(%r31,%rax,8\),%tmm6
 [	 ]*[a-f0-9]+:[	 ]*62 da 7c 08 6f b4 c7 23 01 00 00[	 ]+t2rpntlvwz0t1[	 ]+0x123\(%r31,%rax,8\),%tmm6
 [	 ]*[a-f0-9]+:[	 ]*62 da 7d 08 6e b4 c7 23 01 00 00[	 ]+t2rpntlvwz1[	 ]+0x123\(%r31,%rax,8\),%tmm6
+[	 ]*[a-f0-9]+:[	 ]*62 dd 7d 08 f8 b4 c7 23 01 00 00[	 ]+t2rpntlvwz1rs[	 ]+0x123\(%r31,%rax,8\),%tmm6
+[	 ]*[a-f0-9]+:[	 ]*62 dd 7d 08 f9 b4 c7 23 01 00 00[	 ]+t2rpntlvwz1rst1[	 ]+0x123\(%r31,%rax,8\),%tmm6
 [	 ]*[a-f0-9]+:[	 ]*62 da 7d 08 6f b4 c7 23 01 00 00[	 ]+t2rpntlvwz1t1[	 ]+0x123\(%r31,%rax,8\),%tmm6
 [	 ]*[a-f0-9]+:[	 ]*62 da 7f 08 4b b4 87 23 01 00 00[	 ]+tileloadd[	 ]+0x123\(%r31,%rax,4\),%tmm6
+[	 ]*[a-f0-9]+:[	 ]*62 b2 7b 08 4a b4 fd 00 00 00 10[	 ]+tileloaddrs[	 ]+0x10000000\(%rbp,%r31,8\),%tmm6
+[	 ]*[a-f0-9]+:[	 ]*62 fa 7f 08 4a 1c 20[	 ]+tileloaddrs[	 ]+\(%r16,%riz,1\),%tmm3
+[	 ]*[a-f0-9]+:[	 ]*62 9a 7d 08 4a b4 f7 00 00 00 10[	 ]+tileloaddrst1[	 ]+0x10000000\(%r31,%r14,8\),%tmm6
+[	 ]*[a-f0-9]+:[	 ]*62 fa 7d 08 4a 1c 20[	 ]+tileloaddrst1[	 ]+\(%r16,%riz,1\),%tmm3
 [	 ]*[a-f0-9]+:[	 ]*62 da 7d 08 4b b4 87 23 01 00 00[	 ]+tileloaddt1[	 ]+0x123\(%r31,%rax,4\),%tmm6
 [	 ]*[a-f0-9]+:[	 ]*62 da 7e 08 4b b4 87 23 01 00 00[	 ]+tilestored[	 ]+%tmm6,0x123\(%r31,%rax,4\)
 [	 ]*[a-f0-9]+:[	 ]*62 4c 7c 08 66 8c 87 23 01 00 00[	 ]+wrssd[	 ]+%r25d,0x123\(%r31,%rax,4\)
diff --git a/gas/testsuite/gas/i386/x86-64-apx-evex-promoted.s b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted.s
index 3fc9832a84b..a16490fa96b 100644
--- a/gas/testsuite/gas/i386/x86-64-apx-evex-promoted.s
+++ b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted.s
@@ -130,10 +130,18 @@  _start:
 	shrx	%r31,0x123(%r31,%rax,4),%r15
 	sttilecfg	0x123(%r31,%rax,4)
 	t2rpntlvwz0	0x123(%r31,%rax,8),%tmm6
+	t2rpntlvwz0rs	0x123(%r31,%rax,8),%tmm6
+	t2rpntlvwz0rst1	0x123(%r31,%rax,8),%tmm6
 	t2rpntlvwz0t1	0x123(%r31,%rax,8),%tmm6
 	t2rpntlvwz1	0x123(%r31,%rax,8),%tmm6
+	t2rpntlvwz1rs	0x123(%r31,%rax,8),%tmm6
+	t2rpntlvwz1rst1	0x123(%r31,%rax,8),%tmm6
 	t2rpntlvwz1t1	0x123(%r31,%rax,8),%tmm6
 	tileloadd	0x123(%r31,%rax,4),%tmm6
+	tileloaddrs	0x10000000(%rbp,%r31,8),%tmm6
+	tileloaddrs	(%r16),%tmm3
+	tileloaddrst1	0x10000000(%r31,%r14,8),%tmm6
+	tileloaddrst1	(%r16),%tmm3
 	tileloaddt1	0x123(%r31,%rax,4),%tmm6
 	tilestored	%tmm6,0x123(%r31,%rax,4)
 	wrssd	%r25d,0x123(%r31,%rax,4)
@@ -267,10 +275,18 @@  _start:
 	shrx	r15,QWORD PTR [r31+rax*4+0x123],r31
 	sttilecfg	[r31+rax*4+0x123]
 	t2rpntlvwz0	tmm6,[r31+rax*8+0x123]
+	t2rpntlvwz0rs	tmm6,[r31+rax*8+0x123]
+	t2rpntlvwz0rst1	tmm6,[r31+rax*8+0x123]
 	t2rpntlvwz0t1	tmm6,[r31+rax*8+0x123]
 	t2rpntlvwz1	tmm6,[r31+rax*8+0x123]
+	t2rpntlvwz1rs	tmm6,[r31+rax*8+0x123]
+	t2rpntlvwz1rst1	tmm6,[r31+rax*8+0x123]
 	t2rpntlvwz1t1	tmm6,[r31+rax*8+0x123]
 	tileloadd	tmm6,[r31+rax*4+0x123]
+	tileloaddrs	tmm6,[rbp+r31*8+0x10000000]
+	tileloaddrs	tmm3,[r16]
+	tileloaddrst1	tmm6,[r31+r14*8+0x10000000]
+	tileloaddrst1	tmm3,[r16]
 	tileloaddt1	tmm6,[r31+rax*4+0x123]
 	tilestored	[r31+rax*4+0x123],tmm6
 	wrssd	DWORD PTR [r31+rax*4+0x123],r25d
diff --git a/gas/testsuite/gas/i386/x86-64.exp b/gas/testsuite/gas/i386/x86-64.exp
index fe503671312..c99488935f5 100644
--- a/gas/testsuite/gas/i386/x86-64.exp
+++ b/gas/testsuite/gas/i386/x86-64.exp
@@ -535,6 +535,9 @@  run_dump_test "x86-64-amx-fp8"
 run_dump_test "x86-64-amx-fp8-intel"
 run_list_test "x86-64-amx-fp8-inval"
 run_dump_test "x86-64-amx-fp8-bad"
+run_dump_test "x86-64-amx-movrs"
+run_dump_test "x86-64-amx-movrs-intel"
+run_list_test "x86-64-amx-movrs-inval"
 run_dump_test "x86-64-clzero"
 run_dump_test "x86-64-mwaitx-bdver4"
 run_list_test "x86-64-mwaitx-reg"
diff --git a/opcodes/i386-dis-evex-len.h b/opcodes/i386-dis-evex-len.h
index e931fdd655a..2b4361f7ae6 100644
--- a/opcodes/i386-dis-evex-len.h
+++ b/opcodes/i386-dis-evex-len.h
@@ -44,6 +44,11 @@  static const struct dis386 evex_len_table[][3] = {
     { "vperm%DQ",	{ XM, Vex, EXx }, PREFIX_DATA },
   },
 
+  /* EVEX_LEN_0F384A_X86_64_W_0 */
+  {
+    { X86_64_EVEX_PFX_TABLE (PREFIX_VEX_0F384A_X86_64_W_0_L_0) },
+  },
+
   /* EVEX_LEN_0F385A */
   {
     { Bad_Opcode },
diff --git a/opcodes/i386-dis-evex-w.h b/opcodes/i386-dis-evex-w.h
index 1bb716c0ba7..8c73320d633 100644
--- a/opcodes/i386-dis-evex-w.h
+++ b/opcodes/i386-dis-evex-w.h
@@ -346,6 +346,10 @@ 
   {
     { "vpbroadcastmw2dY",	{ XM, MaskR }, 0 },
   },
+  /* EVEX_W_0F384A_X86_64 */
+  {
+    { EVEX_LEN_TABLE (EVEX_LEN_0F384A_X86_64_W_0) },
+  },
   /* EVEX_W_0F3859 */
   {
     { "vbroadcasti32x2",	{ XM, EXq }, PREFIX_DATA },
diff --git a/opcodes/i386-dis-evex-x86-64.h b/opcodes/i386-dis-evex-x86-64.h
new file mode 100644
index 00000000000..9b0d7132268
--- /dev/null
+++ b/opcodes/i386-dis-evex-x86-64.h
@@ -0,0 +1,5 @@ 
+  /* X86_64_EVEX_0F384A */
+  {
+    { Bad_Opcode },
+    { VEX_W_TABLE (EVEX_W_0F384A_X86_64) },
+  },
diff --git a/opcodes/i386-dis-evex.h b/opcodes/i386-dis-evex.h
index 006d0c4a990..fdb4e07f2e1 100644
--- a/opcodes/i386-dis-evex.h
+++ b/opcodes/i386-dis-evex.h
@@ -376,7 +376,7 @@  static const struct dis386 evex_table[][256] = {
     /* 48 */
     { Bad_Opcode },
     { X86_64_EVEX_MEM_W_TABLE (VEX_W_0F3849_X86_64_L_0) },
-    { Bad_Opcode },
+    { X86_64_TABLE (X86_64_EVEX_0F384A) },
     { X86_64_EVEX_MEM_W_TABLE (VEX_W_0F384B_X86_64_L_0) },
     { "vrcp14p%XW",	{ XM, EXx }, PREFIX_DATA },
     { "vrcp14s%XW",	{ XMScalar, VexScalar, EXdq }, PREFIX_DATA },
@@ -1445,8 +1445,8 @@  static const struct dis386 evex_table[][256] = {
     { Bad_Opcode },
     { Bad_Opcode },
     /* F8 */
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_VEX_MAP5_F8) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_VEX_MAP5_F9) },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
diff --git a/opcodes/i386-dis.c b/opcodes/i386-dis.c
index e7f9b110b23..adbd93c0489 100644
--- a/opcodes/i386-dis.c
+++ b/opcodes/i386-dis.c
@@ -965,6 +965,8 @@  enum
   MOD_VEX_0F3849_X86_64_L_0_W_0,
   MOD_VEX_0F386E_X86_64,
   MOD_VEX_0F386F_X86_64,
+  MOD_VEX_MAP5_F8_X86_64,
+  MOD_VEX_MAP5_F9_X86_64,
 
   MOD_EVEX_MAP4_60,
   MOD_EVEX_MAP4_61,
@@ -1134,6 +1136,7 @@  enum
   PREFIX_VEX_0F3848_X86_64_L_0_W_0,
   PREFIX_VEX_0F3849_X86_64_L_0_W_0_M_0,
   PREFIX_VEX_0F3849_X86_64_L_0_W_0_M_1,
+  PREFIX_VEX_0F384A_X86_64_W_0_L_0,
   PREFIX_VEX_0F384B_X86_64_L_0_W_0,
   PREFIX_VEX_0F3850_W_0,
   PREFIX_VEX_0F3851_W_0,
@@ -1159,6 +1162,8 @@  enum
   PREFIX_VEX_0F38F6_L_0,
   PREFIX_VEX_0F38F7_L_0,
   PREFIX_VEX_0F3AF0_L_0,
+  PREFIX_VEX_MAP5_F8_X86_64_M_0_L_0_W_0,
+  PREFIX_VEX_MAP5_F9_X86_64_M_0_L_0_W_0,
   PREFIX_VEX_MAP5_FD_X86_64_L_0_W_0,
   PREFIX_VEX_MAP7_F6_L_0_W_0_R_0_X86_64,
   PREFIX_VEX_MAP7_F8_L_0_W_0_R_0_X86_64,
@@ -1351,6 +1356,7 @@  enum
 
   X86_64_VEX_0F3848,
   X86_64_VEX_0F3849,
+  X86_64_VEX_0F384A,
   X86_64_VEX_0F384B,
   X86_64_VEX_0F385C,
   X86_64_VEX_0F385E,
@@ -1361,9 +1367,13 @@  enum
   X86_64_VEX_0F386F,
   X86_64_VEX_0F38Ex,
 
+  X86_64_VEX_MAP5_F8,
+  X86_64_VEX_MAP5_F9,
   X86_64_VEX_MAP5_FD,
   X86_64_VEX_MAP7_F6_L_0_W_0_R_0,
   X86_64_VEX_MAP7_F8_L_0_W_0_R_0,
+
+  X86_64_EVEX_0F384A,
 };
 
 enum
@@ -1436,6 +1446,7 @@  enum
   VEX_LEN_0F3841,
   VEX_LEN_0F3848_X86_64,
   VEX_LEN_0F3849_X86_64,
+  VEX_LEN_0F384A_X86_64_W_0,
   VEX_LEN_0F384B_X86_64,
   VEX_LEN_0F385A,
   VEX_LEN_0F385C_X86_64,
@@ -1483,6 +1494,8 @@  enum
   VEX_LEN_0F3ADE_W_0,
   VEX_LEN_0F3ADF,
   VEX_LEN_0F3AF0,
+  VEX_LEN_MAP5_F8_X86_64_M_0,
+  VEX_LEN_MAP5_F9_X86_64_M_0,
   VEX_LEN_MAP5_FD_X86_64,
   VEX_LEN_MAP7_F6,
   VEX_LEN_MAP7_F8,
@@ -1555,6 +1568,7 @@  enum
   EVEX_LEN_0F381A,
   EVEX_LEN_0F381B,
   EVEX_LEN_0F3836,
+  EVEX_LEN_0F384A_X86_64_W_0,
   EVEX_LEN_0F385A,
   EVEX_LEN_0F385B,
   EVEX_LEN_0F38C6,
@@ -1609,6 +1623,7 @@  enum
   VEX_W_0F3846,
   VEX_W_0F3848_X86_64_L_0,
   VEX_W_0F3849_X86_64_L_0,
+  VEX_W_0F384A_X86_64,
   VEX_W_0F384B_X86_64_L_0,
   VEX_W_0F3850,
   VEX_W_0F3851,
@@ -1656,6 +1671,8 @@  enum
   VEX_W_0F3ACE,
   VEX_W_0F3ACF,
   VEX_W_0F3ADE,
+  VEX_W_MAP5_F8_X86_64_M_0_L_0,
+  VEX_W_MAP5_F9_X86_64_M_0_L_0,
   VEX_W_MAP5_FD_X86_64_L_0,
   VEX_W_MAP7_F6_L_0,
   VEX_W_MAP7_F8_L_0,
@@ -1783,6 +1800,7 @@  enum
   EVEX_W_0F3835_P_2,
   EVEX_W_0F3837,
   EVEX_W_0F383A_P_1,
+  EVEX_W_0F384A_X86_64,
   EVEX_W_0F3859,
   EVEX_W_0F385A_L_n,
   EVEX_W_0F385B_L_2,
@@ -4099,6 +4117,14 @@  static const struct dis386 prefix_table[][4] = {
     { RM_TABLE (RM_VEX_0F3849_X86_64_L_0_W_0_M_1_P_3) },
   },
 
+  /* PREFIX_VEX_0F384A_X86_64_W_0_L_0 */
+  {
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { "tileloaddrst1",	{ TMM, MVexSIBMEM }, 0 },
+    { "tileloaddrs",	{ TMM, MVexSIBMEM }, 0 },
+  },
+
   /* PREFIX_VEX_0F384B_X86_64_L_0_W_0 */
   {
     { Bad_Opcode },
@@ -4283,6 +4309,20 @@  static const struct dis386 prefix_table[][4] = {
     { "%XErorxS",		{ Gdq, Edq, Ib }, 0 },
   },
 
+  /* PREFIX_VEX_MAP5_F8_X86_64_M_0_L_0_W_0 */
+  {
+    { "t2rpntlvwz0rs",	{ TMM, MVexSIBMEM }, 0 },
+    { Bad_Opcode },
+    { "t2rpntlvwz1rs",	{ TMM, MVexSIBMEM }, 0 },
+  },
+
+  /* PREFIX_VEX_MAP5_F9_X86_64_M_0_L_0_W_0 */
+  {
+    { "t2rpntlvwz0rst1",	{ TMM, MVexSIBMEM }, 0 },
+    { Bad_Opcode },
+    { "t2rpntlvwz1rst1",	{ TMM, MVexSIBMEM }, 0 },
+  },
+
   /* PREFIX_VEX_MAP5_FD_X86_64_L_0_W_0 */
   {
     { "tdpbf8ps",	{ TMM, Rtmm, VexTmm }, 0 },
@@ -4639,6 +4679,12 @@  static const struct dis386 x86_64_table[][2] = {
     { VEX_LEN_TABLE (VEX_LEN_0F3849_X86_64) },
   },
 
+  /* X86_64_VEX_0F384A */
+  {
+    { Bad_Opcode },
+    { VEX_W_TABLE (VEX_W_0F384A_X86_64) },
+  },
+
   /* X86_64_VEX_0F384B */
   {
     { Bad_Opcode },
@@ -4693,6 +4739,18 @@  static const struct dis386 x86_64_table[][2] = {
     { "%XEcmp%CCxadd", { Mdq, Gdq, VexGdq }, PREFIX_DATA },
   },
 
+  /* X86_64_VEX_MAP5_F8 */
+  {
+    { Bad_Opcode },
+    { MOD_TABLE (MOD_VEX_MAP5_F8_X86_64) },
+  },
+
+  /* X86_64_VEX_MAP5_F9 */
+  {
+    { Bad_Opcode },
+    { MOD_TABLE (MOD_VEX_MAP5_F9_X86_64) },
+  },
+
   /* X86_64_VEX_MAP5_FD */
   {
     { Bad_Opcode },
@@ -4710,6 +4768,8 @@  static const struct dis386 x86_64_table[][2] = {
     { Bad_Opcode },
     { PREFIX_TABLE (PREFIX_VEX_MAP7_F8_L_0_W_0_R_0_X86_64) },
   },
+
+#include "i386-dis-evex-x86-64.h"
 };
 
 static const struct dis386 three_byte_table[][256] = {
@@ -6552,7 +6612,7 @@  static const struct dis386 vex_table[][256] = {
     /* 48 */
     { X86_64_TABLE (X86_64_VEX_0F3848) },
     { X86_64_TABLE (X86_64_VEX_0F3849) },
-    { Bad_Opcode },
+    { X86_64_TABLE (X86_64_VEX_0F384A) },
     { X86_64_TABLE (X86_64_VEX_0F384B) },
     { Bad_Opcode },
     { Bad_Opcode },
@@ -7240,6 +7300,11 @@  static const struct dis386 vex_len_table[][2] = {
     { VEX_W_TABLE (VEX_W_0F3849_X86_64_L_0) },
   },
 
+  /* VEX_LEN_0F384A_X86_64_W_0 */
+  {
+    { PREFIX_TABLE (PREFIX_VEX_0F384A_X86_64_W_0_L_0) },
+  },
+
   /* VEX_LEN_0F384B_X86_64 */
   {
     { VEX_W_TABLE (VEX_W_0F384B_X86_64_L_0) },
@@ -7487,6 +7552,16 @@  static const struct dis386 vex_len_table[][2] = {
     { PREFIX_TABLE (PREFIX_VEX_0F3AF0_L_0) },
   },
 
+  /* VEX_LEN_MAP5_F8_X86_64_M_0 */
+  {
+    { VEX_W_TABLE (VEX_W_MAP5_F8_X86_64_M_0_L_0) },
+  },
+
+  /* VEX_LEN_MAP5_F9_X86_64_M_0 */
+  {
+    { VEX_W_TABLE (VEX_W_MAP5_F9_X86_64_M_0_L_0) },
+  },
+
   /* VEX_LEN_MAP5_FD_X86_64 */
   {
     { VEX_W_TABLE (VEX_W_MAP5_FD_X86_64_L_0) },
@@ -7934,6 +8009,10 @@  static const struct dis386 vex_w_table[][2] = {
     /* VEX_W_0F3849_X86_64_L_0 */
     { MOD_TABLE (MOD_VEX_0F3849_X86_64_L_0_W_0) },
   },
+  {
+    /* VEX_W_0F384A_X86_64 */
+    { VEX_LEN_TABLE (VEX_LEN_0F384A_X86_64_W_0) },
+  },
   {
     /* VEX_W_0F384B_X86_64_L_0 */
     { PREFIX_TABLE (PREFIX_VEX_0F384B_X86_64_L_0_W_0) },
@@ -8128,6 +8207,14 @@  static const struct dis386 vex_w_table[][2] = {
     /* VEX_W_0F3ADE */
     { VEX_LEN_TABLE (VEX_LEN_0F3ADE_W_0) },
   },
+  {
+    /* VEX_W_MAP5_F8_X86_64_M_0_L_0 */
+    { PREFIX_TABLE (PREFIX_VEX_MAP5_F8_X86_64_M_0_L_0_W_0) },
+  },
+  {
+    /* VEX_W_MAP5_F9_X86_64_M_0_L_0 */
+    { PREFIX_TABLE (PREFIX_VEX_MAP5_F9_X86_64_M_0_L_0_W_0) },
+  },
   {
     /* VEX_W_MAP5_FD_X86_64 */
     { PREFIX_TABLE (PREFIX_VEX_MAP5_FD_X86_64_L_0_W_0) },
@@ -8500,6 +8587,14 @@  static const struct dis386 mod_table[][2] = {
     /* MOD_VEX_0F386F_X86_64 */
     { VEX_LEN_TABLE (VEX_LEN_0F386F_X86_64_M_0) },
   },
+  {
+    /* MOD_VEX_MAP5_F8_X86_64 */
+    { VEX_LEN_TABLE (VEX_LEN_MAP5_F8_X86_64_M_0) },
+  },
+  {
+    /* MOD_VEX_MAP5_F9_X86_64 */
+    { VEX_LEN_TABLE (VEX_LEN_MAP5_F9_X86_64_M_0) },
+  },
 
 #include "i386-dis-evex-mod.h"
 };
@@ -8889,6 +8984,8 @@  static const struct dis386 bad_opcode = { "(bad)", { XX }, 0 };
 /* Fetch error indicator.  */
 static const struct dis386 err_opcode = { NULL, { XX }, 0 };
 
+static const struct dis386 map5_f8_opcode = { X86_64_TABLE (X86_64_VEX_MAP5_F8) };
+static const struct dis386 map5_f9_opcode = { X86_64_TABLE (X86_64_VEX_MAP5_F9) };
 static const struct dis386 map5_fd_opcode = { X86_64_TABLE (X86_64_VEX_MAP5_FD) };
 static const struct dis386 map7_f6_opcode = { VEX_LEN_TABLE (VEX_LEN_MAP7_F6) };
 static const struct dis386 map7_f8_opcode = { VEX_LEN_TABLE (VEX_LEN_MAP7_F8) };
@@ -9210,7 +9307,14 @@  get_valid_dis386 (const struct dis386 *dp, instr_info *ins)
       else if (vindex == 0xf6)
 	dp = &map7_f6_opcode;
       else if (vindex == 0xf8)
-	dp = &map7_f8_opcode;
+	{
+	  if (vex_table_index == VEX_MAP5)
+	    dp = &map5_f8_opcode;
+	  else
+	    dp = &map7_f8_opcode;
+	}
+      else if (vindex == 0xf9)
+	dp = &map5_f9_opcode;
       else if (vindex == 0xfd)
 	dp = &map5_fd_opcode;
       else
diff --git a/opcodes/i386-gen.c b/opcodes/i386-gen.c
index 8252a1b7c2b..d2ecea2ecd5 100644
--- a/opcodes/i386-gen.c
+++ b/opcodes/i386-gen.c
@@ -271,6 +271,8 @@  static const dependency isa_dependencies[] =
     "AMX_TILE" },
   { "AMX_FP8",
     "AMX_TILE" },
+  { "AMX_MOVRS",
+    "AMX_TILE" },
   { "KL",
     "SSE2" },
   { "WIDEKL",
@@ -440,6 +442,7 @@  static bitfield cpu_flags[] =
   BITFIELD (AMX_TRANSPOSE),
   BITFIELD (AMX_TF32),
   BITFIELD (AMX_FP8),
+  BITFIELD (AMX_MOVRS),
   BITFIELD (AMX_TILE),
   BITFIELD (MOVDIRI),
   BITFIELD (MOVDIR64B),
diff --git a/opcodes/i386-opc.h b/opcodes/i386-opc.h
index 915c60ef879..77dad3159dd 100644
--- a/opcodes/i386-opc.h
+++ b/opcodes/i386-opc.h
@@ -254,6 +254,8 @@  enum i386_cpu
   CpuAMX_TF32,
   /* AMX-FP8 instructions required */
   CpuAMX_FP8,
+  /* AMX-MOVRS instructions required */
+  CpuAMX_MOVRS,
   /* AMX-TILE instructions required */
   CpuAMX_TILE,
   /* GFNI instructions required */
@@ -506,6 +508,7 @@  typedef union i386_cpu_flags
       unsigned int cpuamx_complex:1;
       unsigned int cpuamx_tf32:1;
       unsigned int cpuamx_fp8:1;
+      unsigned int cpuamx_movrs:1;
       unsigned int cpuamx_tile:1;
       unsigned int cpugfni:1;
       unsigned int cpuvaes:1;
diff --git a/opcodes/i386-opc.tbl b/opcodes/i386-opc.tbl
index e55dde8730b..b28ceb5be19 100644
--- a/opcodes/i386-opc.tbl
+++ b/opcodes/i386-opc.tbl
@@ -3216,9 +3216,6 @@  tilezero, 0xf249, AMX_TILE, Modrm|Vex128|Space0F38|VexW0|NoSuf, { RegTMM }
 
 t2rpntlvw<z><loc>, 0x<z:pfx>6e | <loc:opc>, APX_F(AMX_TRANSPOSE), Sibmem|Vex128|EVex128|Space0F38|VexW0|NoSuf|ImplicitGroup, { Unspecified|BaseIndex, RegTMM }
 
-<z>
-<loc>
-
 tconjtcmmimfp16ps, 0x6b, AMX_COMPLEX&AMX_TRANSPOSE, Modrm|Vex128|Space0F38|Src2VVVV|VexW0|NoSuf, { RegTMM, RegTMM, RegTMM }
 tconjtfp16, 0x666b, AMX_COMPLEX&AMX_TRANSPOSE, Modrm|Vex128|Space0F38|VexW0|NoSuf, { RegTMM, RegTMM }
 
@@ -3238,6 +3235,14 @@  tdpbhf8ps, 0xf2fd, AMX_FP8, Modrm|Vex128|Map5|Src2VVVV|VexW0|NoSuf, { RegTMM, Re
 tdphbf8ps, 0xf3fd, AMX_FP8, Modrm|Vex128|Map5|Src2VVVV|VexW0|NoSuf, { RegTMM, RegTMM, RegTMM }
 tdphf8ps, 0x66fd, AMX_FP8, Modrm|Vex128|Map5|Src2VVVV|VexW0|NoSuf, { RegTMM, RegTMM, RegTMM }
 
+t2rpntlvw<z>rs<loc>, 0x<z:pfx>f8 | <loc:opc>, AMX_TRANSPOSE&APX_F(AMX_MOVRS), Sibmem|Vex128|EVex128|Map5|VexW0|NoSuf|ImplicitGroup, { Unspecified|BaseIndex, RegTMM }
+
+<z>
+<loc>
+
+tileloaddrs, 0xf24a, APX_F(AMX_MOVRS), Sibmem|Vex128|EVex128|Space0F38|VexW0|NoSuf, { Unspecified|BaseIndex, RegTMM }
+tileloaddrst1, 0x664a, APX_F(AMX_MOVRS), Sibmem|Vex128|EVex128|Space0F38|VexW0|NoSuf, { Unspecified|BaseIndex, RegTMM }
+
 // AMX instructions end.
 
 // KEYLOCKER instructions.
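
Reviewer note: the byte patterns in the expected-disassembly files above can be cross-checked against the encodings stated in the commit message by pulling the map, mandatory-prefix and W fields out of the two bytes that follow the 0x62 escape. The sketch below is a minimal, standalone illustration of that check; the function names are hypothetical and are not binutils identifiers, and only the three fields relevant here are decoded.

```c
#include <stdint.h>

/* Illustrative helpers only: decode selected fields from the two
   payload bytes that follow the 0x62 EVEX escape byte.  */

/* Opcode map: low three bits of the first payload byte
   (2 = 0F38, 5 = MAP5).  */
static unsigned int
evex_map (uint8_t p0)
{
  return p0 & 0x7;
}

/* Mandatory prefix: low two bits of the second payload byte
   (0 = none, 1 = 66, 2 = F3, 3 = F2).  */
static unsigned int
evex_pp (uint8_t p1)
{
  return p1 & 0x3;
}

/* EVEX.W: top bit of the second payload byte.  */
static unsigned int
evex_w (uint8_t p1)
{
  return (p1 >> 7) & 0x1;
}
```

For `62 dd 7c 08 f8 ...` (t2rpntlvwz0rs) this gives map 5, no prefix, W0, and for `62 dd 7d 08 f8 ...` (t2rpntlvwz1rs) map 5, prefix 66, W0. For `62 b2 7b 08 4a ...` (tileloaddrs) it gives map 2 (0F38), prefix F2, W0, and for `62 9a 7d 08 4a ...` (tileloaddrst1) map 2, prefix 66, W0. These agree with the NP/66.MAP5.W0 F8/F9 and F2/66.0F38.W0 4A encodings in the commit message.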