[v1] x86: Support ZHAOXIN GMI instructions

Message ID f65b8a66-98cb-4531-b42e-3f07321460df@zhaoxin.com
State New
Series [v1] x86: Support ZHAOXIN GMI instructions

Checks

Context Check Description
linaro-tcwg-bot/tcwg_binutils_build--master-arm success Build passed
linaro-tcwg-bot/tcwg_binutils_build--master-aarch64 success Build passed
linaro-tcwg-bot/tcwg_binutils_check--master-aarch64 success Test passed
linaro-tcwg-bot/tcwg_binutils_check--master-arm success Test passed

Commit Message

Mayshao-oc Oct. 14, 2024, 2:03 a.m. UTC
  Hi all:

        This patch adds support for Zhaoxin GMI instructions.
        The documentation is attached for your reference.
        Test ok, ok for trunk?

BR
Mayshao
  

Comments

Haochen Jiang Oct. 14, 2024, 3:09 a.m. UTC | #1
> From: mayshao-oc <mayshao-oc@zhaoxin.com>
> Sent: Monday, October 14, 2024 10:03 AM
> To: binutils@sourceware.org
> 
>         This patch adds support for Zhaoxin GMI instructions.
>         The documentation is attached for your reference.
>         Test ok, ok for trunk?

The disassembler part looks extremely weird to me.

I suppose this addition is not correct:

@@ -1315,7 +1320,8 @@ enum
 enum
 {
   THREE_BYTE_0F38 = 0,
-  THREE_BYTE_0F3A
+  THREE_BYTE_0F3A,
+  THREE_BYTE_0FA6
 };

My understanding is as follows; correct me if I am wrong.

Take the sm2 instruction as an example. Its full encoding is
0xf20fa6c0, in which a6 is the opcode and c0 is the ModRM byte.
This means a6 should be the "index", not a6c0 as in this part
of your patch:

@@ -1050,6 +1051,10 @@ enum
   PREFIX_0F7D,
   PREFIX_0F7E,
   PREFIX_0F7F,
+  PREFIX_0FA6C0,
+  PREFIX_0FA6C8,
+  PREFIX_0FA6D0,
+  PREFIX_0FA6E8,

You can expand ModRM byte c0 to binary 11 000 000 (mod, reg, rm), so the
REG field is 0 here. That is why montmul occupies the first row of the
REG_0FA6 entry: its encoding is 0xf30fa6c0.

The actual change should be something like adding a prefix table pass
for montmul/sm2, since one of them uses f2 and the other f3.
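
To make the bit split concrete, here is a minimal standalone sketch in C. It is not code from i386-dis.c, and the struct and function names are hypothetical. It breaks a ModRM byte into its mod, reg and rm fields:

```c
#include <stdint.h>

/* Hypothetical helper type for illustration; not taken from binutils.  */
struct modrm_fields
{
  unsigned int mod;	/* bits 7..6 */
  unsigned int reg;	/* bits 5..3 */
  unsigned int rm;	/* bits 2..0 */
};

/* Split one ModRM byte into its three fields.  */
static struct modrm_fields
split_modrm (uint8_t byte)
{
  struct modrm_fields f;

  f.mod = (byte >> 6) & 3;
  f.reg = (byte >> 3) & 7;
  f.rm = byte & 7;
  return f;
}
```

With this, c0, c8 and d0 give reg values 0, 1 and 2, which lines up with the montmul, xsha1 and xsha256 rows of the old REG_0FA6 table, while e8 (sm3) gives reg 5.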

Thx,
Haochen

> 
> BR
> Mayshao
  
Mayshao-oc Oct. 14, 2024, 3:28 a.m. UTC | #2
Hi Haochen:
           Thanks for your review, I comment below.
BR
mayshao

On 10/14/24 11:09, Jiang, Haochen wrote:
> 
> 
> 
>> From: mayshao-oc <mayshao-oc@zhaoxin.com>
>> Sent: Monday, October 14, 2024 10:03 AM
>> To: binutils@sourceware.org
>>
>>          This patch adds support for Zhaoxin GMI instructions.
>>          The documentation is attached for your reference.
>>          Test ok, ok for trunk?
> 
> The disassembler part looks extremely weird to me.
> 
> I suppose the adding for this is not correct:
> 
> @@ -1315,7 +1320,8 @@ enum
>   enum
>   {
>     THREE_BYTE_0F38 = 0,
> -  THREE_BYTE_0F3A
> +  THREE_BYTE_0F3A,
> +  THREE_BYTE_0FA6
>   };
> 
> My understanding comes following, correct me if I am wrong.
> 
> Let's take inst sm2 for example. The full encoding for sm2 is
> 0xf20fa6c0. In the encoding, a6 is opcode and c0 is modrm byte.
> This means a6 should be the "index", not a6c0 in your this part
> of patch:
All GMI instructions are three byte opcode, so for the inst sm2, c0 is 
not modrm byte.
> 
> @@ -1050,6 +1051,10 @@ enum
>     PREFIX_0F7D,
>     PREFIX_0F7E,
>     PREFIX_0F7F,
> +  PREFIX_0FA6C0,
> +  PREFIX_0FA6C8,
> +  PREFIX_0FA6D0,
> +  PREFIX_0FA6E8,
> 
> You can translate modrm byte c0 to 11 000 000. Therefore, the REG
> is always 0. That is why montmul occupies the first row of REG_0FA6
> entry, since the opcode for it is 0xf30fa6c0.
> 
> The actual change should be something like adding prefix table pass
> for montmul/sm2, since one of them is f2, the other is f3.
> 
> Thx,
> Haochen
> 
>>
>> BR
>> Mayshao
  
Haochen Jiang Oct. 14, 2024, 3:37 a.m. UTC | #3
> From: mayshao-oc <mayshao-oc@zhaoxin.com>
> Sent: Monday, October 14, 2024 11:28 AM
>
> On 10/14/24 11:09, Jiang, Haochen wrote:
> >
> >
> >
> >> From: mayshao-oc <mayshao-oc@zhaoxin.com>
> >> Sent: Monday, October 14, 2024 10:03 AM
> >> To: binutils@sourceware.org
> >>
> >>          This patch adds support for Zhaoxin GMI instructions.
> >>          The documentation is attached for your reference.
> >>          Test ok, ok for trunk?
> >
> > The disassembler part looks extremely weird to me.
> >
> > I suppose the adding for this is not correct:
> >
> > @@ -1315,7 +1320,8 @@ enum
> >   enum
> >   {
> >     THREE_BYTE_0F38 = 0,
> > -  THREE_BYTE_0F3A
> > +  THREE_BYTE_0F3A,
> > +  THREE_BYTE_0FA6
> >   };
> >
> > My understanding comes following, correct me if I am wrong.
> >
> > Let's take inst sm2 for example. The full encoding for sm2 is
> > 0xf20fa6c0. In the encoding, a6 is opcode and c0 is modrm byte.
> > This means a6 should be the "index", not a6c0 in your this part of
> > patch:
> All GMI instructions are three byte opcode, so for the inst sm2, c0 is not
> modrm byte.

Maybe I did not make it clear: the full opcode is f20fa6, where f2 is
a mandatory prefix and 0f is the escape opcode byte. These three bytes
make up the full opcode.

That makes c0 the ModRM byte.
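
The byte roles described above can be sketched as follows. This is an illustration only: the names are hypothetical, and it assumes a single optional mandatory prefix, the plain 0f escape (not the 0f38/0f3a maps), and an opcode that is followed by a ModRM byte, as a6 and a7 are treated in this thread.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type for this sketch.  */
struct split_insn
{
  uint8_t prefix;	/* 0 if none, else 0x66, 0xf2 or 0xf3 */
  uint8_t opcode;	/* byte following the 0f escape */
  uint8_t modrm;	/* byte following the opcode */
};

/* Split "[66|f2|f3] 0f <opcode> <modrm>" into its parts.
   Returns false if the buffer does not have that shape.  */
static bool
split_0f_insn (const uint8_t *buf, size_t len, struct split_insn *out)
{
  size_t i = 0;

  out->prefix = 0;
  if (i < len && (buf[i] == 0x66 || buf[i] == 0xf2 || buf[i] == 0xf3))
    out->prefix = buf[i++];

  /* Need the escape byte, the opcode and the ModRM byte.  */
  if (len - i < 3 || buf[i] != 0x0f)
    return false;

  out->opcode = buf[i + 1];
  out->modrm = buf[i + 2];
  return true;
}
```

Feeding it the sm2 bytes f2 0f a6 c0 yields prefix f2, opcode a6 and ModRM c0, which is the split argued for here.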

Thx,
Haochen

> >
> > @@ -1050,6 +1051,10 @@ enum
> >     PREFIX_0F7D,
> >     PREFIX_0F7E,
> >     PREFIX_0F7F,
> > +  PREFIX_0FA6C0,
> > +  PREFIX_0FA6C8,
> > +  PREFIX_0FA6D0,
> > +  PREFIX_0FA6E8,
> >
> > You can translate modrm byte c0 to 11 000 000. Therefore, the REG is
> > always 0. That is why montmul occupies the first row of REG_0FA6
> > entry, since the opcode for it is 0xf30fa6c0.
> >
> > The actual change should be something like adding prefix table pass
> > for montmul/sm2, since one of them is f2, the other is f3.
> >
> > Thx,
> > Haochen
> >
> >>
> >> BR
> >> Mayshao
  
Jan Beulich Oct. 14, 2024, 6:13 a.m. UTC | #4
On 14.10.2024 05:37, Jiang, Haochen wrote:
>> From: mayshao-oc <mayshao-oc@zhaoxin.com>
>> Sent: Monday, October 14, 2024 11:28 AM
>>
>> On 10/14/24 11:09, Jiang, Haochen wrote:
>>>
>>>
>>>
>>>> From: mayshao-oc <mayshao-oc@zhaoxin.com>
>>>> Sent: Monday, October 14, 2024 10:03 AM
>>>> To: binutils@sourceware.org
>>>>
>>>>          This patch adds support for Zhaoxin GMI instructions.
>>>>          The documentation is attached for your reference.
>>>>          Test ok, ok for trunk?
>>>
>>> The disassembler part looks extremely weird to me.
>>>
>>> I suppose the adding for this is not correct:
>>>
>>> @@ -1315,7 +1320,8 @@ enum
>>>   enum
>>>   {
>>>     THREE_BYTE_0F38 = 0,
>>> -  THREE_BYTE_0F3A
>>> +  THREE_BYTE_0F3A,
>>> +  THREE_BYTE_0FA6
>>>   };
>>>
>>> My understanding comes following, correct me if I am wrong.
>>>
>>> Let's take inst sm2 for example. The full encoding for sm2 is
>>> 0xf20fa6c0. In the encoding, a6 is opcode and c0 is modrm byte.
>>> This means a6 should be the "index", not a6c0 in your this part of
>>> patch:
>> All GMI instructions are three byte opcode, so for the inst sm2, c0 is not
>> modrm byte.
> 
> Maybe I did not make it clear here, the full opcode is f20fa6, where f2 is
> mandatory prefix and 0f is the escape opcode byte. These three bytes
> consist of the full opcode.
> 
> This will make c0 as modrm byte.

I agree - even if the byte isn't used as a ModR/M one, its decoding should
treat it so. That's no different from SFENCE, LFENCE, and MFENCE, to name
just a few. Plus - even the original PadLock insns match that pattern.
Nothing good will come from things being done differently here.
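
For comparison, a minimal sketch of the fence case. It is illustrative only: the function name is hypothetical, prefixes are ignored, and accepting any rm value is a simplifying assumption. All three instructions share the opcode 0f ae and differ only in the reg field of the following byte, even though that byte never addresses an operand:

```c
#include <stddef.h>
#include <stdint.h>

/* Map the byte after 0f ae to a fence mnemonic, treating it as a
   ModRM byte.  Returns NULL for anything that is not a fence.  */
static const char *
fence_name (uint8_t modrm)
{
  /* Memory forms (mod != 3) of 0f ae are other instructions.  */
  if ((modrm >> 6) != 3)
    return NULL;

  switch ((modrm >> 3) & 7)
    {
    case 5:
      return "lfence";	/* 0f ae e8 */
    case 6:
      return "mfence";	/* 0f ae f0 */
    case 7:
      return "sfence";	/* 0f ae f8 */
    default:
      return NULL;
    }
}
```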

Jan
  
Jan Beulich Oct. 14, 2024, 6:54 a.m. UTC | #5
On 14.10.2024 04:03, mayshao-oc wrote:
>         This patch adds support for Zhaoxin GMI instructions.
>         The documentation is attached for your reference.
>         Test ok, ok for trunk?

In the NEWS entry, please add "x86" one way or another. Not everyone may
know that Zhaoxin is an x86 flavor. Also the entry needs to move up, into
the post-2.43 section.

For the i386-gen change I wonder whether GMI is an entirely separate
extension, or kind of an extension to PadLock. Can you clarify the
relationship please?

In i386-dis.c, if - following earlier comments - you still need some fixup
function(s), please name them suitably. The pre-existing OP_0f07() is
grossly misnamed, and that misnaming shouldn't be further extended.

Jan
  
Haochen Jiang Oct. 14, 2024, 8:37 a.m. UTC | #6
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Monday, October 14, 2024 2:54 PM
> 
> In i386-dis.c, if - following earlier comments - you still need some fixup
> function(s), please name them suitably. The pre-existing OP_0f07() is
> grossly misnamed, and that misnaming shouldn't be further extended.

As Jan said, OP_0f07 is misnamed.

But I suppose OP_Skip_MODRM might be a better fit here if you still believe
they are not ModRM bytes, as in the LFENCE and SFENCE examples mentioned above.

I am not sure whether they should be changed to this; you could give it a try.

Thx,
Haochen

> 
> Jan
  
Mayshao-oc Oct. 14, 2024, 9:05 a.m. UTC | #7
Hi Haochen:
      Thanks for your comment. You are right, I need to use the prefix to
distinguish montmul/sm2. For the sm2 instruction, c0 is the ModRM byte.
      As to the function name, I need to skip the repz display for the sm3 and
sm4 instructions, and I want to use OP_Skip_REPZ as the function name. Do you
think that makes sense?

Best Regards
Mayshao


On 10/14/24 16:37, Jiang, Haochen wrote:
> 
> 
> 
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: Monday, October 14, 2024 2:54 PM
>>
>> In i386-dis.c, if - following earlier comments - you still need some fixup
>> function(s), please name them suitably. The pre-existing OP_0f07() is
>> grossly misnamed, and that misnaming shouldn't be further extended.
> 
> As Jan said, OP_0f07 is misnamed.
> 
> But I suppose OP_Skip_MODRM might be better here if you still believe
> they are not modrm bytes, just like mentioned LFENCE and SFENCE example.
> 
> I am not sure if they should be changed to this, you could have a try on that.
> 
> Thx,
> Haochen
> 
>>
>> Jan
  
Jan Beulich Oct. 14, 2024, 10:16 a.m. UTC | #8
On 14.10.2024 11:05, mayshao-oc wrote:
>       Thanks for your comment. You are right, I need to use the prefix to
> distinguish montmul/sm2. For the sm2 instruction, c0 is the ModRM byte.
>       As to the function name, I need to skip the repz display for the sm3 and
> sm4 instructions, and I want to use OP_Skip_REPZ as the function name. Do you
> think that makes sense?

If you add a decoding step through prefix_table[], you shouldn't need
such a helper function.
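
A rough sketch of that two-step lookup, for illustration only. The table and names below are hypothetical and much simpler than the real dis386 tables. The column order (none, f3, 66, f2) is assumed from where montmul and sm2 sit in the PREFIX_0FA6C0 entry of the v1 patch, and rejecting anything other than mod == 3, rm == 0 is an assumption of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed column order of a prefix table row: none, f3, 66, f2.  */
enum sketch_prefix { PFX_NONE, PFX_F3, PFX_66, PFX_F2 };

/* First index: ModRM reg field.  Second index: mandatory prefix.
   Unlisted slots are NULL, standing in for Bad_Opcode.  */
static const char *const sketch_0fa6[8][4] = {
  [0] = { [PFX_F3] = "montmul", [PFX_F2] = "sm2" },
  [1] = { [PFX_F3] = "xsha1" },
  [2] = { [PFX_F3] = "xsha256" },
  [5] = { [PFX_F3] = "sm3" },
};

/* Decode the byte after 0f a6: select the row by reg, then the
   column by prefix.  */
static const char *
lookup_0fa6 (enum sketch_prefix prefix, uint8_t modrm)
{
  unsigned int mod = modrm >> 6;
  unsigned int reg = (modrm >> 3) & 7;
  unsigned int rm = modrm & 7;

  if (mod != 3 || rm != 0)
    return NULL;
  return sketch_0fa6[reg][prefix];
}
```

Because the prefix selects the column, f2 0f a6 c0 and f3 0f a6 c0 resolve to sm2 and montmul from the same row, with no per-instruction fixup needed to tell them apart.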

Jan
  
Mayshao-oc Oct. 15, 2024, 1:31 a.m. UTC | #9
Hi Jan:

On 10/14/24 14:54, Jan Beulich wrote:
> 
> 
> 
> On 14.10.2024 04:03, mayshao-oc wrote:
>>          This patch adds support for Zhaoxin GMI instructions.
>>          The documentation is attached for your reference.
>>          Test ok, ok for trunk?
> 
> In the NEWS entry, please add "x86" one way or another. Not everyone may
> know that Zhaoxin is an x86 flavor. Also the entry needs to move up, into
> the post-2.43 section.
> 
Thanks for your comment; this is fixed in the GMI v2 patch.
> For the i386-gen change I wonder whether GMI is an entirely separate
> extension, or kind of an extension to PadLock. Can you clarify the
> relationship please?
GMI is an entirely separate extension. GMI covers the sm2, sm3 and sm4
algorithms, while PadLock covers the RNG, AES, SHA and RSA algorithms, so
they are different.
> 
> In i386-dis.c, if - following earlier comments - you still need some fixup
> function(s), please name them suitably. The pre-existing OP_0f07() is
> grossly misnamed, and that misnaming shouldn't be further extended.
I will take care of the function name next time, and thanks for your
comment. From the comments and code, I learned something about
programming. I named the fixup function MONTMUL_Fixup in the GMI v2
patch; maybe there is a better idea?

> Jan

BR
Mayshao
  

Patch

From 3d4ed4d9155eba670073dd06633a91df8212e809 Mon Sep 17 00:00:00 2001
From: MayShao-oc <MayShao-oc@zhaoxin.com>
Date: Sat, 12 Oct 2024 10:01:14 +0800
Subject: [PATCH v1] x86: Support ZHAOXIN GMI

Hi all:
      This patch adds support for Zhaoxin GMI instructions.
      Test ok, ok for trunk?
BR
Mayshao

gas/ChangeLog:

	* NEWS: Support ZHAOXIN GMI instructions.
	* config/tc-i386.c: Add gmi.
	* doc/c-i386.texi: Document gmi.
	* testsuite/gas/i386/i386.exp: Add gmi test.
	* testsuite/gas/i386/gmi.d: Ditto.
	* testsuite/gas/i386/gmi.s: Ditto.

opcodes/ChangeLog:

	* i386-dis.c (OP_0f07_Fixup): New.
	(OP_0f07_Fixup1): Ditto.
	* i386-gen.c: Add gmi.
	* i386-opc.h (CpuGMI): New.
	* i386-opc.tbl: Add Zhaoxin GMI instructions.
	* i386-tbl.h: Regenerated.
	* i386-mnem.h: Ditto.
	* i386-init.h: Ditto.

---
 gas/NEWS                        |   2 +
 gas/config/tc-i386.c            |   1 +
 gas/doc/c-i386.texi             |   6 +-
 gas/testsuite/gas/i386/gmi.d    |  12 ++
 gas/testsuite/gas/i386/gmi.s    |   8 +
 gas/testsuite/gas/i386/i386.exp |   1 +
 opcodes/i386-dis.c              | 336 +++++++++++++++++++++++++++++++-
 opcodes/i386-gen.c              |   1 +
 opcodes/i386-opc.h              |   3 +
 opcodes/i386-opc.tbl            |   5 +
 10 files changed, 364 insertions(+), 11 deletions(-)
 create mode 100644 gas/testsuite/gas/i386/gmi.d
 create mode 100644 gas/testsuite/gas/i386/gmi.s

diff --git a/gas/NEWS b/gas/NEWS
index d64330143b0..25ca7677567 100644
--- a/gas/NEWS
+++ b/gas/NEWS
@@ -82,6 +82,8 @@  Changes in 2.43:
 
 * Add support for the AArch64 Lookup Table Extension v2 (LUTv2).
 
+* Add support for the Zhaoxin GMI instructions.
+
 Changes in 2.42:
 
 * Add support for AMD znver5 processor.
diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
index cdefde03717..735154ee721 100644
--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -1219,6 +1219,7 @@  static const arch_entry cpu_arch[] =
   SUBARCH (user_msr, USER_MSR, USER_MSR, false),
   SUBARCH (apx_f, APX_F, APX_F, false),
   VECARCH (avx10.2, AVX10_2, ANY_AVX10_2, set),
+  SUBARCH (gmi, GMI, GMI, false),
 };
 
 #undef SUBARCH
diff --git a/gas/doc/c-i386.texi b/gas/doc/c-i386.texi
index 9667061752d..952ddbc7377 100644
--- a/gas/doc/c-i386.texi
+++ b/gas/doc/c-i386.texi
@@ -274,7 +274,8 @@  accept various extension mnemonics.  For example,
 @code{snp},
 @code{invlpgb},
 @code{tlbsync},
-@code{svme} and
+@code{svme},
+@code{gmi} and
 @code{padlock}.
 Note that these extension mnemonics can be prefixed with @code{no} to revoke
 the respective (and any dependent) functionality.  Note further that the
@@ -1705,7 +1706,8 @@  supported on the CPU specified.  The choices for @var{cpu_type} are:
 @item @samp{.lwp} @tab @samp{.fma4} @tab @samp{.xop} @tab @samp{.cx16}
 @item @samp{.padlock} @tab @samp{.clzero} @tab @samp{.mwaitx} @tab @samp{.rdpru}
 @item @samp{.mcommit} @tab @samp{.sev_es} @tab @samp{.snp} @tab @samp{.invlpgb}
-@item @samp{.tlbsync} @tab @samp{.apx_f}
+@item @samp{.tlbsync} @tab @samp{.apx_f} @tab @samp{.gmi}
+
 @end multitable
 
 Apart from the warning, there are only two other effects on
diff --git a/gas/testsuite/gas/i386/gmi.d b/gas/testsuite/gas/i386/gmi.d
new file mode 100644
index 00000000000..063da8e120a
--- /dev/null
+++ b/gas/testsuite/gas/i386/gmi.d
@@ -0,0 +1,12 @@ 
+#objdump: -dw
+#name: zhaoxin gmi
+
+.*: +file format .*
+
+Disassembly of section .text:
+
+0+000 <foo>:
+   0:[	 ]*f2 0f a6 c0 [	 ]*sm2
+   4:[	 ]*f3 0f a6 e8 [	 ]*sm3
+   8:[	 ]*f3 0f a7 f0 [	 ]*sm4
+#pass
diff --git a/gas/testsuite/gas/i386/gmi.s b/gas/testsuite/gas/i386/gmi.s
new file mode 100644
index 00000000000..c412de55d31
--- /dev/null
+++ b/gas/testsuite/gas/i386/gmi.s
@@ -0,0 +1,8 @@ 
+# ZHAOXIN GMI instructions
+
+	.text
+foo:
+	sm2
+	sm3
+	sm4
+	.p2align 4,0
diff --git a/gas/testsuite/gas/i386/i386.exp b/gas/testsuite/gas/i386/i386.exp
index 699e2004bef..4aaa2239069 100644
--- a/gas/testsuite/gas/i386/i386.exp
+++ b/gas/testsuite/gas/i386/i386.exp
@@ -114,6 +114,7 @@  if [gas_32_check] then {
     run_dump_test "quoted2"
     run_dump_test "unary"
     run_dump_test "padlock"
+    run_dump_test "gmi"
     run_dump_test "crx"
     run_list_test "cr-err" ""
     run_dump_test "cdr"
diff --git a/opcodes/i386-dis.c b/opcodes/i386-dis.c
index 3a4af4d61a3..67eb1d5b41e 100644
--- a/opcodes/i386-dis.c
+++ b/opcodes/i386-dis.c
@@ -83,6 +83,8 @@  static bool OP_Rounding (instr_info *, int, int);
 static bool OP_REG_VexI4 (instr_info *, int, int);
 static bool OP_VexI4 (instr_info *, int, int);
 static bool OP_0f07 (instr_info *, int, int);
+static bool OP_0f07_Fixup (instr_info *, int, int);
+static bool OP_0f07_Fixup1 (instr_info *, int, int);
 static bool OP_Monitor (instr_info *, int, int);
 static bool OP_Mwait (instr_info *, int, int);
 
@@ -890,7 +892,6 @@  enum
   REG_0F71,
   REG_0F72,
   REG_0F73,
-  REG_0FA6,
   REG_0FA7,
   REG_0FAE,
   REG_0FBA,
@@ -1050,6 +1051,10 @@  enum
   PREFIX_0F7D,
   PREFIX_0F7E,
   PREFIX_0F7F,
+  PREFIX_0FA6C0,
+  PREFIX_0FA6C8,
+  PREFIX_0FA6D0,
+  PREFIX_0FA6E8,
   PREFIX_0FAE_REG_0_MOD_3,
   PREFIX_0FAE_REG_1_MOD_3,
   PREFIX_0FAE_REG_2_MOD_3,
@@ -1315,7 +1320,8 @@  enum
 enum
 {
   THREE_BYTE_0F38 = 0,
-  THREE_BYTE_0F3A
+  THREE_BYTE_0F3A,
+  THREE_BYTE_0FA6
 };
 
 enum
@@ -2311,7 +2317,7 @@  static const struct dis386 dis386_twobyte[] = {
   { "btS",		{ Ev, Gv }, 0 },
   { "shldS",		{ Ev, Gv, Ib }, 0 },
   { "shldS",		{ Ev, Gv, CL }, 0 },
-  { REG_TABLE (REG_0FA6) },
+  { THREE_BYTE_TABLE (THREE_BYTE_0FA6) },
   { REG_TABLE (REG_0FA7) },
   /* a8 */
   { "pushP",		{ gs }, 0 },
@@ -2848,12 +2854,6 @@  static const struct dis386 reg_table[][8] = {
     { "psllq",		{ Nq, Ib }, PREFIX_OPCODE },
     { "pslldq",		{ Ux, Ib }, PREFIX_DATA },
   },
-  /* REG_0FA6 */
-  {
-    { "montmul",	{ { OP_0f07, 0 } }, 0 },
-    { "xsha1",		{ { OP_0f07, 0 } }, 0 },
-    { "xsha256",	{ { OP_0f07, 0 } }, 0 },
-  },
   /* REG_0FA7 */
   {
     { "xstore-rng",	{ { OP_0f07, 0 } }, 0 },
@@ -2862,6 +2862,7 @@  static const struct dis386 reg_table[][8] = {
     { "xcrypt-ctr",	{ { OP_0f07, 0 } }, 0 },
     { "xcrypt-cfb",	{ { OP_0f07, 0 } }, 0 },
     { "xcrypt-ofb",	{ { OP_0f07, 0 } }, 0 },
+    { "sm4",		{ { OP_0f07_Fixup1, 0 } }, 0 },
   },
   /* REG_0FAE */
   {
@@ -3438,6 +3439,35 @@  static const struct dis386 prefix_table[][4] = {
     { "movdqa",	{ EXxS, XM }, PREFIX_OPCODE },
   },
 
+  /* PREFIX_0FA6C0 */
+  {
+    { Bad_Opcode },
+    { "montmul",	{ { OP_0f07_Fixup, 0 } }, 0 },
+    { Bad_Opcode },
+    { "sm2",	{ { 0, 0 } }, 0 },
+  },
+  /* PREFIX_0FA6C8 */
+  {
+    { Bad_Opcode },
+    { "xsha1",		{ { OP_0f07_Fixup, 0 } }, 0 },
+    { Bad_Opcode },
+    { Bad_Opcode },
+  },
+  /* PREFIX_0FA6D0 */
+  {
+    { Bad_Opcode },
+    { "xsha256",	{ { OP_0f07_Fixup, 0 } }, 0 },
+    { Bad_Opcode },
+    { Bad_Opcode },
+  },
+  /* PREFIX_0FA6E8 */
+  {
+    { Bad_Opcode },
+    { "sm3",	{ { 0, 0 } }, 0 },
+    { Bad_Opcode },
+    { Bad_Opcode },
+  },
+
   /* PREFIX_0FAE_REG_0_MOD_3 */
   {
     { Bad_Opcode },
@@ -5088,6 +5118,272 @@  static const struct dis386 three_byte_table[][256] = {
     { Bad_Opcode },
     { Bad_Opcode },
   },
+  /* THREE_BYTE_0FA6 */
+  {
+    /* 00 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 08 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 10 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 18 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 20 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 28 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 30 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 38 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 40 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 48 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 50 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 58 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 60 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 68 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 70 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 78 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 80 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 88 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 90 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 98 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* a0 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* a8 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* b0 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* b8 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* c0 */
+    { PREFIX_TABLE (PREFIX_0FA6C0) },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* c8 */
+    { PREFIX_TABLE (PREFIX_0FA6C8) },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* d0 */
+    { PREFIX_TABLE (PREFIX_0FA6D0) },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* d8 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* e0 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* e8 */
+    { PREFIX_TABLE (PREFIX_0FA6E8) },
+  }
 };
 
 static const struct dis386 xop_table[][256] = {
@@ -13086,6 +13382,28 @@  OP_0f07 (instr_info *ins, int bytemode, int sizeflag)
   return OP_E (ins, bytemode, sizeflag);
 }
 
+/* repz is printed before some instructions,eg:montmul,xsha1,xsha256. */
+static bool
+OP_0f07_Fixup (instr_info *ins, int bytemode, int sizeflag)
+{
+  (void) bytemode;
+  (void) sizeflag;
+  /* The 0xf3 prefix should be displayed as "repz" for montmul, xsha1,xsha256.  */
+  if (ins->prefixes & PREFIX_REPZ)
+    ins->all_prefixes[ins->last_repz_prefix] = 0xf3;
+  return true;
+}
+
+/* repz is removed before some instruction, eg: sm4. */
+static bool
+OP_0f07_Fixup1 (instr_info *ins, int bytemode, int sizeflag)
+{
+  /* Remove REPZ prefix. */
+  if (ins->prefixes & PREFIX_REPZ)
+    ins->all_prefixes[ins->last_repz_prefix] = 0;
+  return OP_E (ins, bytemode, sizeflag);
+}
+
 /* NOP is an alias of "xchg %ax,%ax" in 16bit mode, "xchg %eax,%eax" in
    32bit mode and "xchg %rax,%rax" in 64bit mode.  */
 
diff --git a/opcodes/i386-gen.c b/opcodes/i386-gen.c
index 565aae722f8..0f7ab4c6a56 100644
--- a/opcodes/i386-gen.c
+++ b/opcodes/i386-gen.c
@@ -334,6 +334,7 @@  static bitfield cpu_flags[] =
   BITFIELD (3dnow),
   BITFIELD (3dnowA),
   BITFIELD (PadLock),
+  BITFIELD (GMI),
   BITFIELD (SVME),
   BITFIELD (VMX),
   BITFIELD (SMX),
diff --git a/opcodes/i386-opc.h b/opcodes/i386-opc.h
index c0d5e44d461..d404fbc6b05 100644
--- a/opcodes/i386-opc.h
+++ b/opcodes/i386-opc.h
@@ -66,6 +66,8 @@  enum i386_cpu
   CpuSSE3,
   /* VIA PadLock required */
   CpuPadLock,
+  /* ZHAOXIN GMI required */
+  CpuGMI,
   /* AMD Secure Virtual Machine Ext-s required */
   CpuSVME,
   /* VMX Instructions required */
@@ -400,6 +402,7 @@  typedef union i386_cpu_flags
       unsigned int cpusse2:1;
       unsigned int cpusse3:1;
       unsigned int cpupadlock:1;
+      unsigned int cpugmi:1;
       unsigned int cpusvme:1;
       unsigned int cpuvmx:1;
       unsigned int cpusmx:1;
diff --git a/opcodes/i386-opc.tbl b/opcodes/i386-opc.tbl
index 9a14a4d1819..7ded4e64133 100644
--- a/opcodes/i386-opc.tbl
+++ b/opcodes/i386-opc.tbl
@@ -2131,6 +2131,11 @@  xcryptofb, 0xf30fa7e8, PadLock, NoSuf|RepPrefixOk, {}
 // Alias for xstore-rng.
 xstore, 0xfa7c0, PadLock, NoSuf|RepPrefixOk, {}
 
+// ZHAOXIN GMI instructions
+sm2, 0xf20fa6c0, GMI, NoSuf, {}
+sm3, 0xf30fa6e8, GMI, NoSuf, {}
+sm4, 0xf30fa7f0, GMI, NoSuf, {}
+
 // Multy-precision Add Carry, rdseed instructions.
 <adx:pfx, c:66, o:f3>
 ad<adx>x, 0x<adx:pfx>66, ADX&APX_F, C|Modrm|CheckOperandSize|No_bSuf|No_wSuf|No_sSuf|DstVVVV|EVexMap4, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
-- 
2.27.0