From patchwork Fri Jan 3 02:50:29 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Jiang, Haochen"
X-Patchwork-Id: 103927
From: Haochen Jiang
To: binutils@sourceware.org
Cc: hjl.tools@gmail.com, jbeulich@suse.com, ludloff@gmail.com
Subject: [PATCH v2] Support Intel AMX-AVX512
Date: Fri, 3 Jan 2025 10:50:29 +0800
Message-Id: <20250103025029.1909253-1-haochen.jiang@intel.com>
X-Mailer: git-send-email 2.31.1
List-Id: Binutils mailing list

Hi all,

Although there is still an open issue in the AMX-AVX512 encoding, I would
like to send out the v2 patch with the current encodings first, since
other parts of this patch also need review. The patch description and
changes follow below.

The encoding issue I mentioned previously concerns
tcvtrowps2[bf16,ph][h,l]. The Reg32 forms are fine. The Imm8 forms,
however, are split across two opcodes under the current HW design, which
is not ideal. Because of the Christmas/New Year holidays, the answer on
whether this can be changed is still pending. I will send an update as
soon as I have it.

Thx,
Haochen

---

Changes in v2:

- Pull all GPR modes out of the VEX length switch in OP_VEX to make it
  more general.
- Remove invalid test for 32-bit.
- Reuse VexGdq for operands.
- Update the mnemonics from tcvtrowps2pbf16[h,l] to tcvtrowps2bf16[h,l]
  according to ISE056.

---

This patch adds support for AMX-AVX512.

In the disassembler, we pull all GPR modes out of the VEX length switch
to make it more general.

---

gas/ChangeLog:

	* config/tc-i386.c: Add amx_avx512.
	* doc/c-i386.texi: Document .amx_avx512.
	* testsuite/gas/i386/x86-64.exp: Run AMX-AVX512 tests.
	* testsuite/gas/i386/x86-64-amx-avx512-intel.d: New test.
	* testsuite/gas/i386/x86-64-amx-avx512.d: Ditto.
	* testsuite/gas/i386/x86-64-amx-avx512.s: Ditto.

opcodes/ChangeLog:

	* i386-dis-evex-len.h: Add EVEX_LEN_0F384A_X86_64_W_0,
	EVEX_LEN_0F386D_X86_64_W_0, EVEX_LEN_0F3A07_X86_64_W_0,
	EVEX_LEN_0F3A77_X86_64_W_0.
	* i386-dis-evex-prefix.h: Add PREFIX_EVEX_0F384A_W_0_L_2,
	PREFIX_EVEX_0F386D_W_0_L_2, PREFIX_EVEX_0F3A07_W_0_L_2,
	PREFIX_EVEX_0F3A77_W_0_L_2.
	* i386-dis-evex-w.h: Add EVEX_W_0F384A_X86_64, EVEX_W_0F386D_X86_64,
	EVEX_W_0F3A07_X86_64, EVEX_W_0F3A77_X86_64.
	* i386-dis-evex-x86-64.h: Add X86_64_EVEX_0F384A, X86_64_EVEX_0F386D,
	X86_64_EVEX_0F3A07, X86_64_EVEX_0F3A77.
	* i386-dis-evex.h: Ditto.
	* i386-dis.c (EVEX_LEN_0F384A_X86_64_W_0): New.
	(EVEX_LEN_0F386D_X86_64_W_0): Ditto.
	(EVEX_LEN_0F3A07_X86_64_W_0): Ditto.
	(EVEX_LEN_0F3A77_X86_64_W_0): Ditto.
	(MOD_EVEX_0F384A_X86_64_W_0): Ditto.
	(MOD_EVEX_0F386D_X86_64_W_0): Ditto.
	(MOD_EVEX_0F3A07_X86_64_W_0): Ditto.
	(MOD_EVEX_0F3A77_X86_64_W_0): Ditto.
	(PREFIX_EVEX_0F384A_W_0_L_2): Ditto.
	(PREFIX_EVEX_0F386D_W_0_L_2): Ditto.
	(PREFIX_EVEX_0F3A07_W_0_L_2): Ditto.
	(PREFIX_EVEX_0F3A77_W_0_L_2): Ditto.
	(EVEX_W_0F384A_X86_64): Ditto.
	(EVEX_W_0F386D_X86_64): Ditto.
	(EVEX_W_0F3A07_X86_64): Ditto.
	(EVEX_W_0F3A77_X86_64): Ditto.
	(X86_64_EVEX_0F384A): Ditto.
	(X86_64_EVEX_0F386D): Ditto.
	(X86_64_EVEX_0F3A07): Ditto.
	(X86_64_EVEX_0F3A77): Ditto.
	(OP_VEX): Pull out all GPR mode out of the vector length switch.
	* i386-gen.c (isa_dependencies): Add AMX-AVX512.
	(cpu_flags): Ditto.
	* i386-init.h: Regenerated.
	* i386-mnem.h: Ditto.
	* i386-opc.h (CpuAMX_AVX512): New.
	(i386_cpu_flags): Add cpuamx_avx512.
	* i386-opc.tbl: Add AMX-AVX512 instructions.
	* i386-tbl.h: Regenerated.
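
Note for reviewers (illustration only, not part of the patch): the Imm8
split described above is easier to see once the test bytes are broken
into EVEX fields. The following is a minimal sketch, assuming the
standard 4-byte EVEX layout (62 P0 P1 P2); the names evex_fields,
evex_decode and amx_avx512_mnemonic are hypothetical and do not exist in
binutils. The mnemonic rows are copied from the prefix tables added in
this patch, which are indexed none/F3/66/F2, while EVEX.pp is encoded as
none/66/F3/F2. The sketch only checks W=0 and L'L=2; it does not model
the x86-64-only restriction or any other validity checks the real tables
perform.

```c
#include <stddef.h>
#include <stdint.h>

/* Fields of "62 P0 P1 P2 opcode ModRM".  Hypothetical helper type.  */
struct evex_fields
{
  unsigned map;   /* opcode map: 2 = 0F38, 3 = 0F3A */
  unsigned pp;    /* EVEX.pp: 0 = none, 1 = 66, 2 = F3, 3 = F2 */
  unsigned w;     /* EVEX.W */
  unsigned ll;    /* EVEX.L'L: 2 = 512-bit */
  unsigned vvvv;  /* register number from EVEX.V':vvvv */
  unsigned reg;   /* ModRM.reg extended by EVEX.R and EVEX.R' */
  unsigned rm;    /* ModRM.rm, low three bits only */
};

/* Decode the first six bytes of an instruction.  Returns 0 on success,
   -1 if the first byte is not the EVEX escape byte.  */
static int
evex_decode (const uint8_t *insn, struct evex_fields *f)
{
  uint8_t p0, p1, p2, modrm;

  if (insn[0] != 0x62)
    return -1;
  p0 = insn[1];
  p1 = insn[2];
  p2 = insn[3];
  modrm = insn[5];

  f->map = p0 & 0x7;
  f->pp = p1 & 0x3;
  f->w = (p1 >> 7) & 1;
  f->ll = (p2 >> 5) & 0x3;
  /* vvvv (P1 bits 6:3) and V' (P2 bit 3) are stored inverted.  */
  f->vvvv = (~(p1 >> 3) & 0xf) | ((~(p2 >> 3) & 1) << 4);
  /* R (P0 bit 7) and R' (P0 bit 4) are stored inverted.  */
  f->reg = ((modrm >> 3) & 0x7)
	   | ((~(p0 >> 7) & 1) << 3)
	   | ((~(p0 >> 4) & 1) << 4);
  f->rm = modrm & 0x7;
  return 0;
}

/* Look up the mnemonic the way the new prefix tables do.  Returns NULL
   for anything that is not a W=0, 512-bit AMX-AVX512 encoding.  */
static const char *
amx_avx512_mnemonic (const struct evex_fields *f, uint8_t opcode)
{
  /* Map EVEX.pp (none, 66, F3, F2) to table row (none, F3, 66, F2).  */
  static const unsigned pp_to_row[4] = { 0, 2, 1, 3 };
  static const char *const op_0f38_4a[4]
    = { NULL, "tcvtrowd2ps", "tilemovrow", NULL };
  static const char *const op_0f38_6d[4]
    = { "tcvtrowps2phh", "tcvtrowps2bf16l",
	"tcvtrowps2phl", "tcvtrowps2bf16h" };
  static const char *const op_0f3a_07[4]
    = { "tcvtrowps2phh", "tcvtrowd2ps",
	"tilemovrow", "tcvtrowps2bf16h" };
  static const char *const op_0f3a_77[4]
    = { NULL, "tcvtrowps2bf16l", NULL, "tcvtrowps2phl" };
  const char *const *row = NULL;

  if (f->w != 0 || f->ll != 2)
    return NULL;
  if (f->map == 2 && opcode == 0x4a)
    row = op_0f38_4a;
  else if (f->map == 2 && opcode == 0x6d)
    row = op_0f38_6d;
  else if (f->map == 3 && opcode == 0x07)
    row = op_0f3a_07;
  else if (f->map == 3 && opcode == 0x77)
    row = op_0f3a_77;
  return row ? row[pp_to_row[f->pp & 3]] : NULL;
}
```

For the first test line, 62 62 6e 48 4a f5, this gives map 2, pp 2 (F3),
L'L 2, vvvv 2 (%edx), reg 30 (%zmm30) and rm 5 (%tmm5), which matches
tcvtrowd2ps %edx,%tmm5,%zmm30. The last two tables show the split: the
Reg32 forms of tcvtrowps2[bf16,ph][h,l] all live under 0F38 6D, while
the Imm8 forms are spread over 0F3A 07 and 0F3A 77.
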
--- gas/config/tc-i386.c | 1 + gas/doc/c-i386.texi | 4 +- .../gas/i386/x86-64-amx-avx512-intel.d | 35 + gas/testsuite/gas/i386/x86-64-amx-avx512.d | 34 + gas/testsuite/gas/i386/x86-64-amx-avx512.s | 55 + gas/testsuite/gas/i386/x86-64.exp | 2 + opcodes/i386-dis-evex-len.h | 23 + opcodes/i386-dis-evex-prefix.h | 27 + opcodes/i386-dis-evex-w.h | 12 + opcodes/i386-dis-evex-x86-64.h | 15 + opcodes/i386-dis-evex.h | 6 +- opcodes/i386-dis.c | 52 +- opcodes/i386-gen.c | 3 + opcodes/i386-init.h | 718 ++--- opcodes/i386-mnem.h | 2576 +++++++++-------- opcodes/i386-opc.h | 3 + opcodes/i386-opc.tbl | 15 + opcodes/i386-tbl.h | 415 ++- 18 files changed, 2210 insertions(+), 1786 deletions(-) create mode 100644 gas/testsuite/gas/i386/x86-64-amx-avx512-intel.d create mode 100644 gas/testsuite/gas/i386/x86-64-amx-avx512.d create mode 100644 gas/testsuite/gas/i386/x86-64-amx-avx512.s diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c index fb347ced16d..2d4fe94e4ec 100644 --- a/gas/config/tc-i386.c +++ b/gas/config/tc-i386.c @@ -1186,6 +1186,7 @@ static const arch_entry cpu_arch[] = SUBARCH (amx_tf32, AMX_TF32, ANY_AMX_TF32, false), SUBARCH (amx_fp8, AMX_FP8, ANY_AMX_FP8, false), SUBARCH (amx_movrs, AMX_MOVRS, ANY_AMX_MOVRS, false), + SUBARCH (amx_avx512, AMX_AVX512, ANY_AMX_AVX512, false), SUBARCH (amx_tile, AMX_TILE, ANY_AMX_TILE, false), SUBARCH (movdiri, MOVDIRI, MOVDIRI, false), SUBARCH (movdir64b, MOVDIR64B, MOVDIR64B, false), diff --git a/gas/doc/c-i386.texi b/gas/doc/c-i386.texi index b4b17b96ab2..bfd15b1ac10 100644 --- a/gas/doc/c-i386.texi +++ b/gas/doc/c-i386.texi @@ -233,6 +233,7 @@ accept various extension mnemonics. For example, @code{amx_tf32}, @code{amx_fp8} @code{amx_movrs}, +@code{amx_avx512}, @code{amx_tile}, @code{vmx}, @code{vmfunc}, @@ -1707,7 +1708,8 @@ supported on the CPU specified. 
The choices for @var{cpu_type} are: @item @samp{.movdiri} @tab @samp{.movdir64b} @tab @samp{.enqcmd} @tab @samp{.tsxldtrk} @item @samp{.amx_int8} @tab @samp{.amx_bf16} @tab @samp{.amx_fp16} @item @samp{.amx_complex} @tab @samp{.amx_transpose} @tab @samp{.amx_tf32} -@item @samp{.amx_fp8} @tab @samp{.amx_movrs} @tab @samp{.amx_tile} +@item @samp{.amx_fp8} @tab @samp{.amx_movrs} @tab @samp{.amx_avx512} +@item @samp{.amx_tile} @item @samp{.kl} @tab @samp{.widekl} @tab @samp{.uintr} @tab @samp{.hreset} @item @samp{.3dnow} @tab @samp{.3dnowa} @tab @samp{.sse4a} @tab @samp{.sse5} @item @samp{.syscall} @tab @samp{.rdtscp} @tab @samp{.svme} diff --git a/gas/testsuite/gas/i386/x86-64-amx-avx512-intel.d b/gas/testsuite/gas/i386/x86-64-amx-avx512-intel.d new file mode 100644 index 00000000000..33e6d01a558 --- /dev/null +++ b/gas/testsuite/gas/i386/x86-64-amx-avx512-intel.d @@ -0,0 +1,35 @@ +#objdump: -dw -Mintel +#name: x86_64 AMX-AVX512 insns (Intel disassembly) +#source: x86-64-amx-avx512.s + +.*: +file format .* + +Disassembly of section \.text: + +#... 
+[a-f0-9]+ <_intel>: +\s*[a-f0-9]+:\s*62 62 6e 48 4a f5\s+tcvtrowd2ps zmm30,tmm5,edx +\s*[a-f0-9]+:\s*62 62 6e 48 4a f2\s+tcvtrowd2ps zmm30,tmm2,edx +\s*[a-f0-9]+:\s*62 63 7e 48 07 f5 7b\s+tcvtrowd2ps zmm30,tmm5,0x7b +\s*[a-f0-9]+:\s*62 63 7e 48 07 f2 7b\s+tcvtrowd2ps zmm30,tmm2,0x7b +\s*[a-f0-9]+:\s*62 62 6f 48 6d f5\s+tcvtrowps2bf16h zmm30,tmm5,edx +\s*[a-f0-9]+:\s*62 62 6f 48 6d f2\s+tcvtrowps2bf16h zmm30,tmm2,edx +\s*[a-f0-9]+:\s*62 63 7f 48 07 f5 7b\s+tcvtrowps2bf16h zmm30,tmm5,0x7b +\s*[a-f0-9]+:\s*62 63 7f 48 07 f2 7b\s+tcvtrowps2bf16h zmm30,tmm2,0x7b +\s*[a-f0-9]+:\s*62 62 6e 48 6d f5\s+tcvtrowps2bf16l zmm30,tmm5,edx +\s*[a-f0-9]+:\s*62 62 6e 48 6d f2\s+tcvtrowps2bf16l zmm30,tmm2,edx +\s*[a-f0-9]+:\s*62 63 7e 48 77 f5 7b\s+tcvtrowps2bf16l zmm30,tmm5,0x7b +\s*[a-f0-9]+:\s*62 63 7e 48 77 f2 7b\s+tcvtrowps2bf16l zmm30,tmm2,0x7b +\s*[a-f0-9]+:\s*62 62 6c 48 6d f5\s+tcvtrowps2phh zmm30,tmm5,edx +\s*[a-f0-9]+:\s*62 62 6c 48 6d f2\s+tcvtrowps2phh zmm30,tmm2,edx +\s*[a-f0-9]+:\s*62 63 7c 48 07 f5 7b\s+tcvtrowps2phh zmm30,tmm5,0x7b +\s*[a-f0-9]+:\s*62 63 7c 48 07 f2 7b\s+tcvtrowps2phh zmm30,tmm2,0x7b +\s*[a-f0-9]+:\s*62 62 6d 48 6d f5\s+tcvtrowps2phl zmm30,tmm5,edx +\s*[a-f0-9]+:\s*62 62 6d 48 6d f2\s+tcvtrowps2phl zmm30,tmm2,edx +\s*[a-f0-9]+:\s*62 63 7f 48 77 f5 7b\s+tcvtrowps2phl zmm30,tmm5,0x7b +\s*[a-f0-9]+:\s*62 63 7f 48 77 f2 7b\s+tcvtrowps2phl zmm30,tmm2,0x7b +\s*[a-f0-9]+:\s*62 62 6d 48 4a f5\s+tilemovrow zmm30,tmm5,edx +\s*[a-f0-9]+:\s*62 62 6d 48 4a f2\s+tilemovrow zmm30,tmm2,edx +\s*[a-f0-9]+:\s*62 63 7d 48 07 f5 7b\s+tilemovrow zmm30,tmm5,0x7b +\s*[a-f0-9]+:\s*62 63 7d 48 07 f2 7b\s+tilemovrow zmm30,tmm2,0x7b +#pass diff --git a/gas/testsuite/gas/i386/x86-64-amx-avx512.d b/gas/testsuite/gas/i386/x86-64-amx-avx512.d new file mode 100644 index 00000000000..d2f8ac6e51e --- /dev/null +++ b/gas/testsuite/gas/i386/x86-64-amx-avx512.d @@ -0,0 +1,34 @@ +#objdump: -dw +#name: x86_64 AMX-AVX512 insns +#source: x86-64-amx-avx512.s + +.*: +file format .* + 
+Disassembly of section \.text: + +0+ <_start>: +\s*[a-f0-9]+:\s*62 62 6e 48 4a f5\s+tcvtrowd2ps %edx,%tmm5,%zmm30 +\s*[a-f0-9]+:\s*62 62 6e 48 4a f2\s+tcvtrowd2ps %edx,%tmm2,%zmm30 +\s*[a-f0-9]+:\s*62 63 7e 48 07 f5 7b\s+tcvtrowd2ps \$0x7b,%tmm5,%zmm30 +\s*[a-f0-9]+:\s*62 63 7e 48 07 f2 7b\s+tcvtrowd2ps \$0x7b,%tmm2,%zmm30 +\s*[a-f0-9]+:\s*62 62 6f 48 6d f5\s+tcvtrowps2bf16h %edx,%tmm5,%zmm30 +\s*[a-f0-9]+:\s*62 62 6f 48 6d f2\s+tcvtrowps2bf16h %edx,%tmm2,%zmm30 +\s*[a-f0-9]+:\s*62 63 7f 48 07 f5 7b\s+tcvtrowps2bf16h \$0x7b,%tmm5,%zmm30 +\s*[a-f0-9]+:\s*62 63 7f 48 07 f2 7b\s+tcvtrowps2bf16h \$0x7b,%tmm2,%zmm30 +\s*[a-f0-9]+:\s*62 62 6e 48 6d f5\s+tcvtrowps2bf16l %edx,%tmm5,%zmm30 +\s*[a-f0-9]+:\s*62 62 6e 48 6d f2\s+tcvtrowps2bf16l %edx,%tmm2,%zmm30 +\s*[a-f0-9]+:\s*62 63 7e 48 77 f5 7b\s+tcvtrowps2bf16l \$0x7b,%tmm5,%zmm30 +\s*[a-f0-9]+:\s*62 63 7e 48 77 f2 7b\s+tcvtrowps2bf16l \$0x7b,%tmm2,%zmm30 +\s*[a-f0-9]+:\s*62 62 6c 48 6d f5\s+tcvtrowps2phh %edx,%tmm5,%zmm30 +\s*[a-f0-9]+:\s*62 62 6c 48 6d f2\s+tcvtrowps2phh %edx,%tmm2,%zmm30 +\s*[a-f0-9]+:\s*62 63 7c 48 07 f5 7b\s+tcvtrowps2phh \$0x7b,%tmm5,%zmm30 +\s*[a-f0-9]+:\s*62 63 7c 48 07 f2 7b\s+tcvtrowps2phh \$0x7b,%tmm2,%zmm30 +\s*[a-f0-9]+:\s*62 62 6d 48 6d f5\s+tcvtrowps2phl %edx,%tmm5,%zmm30 +\s*[a-f0-9]+:\s*62 62 6d 48 6d f2\s+tcvtrowps2phl %edx,%tmm2,%zmm30 +\s*[a-f0-9]+:\s*62 63 7f 48 77 f5 7b\s+tcvtrowps2phl \$0x7b,%tmm5,%zmm30 +\s*[a-f0-9]+:\s*62 63 7f 48 77 f2 7b\s+tcvtrowps2phl \$0x7b,%tmm2,%zmm30 +\s*[a-f0-9]+:\s*62 62 6d 48 4a f5\s+tilemovrow %edx,%tmm5,%zmm30 +\s*[a-f0-9]+:\s*62 62 6d 48 4a f2\s+tilemovrow %edx,%tmm2,%zmm30 +\s*[a-f0-9]+:\s*62 63 7d 48 07 f5 7b\s+tilemovrow \$0x7b,%tmm5,%zmm30 +\s*[a-f0-9]+:\s*62 63 7d 48 07 f2 7b\s+tilemovrow \$0x7b,%tmm2,%zmm30 +#pass diff --git a/gas/testsuite/gas/i386/x86-64-amx-avx512.s b/gas/testsuite/gas/i386/x86-64-amx-avx512.s new file mode 100644 index 00000000000..6df493430a0 --- /dev/null +++ b/gas/testsuite/gas/i386/x86-64-amx-avx512.s @@ -0,0 +1,55 @@ 
+# Check 64bit AMX-AVX512 instructions + + .text +_start: + tcvtrowd2ps %edx, %tmm5, %zmm30 + tcvtrowd2ps %edx, %tmm2, %zmm30 + tcvtrowd2ps $123, %tmm5, %zmm30 + tcvtrowd2ps $123, %tmm2, %zmm30 + tcvtrowps2bf16h %edx, %tmm5, %zmm30 + tcvtrowps2bf16h %edx, %tmm2, %zmm30 + tcvtrowps2bf16h $123, %tmm5, %zmm30 + tcvtrowps2bf16h $123, %tmm2, %zmm30 + tcvtrowps2bf16l %edx, %tmm5, %zmm30 + tcvtrowps2bf16l %edx, %tmm2, %zmm30 + tcvtrowps2bf16l $123, %tmm5, %zmm30 + tcvtrowps2bf16l $123, %tmm2, %zmm30 + tcvtrowps2phh %edx, %tmm5, %zmm30 + tcvtrowps2phh %edx, %tmm2, %zmm30 + tcvtrowps2phh $123, %tmm5, %zmm30 + tcvtrowps2phh $123, %tmm2, %zmm30 + tcvtrowps2phl %edx, %tmm5, %zmm30 + tcvtrowps2phl %edx, %tmm2, %zmm30 + tcvtrowps2phl $123, %tmm5, %zmm30 + tcvtrowps2phl $123, %tmm2, %zmm30 + tilemovrow %edx, %tmm5, %zmm30 + tilemovrow %edx, %tmm2, %zmm30 + tilemovrow $123, %tmm5, %zmm30 + tilemovrow $123, %tmm2, %zmm30 + +_intel: + .intel_syntax noprefix + tcvtrowd2ps zmm30, tmm5, edx + tcvtrowd2ps zmm30, tmm2, edx + tcvtrowd2ps zmm30, tmm5, 123 + tcvtrowd2ps zmm30, tmm2, 123 + tcvtrowps2bf16h zmm30, tmm5, edx + tcvtrowps2bf16h zmm30, tmm2, edx + tcvtrowps2bf16h zmm30, tmm5, 123 + tcvtrowps2bf16h zmm30, tmm2, 123 + tcvtrowps2bf16l zmm30, tmm5, edx + tcvtrowps2bf16l zmm30, tmm2, edx + tcvtrowps2bf16l zmm30, tmm5, 123 + tcvtrowps2bf16l zmm30, tmm2, 123 + tcvtrowps2phh zmm30, tmm5, edx + tcvtrowps2phh zmm30, tmm2, edx + tcvtrowps2phh zmm30, tmm5, 123 + tcvtrowps2phh zmm30, tmm2, 123 + tcvtrowps2phl zmm30, tmm5, edx + tcvtrowps2phl zmm30, tmm2, edx + tcvtrowps2phl zmm30, tmm5, 123 + tcvtrowps2phl zmm30, tmm2, 123 + tilemovrow zmm30, tmm5, edx + tilemovrow zmm30, tmm2, edx + tilemovrow zmm30, tmm5, 123 + tilemovrow zmm30, tmm2, 123 diff --git a/gas/testsuite/gas/i386/x86-64.exp b/gas/testsuite/gas/i386/x86-64.exp index 44a8d7c8260..edacbaa0f20 100644 --- a/gas/testsuite/gas/i386/x86-64.exp +++ b/gas/testsuite/gas/i386/x86-64.exp @@ -538,6 +538,8 @@ run_dump_test "x86-64-amx-fp8-bad" 
run_dump_test "x86-64-amx-movrs" run_dump_test "x86-64-amx-movrs-intel" run_list_test "x86-64-amx-movrs-inval" +run_dump_test "x86-64-amx-avx512" +run_dump_test "x86-64-amx-avx512-intel" run_dump_test "x86-64-movrs" run_dump_test "x86-64-movrs-intel" run_dump_test "x86-64-movrs-avx10_2-512" diff --git a/opcodes/i386-dis-evex-len.h b/opcodes/i386-dis-evex-len.h index 2b4361f7ae6..434e051bb63 100644 --- a/opcodes/i386-dis-evex-len.h +++ b/opcodes/i386-dis-evex-len.h @@ -47,6 +47,8 @@ static const struct dis386 evex_len_table[][3] = { /* EVEX_LEN_0F384A_X86_64_W_0 */ { { X86_64_EVEX_PFX_TABLE (PREFIX_VEX_0F384A_X86_64_W_0_L_0) }, + { Bad_Opcode }, + { PREFIX_TABLE (PREFIX_EVEX_0F384A_X86_64_W_0_L_2) }, }, /* EVEX_LEN_0F385A */ @@ -63,6 +65,13 @@ static const struct dis386 evex_len_table[][3] = { { VEX_W_TABLE (EVEX_W_0F385B_L_2) }, }, + /* EVEX_LEN_0F386D_X86_64_W_0_M_1 */ + { + { Bad_Opcode }, + { Bad_Opcode }, + { PREFIX_TABLE (PREFIX_EVEX_0F386D_X86_64_W_0_L_2) }, + }, + /* EVEX_LEN_0F38C6 */ { { Bad_Opcode }, @@ -91,6 +100,13 @@ static const struct dis386 evex_len_table[][3] = { { VEX_W_TABLE (VEX_W_0F3A01_L_1) }, }, + /* EVEX_LEN_0F3A07_X86_64_W_0 */ + { + { Bad_Opcode }, + { Bad_Opcode }, + { PREFIX_TABLE (PREFIX_EVEX_0F3A07_X86_64_W_0_L_2) }, + }, + /* EVEX_LEN_0F3A18 */ { { Bad_Opcode }, @@ -161,6 +177,13 @@ static const struct dis386 evex_len_table[][3] = { { VEX_W_TABLE (EVEX_W_0F3A43_L_n) }, }, + /* EVEX_LEN_0F3A77_X86_64_W_0 */ + { + { Bad_Opcode }, + { Bad_Opcode }, + { PREFIX_TABLE (PREFIX_EVEX_0F3A77_X86_64_W_0_L_2) }, + }, + /* EVEX_LEN_MAP5_6E */ { { PREFIX_TABLE (PREFIX_EVEX_MAP5_6E_L_0) }, diff --git a/opcodes/i386-dis-evex-prefix.h b/opcodes/i386-dis-evex-prefix.h index f4c65b3c06d..4683a347b50 100644 --- a/opcodes/i386-dis-evex-prefix.h +++ b/opcodes/i386-dis-evex-prefix.h @@ -243,6 +243,12 @@ { VEX_W_TABLE (EVEX_W_0F383A_P_1) }, { "%XEvpminuw", { XM, Vex, EXx }, 0 }, }, + /* PREFIX_EVEX_0F384A_W_0_L_2 */ + { + { Bad_Opcode }, + { "tcvtrowd2ps", { 
XM, Rtmm, VexGdq }, 0 }, + { "tilemovrow", { XM, Rtmm, VexGdq }, 0 }, + }, /* PREFIX_EVEX_0F3852 */ { { "vdpphp%XS", { XM, Vex, EXx }, 0 }, @@ -264,6 +270,13 @@ { Bad_Opcode }, { "vp2intersectY%DQ", { MaskG, Vex, EXx, EXxEVexS }, 0 }, }, + /* PREFIX_EVEX_0F386D_W_0_L_2 */ + { + { "tcvtrowps2phh", { XM, Rtmm, VexGdq }, 0 }, + { "tcvtrowps2bf16l", { XM, Rtmm, VexGdq }, 0 }, + { "tcvtrowps2phl", { XM, Rtmm, VexGdq }, 0 }, + { "tcvtrowps2bf16h", { XM, Rtmm, VexGdq }, 0 }, + }, /* PREFIX_EVEX_0F3872 */ { { Bad_Opcode }, @@ -306,6 +319,13 @@ { "%XEvfmsub213s%XW", { XMScalar, VexScalar, EXdq, EXxEVexR }, 0 }, { "v4fnmadds%XS", { XMScalar, VexScalar, Mxmm }, 0 }, }, + /* PREFIX_EVEX_0F3A07_W_0_L_2 */ + { + { "tcvtrowps2phh", { XM, Rtmm, Ib }, 0 }, + { "tcvtrowd2ps", { XM, Rtmm, Ib }, 0 }, + { "tilemovrow", { XM, Rtmm, Ib }, 0 }, + { "tcvtrowps2bf16h", { XM, Rtmm, Ib }, 0 }, + }, /* PREFIX_EVEX_0F3A08 */ { { "vrndscalep%XH", { XM, EXxh, EXxEVexS, Ib }, 0 }, @@ -377,6 +397,13 @@ { Bad_Opcode }, { "vfpclasss%XW", { MaskG, EXdq, Ib }, 0 }, }, + /* PREFIX_EVEX_0F3A77_W_0_L_2 */ + { + { Bad_Opcode }, + { "tcvtrowps2bf16l", { XM, Rtmm, Ib }, 0 }, + { Bad_Opcode }, + { "tcvtrowps2phl", { XM, Rtmm, Ib }, 0 }, + }, /* PREFIX_EVEX_0F3AC2 */ { { "vcmpp%XH", { MaskG, Vex, EXxh, EXxEVexS, CMP }, 0 }, diff --git a/opcodes/i386-dis-evex-w.h b/opcodes/i386-dis-evex-w.h index 4ca3664e1ad..74628d4b138 100644 --- a/opcodes/i386-dis-evex-w.h +++ b/opcodes/i386-dis-evex-w.h @@ -365,6 +365,10 @@ { "vbroadcasti32x8", { XM, Mymm }, PREFIX_DATA }, { "vbroadcasti64x4", { XM, Mymm }, PREFIX_DATA }, }, + /* EVEX_W_0F386D_X86_64 */ + { + { EVEX_LEN_TABLE (EVEX_LEN_0F386D_X86_64_W_0) }, + }, /* EVEX_W_0F3870 */ { { Bad_Opcode }, @@ -388,6 +392,10 @@ { Bad_Opcode }, { "vpmultishiftqb", { XM, Vex, EXx }, PREFIX_DATA }, }, + /* EVEX_W_0F3A07_X86_64 */ + { + { EVEX_LEN_TABLE (EVEX_LEN_0F3A07_X86_64_W_0) }, + }, /* EVEX_W_0F3A18_L_n */ { { "vinsertf32x4", { XM, Vex, EXxmm, Ib }, PREFIX_DATA }, @@ -456,6 
+464,10 @@ { Bad_Opcode }, { "vpshrdw", { XM, Vex, EXx, Ib }, 0 }, }, + /* EVEX_W_0F3A77_X86_64 */ + { + { EVEX_LEN_TABLE (EVEX_LEN_0F3A77_X86_64_W_0) }, + }, /* EVEX_W_MAP4_8A */ { { MOD_TABLE (MOD_EVEX_MAP4_8A_W_0) }, diff --git a/opcodes/i386-dis-evex-x86-64.h b/opcodes/i386-dis-evex-x86-64.h index 056a479536f..f7b61bce12d 100644 --- a/opcodes/i386-dis-evex-x86-64.h +++ b/opcodes/i386-dis-evex-x86-64.h @@ -3,6 +3,21 @@ { Bad_Opcode }, { VEX_W_TABLE (EVEX_W_0F384A_X86_64) }, }, + /* X86_64_EVEX_0F386D */ + { + { Bad_Opcode }, + { VEX_W_TABLE (EVEX_W_0F386D_X86_64) }, + }, + /* X86_64_EVEX_0F3A07 */ + { + { Bad_Opcode }, + { VEX_W_TABLE (EVEX_W_0F3A07_X86_64) }, + }, + /* X86_64_EVEX_0F3A77 */ + { + { Bad_Opcode }, + { VEX_W_TABLE (EVEX_W_0F3A77_X86_64) }, + }, /* X86_64_EVEX_MAP5_6F_M_0 */ { { Bad_Opcode }, diff --git a/opcodes/i386-dis-evex.h b/opcodes/i386-dis-evex.h index 86cef4bd07b..bf15e41163d 100644 --- a/opcodes/i386-dis-evex.h +++ b/opcodes/i386-dis-evex.h @@ -415,7 +415,7 @@ static const struct dis386 evex_table[][256] = { { Bad_Opcode }, { Bad_Opcode }, { Bad_Opcode }, - { Bad_Opcode }, + { X86_64_TABLE (X86_64_EVEX_0F386D) }, { X86_64_EVEX_FROM_VEX_TABLE (X86_64_VEX_0F386E) }, { X86_64_EVEX_FROM_VEX_TABLE (X86_64_VEX_0F386F) }, /* 70 */ @@ -591,7 +591,7 @@ static const struct dis386 evex_table[][256] = { { VEX_W_TABLE (VEX_W_0F3A04) }, { "%XEvpermilp%XD", { XM, EXx, Ib }, PREFIX_DATA }, { Bad_Opcode }, - { Bad_Opcode }, + { X86_64_TABLE (X86_64_EVEX_0F3A07) }, /* 08 */ { PREFIX_TABLE (PREFIX_EVEX_0F3A08) }, { "vrndscalep%XD", { XM, EXx, EXxEVexS, Ib }, PREFIX_DATA }, @@ -717,7 +717,7 @@ static const struct dis386 evex_table[][256] = { { Bad_Opcode }, { Bad_Opcode }, { Bad_Opcode }, - { Bad_Opcode }, + { X86_64_TABLE (X86_64_EVEX_0F3A77) }, /* 78 */ { Bad_Opcode }, { Bad_Opcode }, diff --git a/opcodes/i386-dis.c b/opcodes/i386-dis.c index 6f77f38460a..c22d812bd19 100644 --- a/opcodes/i386-dis.c +++ b/opcodes/i386-dis.c @@ -1214,9 +1214,11 @@ enum 
PREFIX_EVEX_0F3838, PREFIX_EVEX_0F3839, PREFIX_EVEX_0F383A, + PREFIX_EVEX_0F384A_X86_64_W_0_L_2, PREFIX_EVEX_0F3852, PREFIX_EVEX_0F3853, PREFIX_EVEX_0F3868, + PREFIX_EVEX_0F386D_X86_64_W_0_L_2, PREFIX_EVEX_0F3872, PREFIX_EVEX_0F3874, PREFIX_EVEX_0F389A, @@ -1224,6 +1226,7 @@ enum PREFIX_EVEX_0F38AA, PREFIX_EVEX_0F38AB, + PREFIX_EVEX_0F3A07_X86_64_W_0_L_2, PREFIX_EVEX_0F3A08, PREFIX_EVEX_0F3A0A, PREFIX_EVEX_0F3A26, @@ -1235,6 +1238,7 @@ enum PREFIX_EVEX_0F3A57, PREFIX_EVEX_0F3A66, PREFIX_EVEX_0F3A67, + PREFIX_EVEX_0F3A77_X86_64_W_0_L_2, PREFIX_EVEX_0F3AC2, PREFIX_EVEX_MAP4_4x, @@ -1384,6 +1388,9 @@ enum X86_64_VEX_MAP7_F8_L_0_W_0_R_0, X86_64_EVEX_0F384A, + X86_64_EVEX_0F386D, + X86_64_EVEX_0F3A07, + X86_64_EVEX_0F3A77, X86_64_EVEX_MAP5_6F_M_0, }; @@ -1583,10 +1590,12 @@ enum EVEX_LEN_0F384A_X86_64_W_0, EVEX_LEN_0F385A, EVEX_LEN_0F385B, + EVEX_LEN_0F386D_X86_64_W_0, EVEX_LEN_0F38C6, EVEX_LEN_0F38C7, EVEX_LEN_0F3A00, EVEX_LEN_0F3A01, + EVEX_LEN_0F3A07_X86_64_W_0, EVEX_LEN_0F3A18, EVEX_LEN_0F3A19, EVEX_LEN_0F3A1A, @@ -1597,6 +1606,7 @@ enum EVEX_LEN_0F3A3A, EVEX_LEN_0F3A3B, EVEX_LEN_0F3A43, + EVEX_LEN_0F3A77_X86_64_W_0, EVEX_LEN_MAP5_6E, EVEX_LEN_MAP5_7E, @@ -1816,12 +1826,14 @@ enum EVEX_W_0F3859, EVEX_W_0F385A_L_n, EVEX_W_0F385B_L_2, + EVEX_W_0F386D_X86_64, EVEX_W_0F3870, EVEX_W_0F3872_P_2, EVEX_W_0F387A, EVEX_W_0F387B, EVEX_W_0F3883, + EVEX_W_0F3A07_X86_64, EVEX_W_0F3A18_L_n, EVEX_W_0F3A19_L_n, EVEX_W_0F3A1A_L_2, @@ -1836,6 +1848,7 @@ enum EVEX_W_0F3A43_L_n, EVEX_W_0F3A70, EVEX_W_0F3A72, + EVEX_W_0F3A77_X86_64, EVEX_W_MAP4_8A, EVEX_W_MAP4_8F_R_0, @@ -14070,6 +14083,29 @@ OP_VEX (instr_info *ins, int bytemode, int sizeflag ATTRIBUTE_UNUSED) return true; } + switch (bytemode) + { + case v_mode: + case dq_mode: + if (ins->rex & REX_W) + names = att_names64; + else if (bytemode == v_mode + && !(sizeflag & DFLAG)) + names = att_names16; + else + names = att_names32; + oappend_register (ins, names[reg]); + return true; + case b_mode: + names = att_names8rex; + 
oappend_register (ins, names[reg]); + return true; + case q_mode: + names = att_names64; + oappend_register (ins, names[reg]); + return true; + } + switch (ins->vex.length) { case 128: @@ -14079,22 +14115,6 @@ OP_VEX (instr_info *ins, int bytemode, int sizeflag ATTRIBUTE_UNUSED) names = att_names_xmm; ins->evex_used |= EVEX_len_used; break; - case v_mode: - case dq_mode: - if (ins->rex & REX_W) - names = att_names64; - else if (bytemode == v_mode - && !(sizeflag & DFLAG)) - names = att_names16; - else - names = att_names32; - break; - case b_mode: - names = att_names8rex; - break; - case q_mode: - names = att_names64; - break; case mask_bd_mode: case mask_mode: if (reg > 0x7) diff --git a/opcodes/i386-gen.c b/opcodes/i386-gen.c index bc4d64bced7..0b08eb11e52 100644 --- a/opcodes/i386-gen.c +++ b/opcodes/i386-gen.c @@ -273,6 +273,8 @@ static const dependency isa_dependencies[] = "AMX_TILE" }, { "AMX_MOVRS", "AMX_TILE" }, + { "AMX_AVX512", + "AMX_TILE|AVX10_2" }, { "KL", "SSE2" }, { "WIDEKL", @@ -444,6 +446,7 @@ static bitfield cpu_flags[] = BITFIELD (AMX_TF32), BITFIELD (AMX_FP8), BITFIELD (AMX_MOVRS), + BITFIELD (AMX_AVX512), BITFIELD (AMX_TILE), BITFIELD (MOVDIRI), BITFIELD (MOVDIR64B), diff --git a/opcodes/i386-opc.h b/opcodes/i386-opc.h index 65bccdbfdb7..9ca9471df39 100644 --- a/opcodes/i386-opc.h +++ b/opcodes/i386-opc.h @@ -258,6 +258,8 @@ enum i386_cpu CpuAMX_FP8, /* AMX-MOVRS Instructions support required. */ CpuAMX_MOVRS, + /* AMX-AVX512 Instructions support required. 
*/ + CpuAMX_AVX512, /* AMX-TILE instructions required */ CpuAMX_TILE, /* GFNI instructions required */ @@ -512,6 +514,7 @@ typedef union i386_cpu_flags unsigned int cpuamx_tf32:1; unsigned int cpuamx_fp8:1; unsigned int cpuamx_movrs:1; + unsigned int cpuamx_avx512:1; unsigned int cpuamx_tile:1; unsigned int cpugfni:1; unsigned int cpuvaes:1; diff --git a/opcodes/i386-opc.tbl b/opcodes/i386-opc.tbl index 31359328de6..c7510b342b4 100644 --- a/opcodes/i386-opc.tbl +++ b/opcodes/i386-opc.tbl @@ -3243,6 +3243,21 @@ t2rpntlvwrs, 0xf8 | , AMX_TRANSPOSE&APX_F(AMX_MOVRS), Si tileloaddrs, 0xf24a, APX_F(AMX_MOVRS), Sibmem|Vex128|EVex128|Space0F38|VexW0|NoSuf, { Unspecified|BaseIndex, RegTMM } tileloaddrst1, 0x664a, APX_F(AMX_MOVRS), Sibmem|Vex128|EVex128|Space0F38|VexW0|NoSuf, { Unspecified|BaseIndex, RegTMM } +tcvtrowd2ps, 0xf34a, AMX_AVX512, Modrm|EVex512|Space0F38|Src2VVVV|VexW0|NoSuf, { Reg32, RegTMM, RegZMM } +tcvtrowd2ps, 0xf307, AMX_AVX512, Modrm|EVex512|Space0F3A|VexW0|NoSuf, { Imm8, RegTMM, RegZMM } + +tcvtrowps2bf16h, 0xf26d, AMX_AVX512, Modrm|EVex512|Space0F38|Src2VVVV|VexW0|NoSuf, { Reg32, RegTMM, RegZMM } +tcvtrowps2bf16h, 0xf207, AMX_AVX512, Modrm|EVex512|Space0F3A|VexW0|NoSuf, { Imm8, RegTMM, RegZMM } +tcvtrowps2bf16l, 0xf36d, AMX_AVX512, Modrm|EVex512|Space0F38|Src2VVVV|VexW0|NoSuf, { Reg32, RegTMM, RegZMM } +tcvtrowps2bf16l, 0xf377, AMX_AVX512, Modrm|EVex512|Space0F3A|VexW0|NoSuf, { Imm8, RegTMM, RegZMM } +tcvtrowps2phh, 0x6d, AMX_AVX512, Modrm|EVex512|Space0F38|Src2VVVV|VexW0|NoSuf, { Reg32, RegTMM, RegZMM } +tcvtrowps2phh, 0x07, AMX_AVX512, Modrm|EVex512|Space0F3A|VexW0|NoSuf, { Imm8, RegTMM, RegZMM } +tcvtrowps2phl, 0x666d, AMX_AVX512, Modrm|EVex512|Space0F38|Src2VVVV|VexW0|NoSuf, { Reg32, RegTMM, RegZMM } +tcvtrowps2phl, 0xf277, AMX_AVX512, Modrm|EVex512|Space0F3A|VexW0|NoSuf, { Imm8, RegTMM, RegZMM } + +tilemovrow, 0x664a, AMX_AVX512, Modrm|EVex512|Space0F38|Src2VVVV|VexW0|NoSuf, { Reg32, RegTMM, RegZMM } +tilemovrow, 0x6607, AMX_AVX512, 
Modrm|EVex512|Space0F3A|VexW0|NoSuf, { Imm8, RegTMM, RegZMM } + // AMX instructions end. // KEYLOCKER instructions.