Message ID | 20250113082614.1716559-1-haochen.jiang@intel.com |
---|---|
Series | Refine AVX10.2 mnemonics |
Message
Jiang, Haochen
Jan. 13, 2025, 8:26 a.m. UTC
Hi all,

Since AVX10.2 was published, there have been several discussions regarding its mnemonics. After internal discussion, we will make three changes to the AVX10.2 mnemonics.

Quick conclusion:

- NE is removed from all AVX10.2 new insns
- VCOMSBF16 -> VCOMISBF16
- P for packed is omitted for AI data types

The details of the reasoning are at the end of this email.

All of these will be refined in the AVX10.2 SPEC (ETA: published this week). Since the Binutils 2.44 window is near, I chose to send them out now.

Upcoming are the three patches changing the mnemonics according to this conclusion: patch 1 covers the BF16 arithmetic insns, patch 2 covers VCOMISBF16, and patch 3 covers the convert insns.

These three patches will probably be the last unsent patches for Intel Diamond Rapids in Binutils 2.44 if nothing unexpected happens (the only potential exception being the MOVRS APX_F EVEX.W issue). I really appreciate the review and discussion on AVX10.2 and AMX since August. It has been a really long run.

Ok for trunk?

Thx,
Haochen

Details for the changes:

- NE removal (default rounding for AI data types)

NE turned out to be a total mess once we reviewed the instructions we currently have. The name itself is ambiguous: it is meant to be Rounding to Nearest Even, but it can be misinterpreted as No Exception (which is how I interpreted it at the beginning of this year, and also Jan's understanding) or as No Embedded Rounding. On top of the ambiguous name, it appears here and there without following a consistent rule. The biggest disaster is AVX-NE-CONVERT, where almost all of the insns are up-converts (only one insn under that CPUID is a down-convert that needs rounding) and should not carry NE, yet NE appears everywhere.

Given the current inconsistency and mess, we intend to clean this up in the mnemonics starting with AVX10.2. Since NE itself is ambiguous and misleading, why keep it at all? Our decision is therefore to remove NE from ALL new instructions in AVX10.2, with BF16 documentation to be added to the SDM in the future (I would expect within this year). It shortens the mnemonics and makes them easier to use, since users no longer need to remember whether a mnemonic carries NE. And since all the rounding modes are now given in the insn descriptions, it does not lead to guessing about the rounding mode.

For old insns, the plan for now is to leave them as-is, since their implementation has been around for some time. Whether we could change them is open for discussion, but at least for Binutils 2.44 and GCC 15 they won't be changed, due to timing.

- VCOMSBF16 renamed to VCOMISBF16

VCOMSBF16 has the same functionality as the existing VCOMISD/VCOMISS/VCOMISH, apart from the data type: they all compare and set three EFLAGS. Thus it should be VCOMISBF16, not a brand-new VCOMSBF16.

- P prefix for packed omitted for BF16 and future AI data types

For legacy double, float and FP16 we use PD/PS/PH for packed and SD/SS/SH for scalar. From the very beginning of BF16, the P for packed was omitted, and that omission has continued until now. We suppose the assumption was that AI data types like BF16 would always be packed in calculations, so the omission was safe at the time. However, in AVX10.2, P is not omitted for most instructions. Therefore we decided to drop the P before BF16 in the AVX10.2 BF16 instructions to keep things consistent. This also applies to AMX-AVX512 TCVTROWPS2PBF16[H,L], which becomes TCVTROWPS2BF16[H,L] (already reflected in ISE056, and the AMX-AVX512 patch follows that).

With P omitted, we add an explicit S to indicate scalar use for BF16; for example, VCOMISBF16 stays VCOMISBF16. Going through all the instructions, we found one problem: VBCSTNEBF162PS in AVX-NE-CONVERT should gain an S before BF16, but since BCST is itself meaningful there, we will keep it as-is for now. The same rule will apply to future AI data types, including TF32 and FP8 (packed by default, with an explicit S to indicate scalar).

At the end of the day, there are some significant changes to the BF16 insns. For example, VADDNEPBF16 becomes VADDBF16.
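To make the renames concrete, here is a minimal before/after sketch in GAS AT&T syntax. It is illustrative only and not part of the patches themselves: the operand forms are assumptions chosen for the sketch, and it presumes an assembler new enough to know the refined AVX10.2/AMX mnemonics.

    # Old spellings shown as comments, refined spellings as live code.

    # BF16 arithmetic: NE and the packed P are dropped.
    #   was: vaddnepbf16  %zmm3, %zmm2, %zmm1
    vaddbf16    %zmm3, %zmm2, %zmm1

    # Compare-and-set-EFLAGS gains the I, matching VCOMISD/VCOMISS/VCOMISH.
    #   was: vcomsbf16    %xmm2, %xmm1
    vcomisbf16  %xmm2, %xmm1

    # AMX-AVX512 row conversion also loses the packed P
    # (the operand order here is an assumption of this sketch).
    #   was: tcvtrowps2pbf16l  %eax, %tmm1, %zmm0
    tcvtrowps2bf16l  %eax, %tmm1, %zmm0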
Comments
On Mon, Jan 13, 2025 at 9:26 AM Haochen Jiang <haochen.jiang@intel.com> wrote:
> Since AVX10.2 was published, there have been several discussions regarding
> its mnemonics. After internal discussion, we will make three changes to the
> AVX10.2 mnemonics.
> [...]
> Ok for trunk?

One more: don't forget the 256-bit {er} variants of VCVTDQ2PD at F3.0F__.W0.E6 and VCVTUDQ2PD at F3.0F__.W0.7A – they are missing from the AVX10.2 spec; they should be supported, similar to the existing 512-bit {er} variants: attempts to encode {er} should be ignored, as documented in the SDM.

-- C.
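For readers less familiar with the {er} notation being discussed: the lines below are a rough, assumption-laden illustration in GAS AT&T syntax (not taken from this thread's patches) of an embedded-rounding variant, using VCVTDQ2PS, which genuinely documents {er}. VCVT[U]DQ2PD itself never needs rounding, since int32-to-double conversion is exact; the point above is about the decoder tolerating rounding bits in the byte stream rather than raising #UD.

    # Embedded rounding on a conversion that really rounds (illustrative):
    vcvtdq2ps   {rn-sae}, %zmm1, %zmm0   # int32 -> float, round to nearest even
    vcvtdq2ps   {rz-sae}, %zmm1, %zmm0   # int32 -> float, round toward zero

    # VCVT[U]DQ2PD converts int32 -> double exactly, so no rounding control is
    # meaningful; per the SDM, an EVEX rounding bit encoded anyway is ignored
    # on the 512-bit forms, and the 256-bit AVX10.2 forms should match.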
On 13.01.2025 09:26, Haochen Jiang wrote:
> Since AVX10.2 was published, there have been several discussions regarding
> its mnemonics. After internal discussion, we will make three changes to the
> AVX10.2 mnemonics.
> [...]
> Ok for trunk?

Okay, on the assumption that the doc will be updated accordingly in due course, and hence we're not going to end up with any back and forth.

Jan
> From: Christian Ludloff <ludloff@gmail.com>
> Sent: Monday, January 13, 2025 6:13 PM
>
> [...]
>
> One more: don't forget the 256-bit {er} variants of VCVTDQ2PD at
> F3.0F__.W0.E6 and VCVTUDQ2PD at F3.0F__.W0.7A – they are missing from
> the AVX10.2 spec; they should be supported, similar to the existing 512-bit
> {er} variants: attempts to encode {er} should be ignored, as documented in
> the SDM.

Let me find a way to fix that. I hope it can still make Binutils 2.44.

Thx,
Haochen
On Tue, Jan 14, 2025, 06:51 Jiang, Haochen <haochen.jiang@intel.com> wrote:
> > One more: don't forget the 256-bit {er} variants of VCVTDQ2PD at
> > F3.0F__.W0.E6 and VCVTUDQ2PD at F3.0F__.W0.7A – they are missing from
> > the AVX10.2 spec [...]
>
> Let me find a way to fix that. I hope it can still make Binutils 2.44.

Hopefully all the AVX10.2 silicon got it right in the first place.

-- C.
>> > One more: don't forget the 256-bit {er} variants of VCVTDQ2PD at
>> > F3.0F__.W0.E6 and VCVTUDQ2PD at F3.0F__.W0.7A – they are missing from
>> > the AVX10.2 spec [...]

> Hopefully all the AVX10.2 silicon got it right in the first place.

Fwiw, the latest AVX10.2 spec (#361050-003 from Jan 14) is still missing the 256-bit {er} variants of VCVT[,U]DQ2PD.

For now, we have the unofficial confirmation here:

https://sourceware.org/pipermail/binutils/2025-January/138698.html

-- Christian
> From: Christian Ludloff <ludloff@gmail.com>
> Sent: Friday, January 17, 2025 4:24 AM
>
> Fwiw, the latest AVX10.2 spec (#361050-003 from Jan 14) is still missing
> the 256-bit {er} variants of VCVT[,U]DQ2PD.
>
> For now, we have the unofficial confirmation here:
>
> https://sourceware.org/pipermail/binutils/2025-January/138698.html

I suppose it is not listed because the instruction normally has no rounding operand; we simply ignore the rounding if someone encodes the bytecode with it set. So even once the doc is out, it probably won't be listed there. Maybe we can find a better way to document this (once the AVX10.2 material goes into the SDM, everything should be aligned anyway). Let me try to find a way.

Thx,
Haochen
> I suppose it is not listed because the instruction normally has no rounding
> operand; we simply ignore the rounding if someone encodes the bytecode with
> it set. So even once the doc is out, it probably won't be listed there.
> Maybe we can find a better way to document this (once the AVX10.2 material
> goes into the SDM, everything should be aligned anyway). Let me try to find
> a way.

The AVX10.2 spec went to great lengths to list all those existing instructions for which the LL=256 variant now permits U=0 – it included CVT[,U]DQ2{PH,PS} but not CVT[,U]DQ2PD.

Perhaps clone the table entries for PD from the table entries for PH/PS, and likewise clone the instruction pages for PD from PH/PS but add the SDM statement "Attempt to encode this instruction with EVEX embedded rounding is ignored." on the PD pages.

For the SDM, it is probably best to add {er} to the four affected CVTs, perhaps as {er.ignored}. That way their "Attempt to..." statements would no longer be dangling as much.

Fwiw, I have seen various tools emit the {er} for LL=512, as they handle PH/PS/PD identically rather than special-casing PD, and I expect this to carry over to LL=256. Which is why I think it is important that the spec and the chips get it right (no #UD).

Thanks for your hard work!

-- C.