From patchwork Thu Jun 25 22:38:07 2020
X-Patchwork-Submitter: "H.J. Lu"
X-Patchwork-Id: 39802
From: "H.J. Lu"
To: libc-alpha@sourceware.org
Subject: [PATCH] x86: Detect Intel Advanced Matrix Extensions
Date: Thu, 25 Jun 2020 15:38:07 -0700
Message-Id: <20200625223807.3447984-1-hjl.tools@gmail.com>

Intel Advanced Matrix Extensions (Intel AMX) is a new programming
paradigm consisting of two components: a set of 2-dimensional registers
(tiles) representing sub-arrays from a larger 2-dimensional memory image,
and accelerators able to operate on tiles.  Intel AMX is an extensible
architecture.
New accelerators can be added and the existing accelerator may be
enhanced to provide higher performance.  The initial features are
AMX-BF16, AMX-TILE and AMX-INT8, which are usable only if the operating
system supports both XTILECFG state and XTILEDATA state.

Add AMX-BF16, AMX-TILE and AMX-INT8 support to HAS_CPU_FEATURE and
CPU_FEATURE_USABLE.
---
 sysdeps/x86/cpu-features.c         | 18 ++++++++++++++++++
 sysdeps/x86/cpu-features.h         | 20 ++++++++++++++++++++
 sysdeps/x86/tst-get-cpu-features.c |  6 ++++++
 3 files changed, 44 insertions(+)

diff --git a/sysdeps/x86/cpu-features.c b/sysdeps/x86/cpu-features.c
index 79bc0d7216..c351bdd54a 100644
--- a/sysdeps/x86/cpu-features.c
+++ b/sysdeps/x86/cpu-features.c
@@ -239,6 +239,24 @@ get_common_indices (struct cpu_features *cpu_features,
             }
         }
 
+      /* Are XTILECFG and XTILEDATA states usable?  */
+      if ((xcrlow & (bit_XTILECFG_state | bit_XTILEDATA_state))
+          == (bit_XTILECFG_state | bit_XTILEDATA_state))
+        {
+          /* Determine if AMX_BF16 is usable.  */
+          if (CPU_FEATURES_CPU_P (cpu_features, AMX_BF16))
+            cpu_features->usable[index_arch_AMX_BF16_Usable]
+              |= bit_arch_AMX_BF16_Usable;
+          /* Determine if AMX_TILE is usable.  */
+          if (CPU_FEATURES_CPU_P (cpu_features, AMX_TILE))
+            cpu_features->usable[index_arch_AMX_TILE_Usable]
+              |= bit_arch_AMX_TILE_Usable;
+          /* Determine if AMX_INT8 is usable.  */
+          if (CPU_FEATURES_CPU_P (cpu_features, AMX_INT8))
+            cpu_features->usable[index_arch_AMX_INT8_Usable]
+              |= bit_arch_AMX_INT8_Usable;
+        }
+
       /* For _dl_runtime_resolve, set xsave_state_size to xsave area
          size + integer register save size and align it to 64 bytes.  */
       if (cpu_features->basic.max_cpuid >= 0xd)
diff --git a/sysdeps/x86/cpu-features.h b/sysdeps/x86/cpu-features.h
index 574f055e0c..78d0692fab 100644
--- a/sysdeps/x86/cpu-features.h
+++ b/sysdeps/x86/cpu-features.h
@@ -156,6 +156,9 @@ extern const struct cpu_features *__get_cpu_features (void)
 #define bit_arch_AVX512_VP2INTERSECT_Usable (1u << 24)
 #define bit_arch_AVX512_BF16_Usable (1u << 25)
 #define bit_arch_PKU_Usable (1u << 26)
+#define bit_arch_AMX_BF16_Usable (1u << 27)
+#define bit_arch_AMX_TILE_Usable (1u << 28)
+#define bit_arch_AMX_INT8_Usable (1u << 29)
 
 #define index_arch_AVX_Usable USABLE_FEATURE_INDEX_1
 #define index_arch_AVX2_Usable USABLE_FEATURE_INDEX_1
@@ -184,6 +187,9 @@ extern const struct cpu_features *__get_cpu_features (void)
 #define index_arch_AVX512_VP2INTERSECT_Usable USABLE_FEATURE_INDEX_1
 #define index_arch_AVX512_BF16_Usable USABLE_FEATURE_INDEX_1
 #define index_arch_PKU_Usable USABLE_FEATURE_INDEX_1
+#define index_arch_AMX_BF16_Usable USABLE_FEATURE_INDEX_1
+#define index_arch_AMX_TILE_Usable USABLE_FEATURE_INDEX_1
+#define index_arch_AMX_INT8_Usable USABLE_FEATURE_INDEX_1
 
 #define feature_AVX_Usable usable
 #define feature_AVX2_Usable usable
@@ -212,6 +218,9 @@ extern const struct cpu_features *__get_cpu_features (void)
 #define feature_AVX512_VP2INTERSECT_Usable usable
 #define feature_AVX512_BF16_Usable usable
 #define feature_PKU_Usable usable
+#define feature_AMX_BF16_Usable usable
+#define feature_AMX_TILE_Usable usable
+#define feature_AMX_INT8_Usable usable
 
 /* CPU features.  */
 
@@ -347,6 +356,9 @@ extern const struct cpu_features *__get_cpu_features (void)
 #define bit_cpu_TSXLDTRK (1u << 16)
 #define bit_cpu_PCONFIG (1u << 18)
 #define bit_cpu_IBT (1u << 20)
+#define bit_cpu_AMX_BF16 (1u << 22)
+#define bit_cpu_AMX_TILE (1u << 24)
+#define bit_cpu_AMX_INT8 (1u << 25)
 #define bit_cpu_IBRS_IBPB (1u << 26)
 #define bit_cpu_STIBP (1u << 27)
 #define bit_cpu_L1D_FLUSH (1u << 28)
@@ -527,6 +539,9 @@ extern const struct cpu_features *__get_cpu_features (void)
 #define index_cpu_SERIALIZE COMMON_CPUID_INDEX_7
 #define index_cpu_HYBRID COMMON_CPUID_INDEX_7
 #define index_cpu_TSXLDTRK COMMON_CPUID_INDEX_7
+#define index_cpu_AMX_BF16 COMMON_CPUID_INDEX_7
+#define index_cpu_AMX_TILE COMMON_CPUID_INDEX_7
+#define index_cpu_AMX_INT8 COMMON_CPUID_INDEX_7
 #define index_cpu_PCONFIG COMMON_CPUID_INDEX_7
 #define index_cpu_IBT COMMON_CPUID_INDEX_7
 #define index_cpu_IBRS_IBPB COMMON_CPUID_INDEX_7
@@ -709,6 +724,9 @@ extern const struct cpu_features *__get_cpu_features (void)
 #define reg_SERIALIZE edx
 #define reg_HYBRID edx
 #define reg_TSXLDTRK edx
+#define reg_AMX_BF16 edx
+#define reg_AMX_TILE edx
+#define reg_AMX_INT8 edx
 #define reg_PCONFIG edx
 #define reg_IBT edx
 #define reg_IBRS_IBPB edx
@@ -819,6 +837,8 @@ extern const struct cpu_features *__get_cpu_features (void)
 #define bit_Opmask_state (1u << 5)
 #define bit_ZMM0_15_state (1u << 6)
 #define bit_ZMM16_31_state (1u << 7)
+#define bit_XTILECFG_state (1u << 17)
+#define bit_XTILEDATA_state (1u << 18)
 
 # if defined (_LIBC) && !IS_IN (nonlib)
 /* Unused for x86.  */
diff --git a/sysdeps/x86/tst-get-cpu-features.c b/sysdeps/x86/tst-get-cpu-features.c
index c60918cf00..3d44af202e 100644
--- a/sysdeps/x86/tst-get-cpu-features.c
+++ b/sysdeps/x86/tst-get-cpu-features.c
@@ -185,6 +185,9 @@ do_test (void)
   CHECK_CPU_FEATURE (SERIALIZE);
   CHECK_CPU_FEATURE (HYBRID);
   CHECK_CPU_FEATURE (TSXLDTRK);
+  CHECK_CPU_FEATURE (AMX_BF16);
+  CHECK_CPU_FEATURE (AMX_TILE);
+  CHECK_CPU_FEATURE (AMX_INT8);
   CHECK_CPU_FEATURE (PCONFIG);
   CHECK_CPU_FEATURE (IBT);
   CHECK_CPU_FEATURE (IBRS_IBPB);
@@ -239,6 +242,9 @@ do_test (void)
   CHECK_CPU_FEATURE_USABLE (AVX512_4VNNIW);
   CHECK_CPU_FEATURE_USABLE (AVX512_4FMAPS);
   CHECK_CPU_FEATURE_USABLE (AVX512_VP2INTERSECT);
+  CHECK_CPU_FEATURE_USABLE (AMX_BF16);
+  CHECK_CPU_FEATURE_USABLE (AMX_TILE);
+  CHECK_CPU_FEATURE_USABLE (AMX_INT8);
   CHECK_CPU_FEATURE_USABLE (XOP);
   CHECK_CPU_FEATURE_USABLE (FMA4);
   CHECK_CPU_FEATURE_USABLE (XSAVEC);
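
Illustrative note, not part of the patch: the usability rule above (a
CPUID feature bit counts only when XCR0 has both XTILECFG and XTILEDATA
enabled) can be sketched as a standalone program fragment.  The bit
positions are the ones this patch defines.  The macro names and the
helpers amx_usable_mask and amx_usable_on_this_cpu are hypothetical and
exist only for this sketch.  It assumes GCC or Clang on x86 for the
hardware query and does not use glibc's internal cpu_features
structure.

```c
#include <stdint.h>

/* CPUID.(EAX=7,ECX=0):EDX feature bits, same positions as bit_cpu_AMX_*
   in the patch.  Macro names are made up for this sketch.  */
#define CPUID7_EDX_AMX_BF16 (1u << 22)
#define CPUID7_EDX_AMX_TILE (1u << 24)
#define CPUID7_EDX_AMX_INT8 (1u << 25)

/* XCR0 state bits, same positions as bit_XTILECFG_state and
   bit_XTILEDATA_state in the patch.  */
#define XCR0_XTILECFG  (1u << 17)
#define XCR0_XTILEDATA (1u << 18)

/* Hypothetical helper: given the raw CPUID leaf 7 EDX value and the low
   32 bits of XCR0, return the AMX CPUID bits that are usable.  Nothing
   is usable unless both tile states are enabled in XCR0.  */
uint32_t
amx_usable_mask (uint32_t cpuid7_edx, uint32_t xcr0_low)
{
  const uint32_t tile_state = XCR0_XTILECFG | XCR0_XTILEDATA;
  const uint32_t amx_bits
    = CPUID7_EDX_AMX_BF16 | CPUID7_EDX_AMX_TILE | CPUID7_EDX_AMX_INT8;

  if ((xcr0_low & tile_state) != tile_state)
    return 0;
  return cpuid7_edx & amx_bits;
}

#if defined (__x86_64__) || defined (__i386__)
# include <cpuid.h>

/* Hypothetical helper: apply the rule to the running CPU.  Returns 0
   when CPUID leaf 7 or XGETBV is unavailable.  */
uint32_t
amx_usable_on_this_cpu (void)
{
  unsigned int eax, ebx, ecx, edx;

  /* XGETBV may only be executed if CPUID.1:ECX.OSXSAVE (bit 27) is set.  */
  if (!__get_cpuid (1, &eax, &ebx, &ecx, &edx) || !(ecx & (1u << 27)))
    return 0;
  if (__get_cpuid_max (0, 0) < 7)
    return 0;

  unsigned int xcr0_low, xcr0_high;
  __asm__ volatile ("xgetbv" : "=a" (xcr0_low), "=d" (xcr0_high) : "c" (0));

  __cpuid_count (7, 0, eax, ebx, ecx, edx);
  return amx_usable_mask (edx, xcr0_low);
}
#endif
```

A caller would test the result against one of the CPUID7_EDX_AMX_*
masks, for example amx_usable_on_this_cpu () & CPUID7_EDX_AMX_TILE.
Keeping the bit logic in a pure function separate from the hardware
query makes the rule checkable on machines without AMX.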