From patchwork Mon Nov 29 07:46:16 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Liu, Hongtao" X-Patchwork-Id: 48243 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 69D043944409 for ; Mon, 29 Nov 2021 07:46:49 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 69D043944409 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1638172009; bh=+XZaKFl/1gqbsssDnPDyuqM3qEnwThcJJZ4A6fKNwco=; h=To:Subject:Date:List-Id:List-Unsubscribe:List-Archive:List-Post: List-Help:List-Subscribe:From:Reply-To:From; b=ZDFvGJDhVW7yzGiYg3/bcxpL9xZQPPwSw/J08HF6Ec/av+7N7C7g+7gFzTg6edaKj 7GBmJW3Hv+Rof8oJQd7A5nYwevUH2Hbj2qRaUoxusU4ZyuqcxUXDLKyy3o68EkAbJ1 WamRIGWTGU20FzBYgjqaM2PNJU8llZ1o/kDajRaw= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mga05.intel.com (mga05.intel.com [192.55.52.43]) by sourceware.org (Postfix) with ESMTPS id 6480B3894C3E for ; Mon, 29 Nov 2021 07:46:19 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 6480B3894C3E X-IronPort-AV: E=McAfee;i="6200,9189,10182"; a="322147539" X-IronPort-AV: E=Sophos;i="5.87,272,1631602800"; d="scan'208";a="322147539" Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 28 Nov 2021 23:46:18 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.87,272,1631602800"; d="scan'208";a="510890342" Received: from scymds01.sc.intel.com ([10.148.94.138]) by fmsmga007.fm.intel.com with ESMTP; 28 Nov 2021 23:46:18 -0800 Received: from shliclel320.sh.intel.com (shliclel320.sh.intel.com [10.239.236.50]) by scymds01.sc.intel.com with ESMTP id 1AT7kGc9029769; Sun, 28 Nov 2021 23:46:17 -0800 To: gcc-patches@gcc.gnu.org Subject: [PATCH] Optimize _Float16 usage for non AVX512FP16. Date: Mon, 29 Nov 2021 15:46:16 +0800 Message-Id: <20211129074616.78603-1-hongtao.liu@intel.com> X-Mailer: git-send-email 2.18.1 X-Spam-Status: No, score=-11.5 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_NONE, KAM_DMARC_STATUS, KAM_LAZY_DOMAIN_SECURITY, KAM_SHORT, SPF_HELO_NONE, SPF_NONE, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: liuhongt via Gcc-patches From: "Liu, Hongtao" Reply-To: liuhongt Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" As discussed in PR, this patch do optimizations: 1. No memory is needed to move HI/HFmode between GPR and SSE registers under TARGET_SSE2 and above, pinsrw/pextrw are used for them w/o AVX512FP16. 2. Use gen_sse2_pinsrph/gen_vec_setv4sf_0 to replace ix86_expand_vector_set in extendhfsf2/truncsfhf2 so that redundant initialization cound be eliminated. Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,} and x86_64-pc-linux-gnu{-m32\ -march=cadcadelake,\ -march=cascadelake} Ok for trunk? gcc/ChangeLog: PR target/102811 * config/i386/i386.c (inline_secondary_memory_needed): HImode move between GPR and SSE registers is supported under TARGET_SSE2 and above. * config/i386/i386.md (extendhfsf2): Optimize expander. (truncsfhf2): Ditto. * config/i386/sse.md (sse2p4_1): Adjust attr for V8HFmode to align with V8HImode. gcc/testsuite/ChangeLog: * gcc.target/i386/pr102811-2.c: New test. * gcc.target/i386/avx512vl-vcvtps2ph-pr102811.c: Add new scan-assembler-times. --- gcc/config/i386/i386.c | 5 +++-- gcc/config/i386/i386.md | 18 +++++++++++---- gcc/config/i386/sse.md | 2 +- .../i386/avx512vl-vcvtps2ph-pr102811.c | 2 +- gcc/testsuite/gcc.target/i386/pr102811-2.c | 22 +++++++++++++++++++ 5 files changed, 41 insertions(+), 8 deletions(-) create mode 100644 gcc/testsuite/gcc.target/i386/pr102811-2.c diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 7cf599f57f7..2657e7817ae 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -19437,8 +19437,9 @@ inline_secondary_memory_needed (machine_mode mode, reg_class_t class1, if (msize > UNITS_PER_WORD) return true; - /* In addition to SImode moves, AVX512FP16 also enables HImode moves. */ - int minsize = GET_MODE_SIZE (TARGET_AVX512FP16 ? HImode : SImode); + /* In addition to SImode moves, HImode moves are supported for SSE2 and above, + Use vmovw with AVX512FP16, or pinsrw/pextrw without AVX512FP16. */ + int minsize = GET_MODE_SIZE (TARGET_SSE2 ? HImode : SImode); if (msize < minsize) return true; diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index 2cb3e727588..070758edb66 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -4617,9 +4617,18 @@ (define_expand "extendhfsf2" if (!TARGET_AVX512FP16) { rtx res = gen_reg_rtx (V4SFmode); - rtx tmp = force_reg (V8HFmode, CONST0_RTX (V8HFmode)); + rtx tmp = gen_reg_rtx (V8HFmode); + rtx zero = force_reg (V8HFmode, CONST0_RTX (V8HFmode)); - ix86_expand_vector_set (false, tmp, operands[1], 0); + if (TARGET_AVX2) + { + rtx dup = gen_reg_rtx (V8HFmode); + emit_move_insn (dup, gen_rtx_VEC_DUPLICATE (V8HFmode, operands[1])); + emit_move_insn (tmp, gen_rtx_VEC_MERGE (V8HFmode, dup, + zero, const1_rtx)); + } + else + emit_insn (gen_sse2_pinsrph (tmp, zero, operands[1], const1_rtx)); emit_insn (gen_vcvtph2ps (res, gen_lowpart (V8HImode, tmp))); emit_move_insn (operands[0], gen_lowpart (SFmode, res)); DONE; @@ -4833,9 +4842,10 @@ (define_expand "truncsfhf2" if (!TARGET_AVX512FP16) { rtx res = gen_reg_rtx (V8HFmode); - rtx tmp = force_reg (V4SFmode, CONST0_RTX (V4SFmode)); + rtx tmp = gen_reg_rtx (V4SFmode); + rtx zero = force_reg (V4SFmode, CONST0_RTX (V4SFmode)); - ix86_expand_vector_set (false, tmp, operands[1], 0); + emit_insn (gen_vec_setv4sf_0 (tmp, zero, operands[1])); emit_insn (gen_vcvtps2ph (gen_lowpart (V8HImode, res), tmp, GEN_INT (4))); emit_move_insn (operands[0], gen_lowpart (HFmode, res)); DONE; diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 5229b23af98..b371b140eb1 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -17272,7 +17272,7 @@ (define_mode_iterator PINSR_MODE (V2DI "TARGET_SSE4_1 && TARGET_64BIT")]) (define_mode_attr sse2p4_1 - [(V16QI "sse4_1") (V8HI "sse2") (V8HF "sse4_1") + [(V16QI "sse4_1") (V8HI "sse2") (V8HF "sse2") (V4SI "sse4_1") (V2DI "sse4_1")]) (define_mode_attr pinsr_evex_isa diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-vcvtps2ph-pr102811.c b/gcc/testsuite/gcc.target/i386/avx512vl-vcvtps2ph-pr102811.c index dfbfb167953..9a6c432c866 100644 --- a/gcc/testsuite/gcc.target/i386/avx512vl-vcvtps2ph-pr102811.c +++ b/gcc/testsuite/gcc.target/i386/avx512vl-vcvtps2ph-pr102811.c @@ -1,6 +1,6 @@ /* { dg-do compile } */ /* { dg-options "-O2 -mf16c -mno-avx512fp16" } */ -/* { dg-final { scan-assembler-times "vpxor\[ \\t\]" 2 } } */ +/* { dg-final { scan-assembler-times "vpxor\[ \\t\]" 1 } } */ /* { dg-final { scan-assembler-times "vcvtph2ps\[ \\t\]" 2 } } */ /* { dg-final { scan-assembler-times "vcvtps2ph\[ \\t\]" 1 } } */ /* { dg-final { scan-assembler-not "__truncsfhf2\[ \\t\]"} } */ diff --git a/gcc/testsuite/gcc.target/i386/pr102811-2.c b/gcc/testsuite/gcc.target/i386/pr102811-2.c new file mode 100644 index 00000000000..e511c665ae8 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr102811-2.c @@ -0,0 +1,22 @@ +/* { dg-do compile { target { ! ia32 } } } */ +/* { dg-options "-O2 -mf16c -mno-avx512fp16" } */ +/* { dg-final { scan-assembler-times "pextrw" 1 } } */ +/* { dg-final { scan-assembler-times "pinsrw" 1 } } */ +/* { dg-final { scan-assembler-not "\\\(%rsp\\\)"} } */ +short test (_Float16 a) +{ + union{ + short b; + _Float16 a;}u; + u.a = a; + return u.b; +} + +_Float16 test1 (short a) +{ + union{ + _Float16 b; + short a;}u; + u.a = a; + return u.b; +}