From patchwork Mon Sep 27 10:42:51 2021
X-Patchwork-Submitter: Hongyu Wang
X-Patchwork-Id: 45464
To: ubizjak@gmail.com
Subject: [PATCH] AVX512FP16:support basic 64/32bit vector
type and operation.
Date: Mon, 27 Sep 2021 18:42:51 +0800
Message-Id: <20210927104251.81107-1-hongyu.wang@intel.com>
X-Mailer: git-send-email 2.18.1
From: Hongyu Wang
Cc: hongtao.liu@intel.com, gcc-patches@gcc.gnu.org

Hi Uros,

This patch adds support for the V4HF/V2HF vector types and basic operations.

For 32-bit targets, a V4HF vector is passed the same way as the __m64 type, while V2HF is passed on the stack and returned in a GPR, since the ABI does not specify it.

We found that for 64-bit vectors on ia32 there seems to be no mov_internal pattern when MMX is disabled, so we add a define_insn for V4HF mode. It would be much appreciated if you could explain why 64-bit vectors are handled this way and give some advice.

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,} and SDE.

OK for master?

gcc/ChangeLog:

PR target/102230
* config/i386/i386.h (VALID_AVX512FP16_REG_MODE): Add V4HF and V2HF mode check. (VALID_SSE2_REG_VHF_MODE): Likewise. (VALID_MMX_REG_MODE): Likewise. (SSE_REG_MODE_P): Replace VALID_AVX512FP16_REG_MODE with vector mode condition.
* config/i386/i386.c (classify_argument): Parse V4HF/V2HF via sse regs. (function_arg_32): Add V4HFmode. (function_arg_advance_32): Likewise.
* config/i386/i386.md (mode): Add V4HF/V2HF. (MODE_SIZE): Likewise.
* config/i386/mmx.md (MMXMODE): Add V4HF mode. (V_32): Add V2HF mode. (*mov_internal): Adjust sse alternatives to support V4HF mode vector move. (*mov_internal): Adjust sse alternatives to support V2HF mode move. * config/i386/sse.md (VHF_32_64): New mode iterator. (3): New define_insn for add/sub/mul/div. (*movv4hf_internal_sse): New define_insn for -mno-mmx and -msse. gcc/testsuite/ChangeLog: PR target/102230 * gcc.target/i386/avx512fp16-floatvnhf.c: Remove xfail. * gcc.target/i386/avx512fp16-trunc-extendvnhf.c: Ditto. * gcc.target/i386/avx512fp16-truncvnhf.c: Ditto. * gcc.target/i386/avx512fp16-64-32-vecop-1.c: New test. * gcc.target/i386/avx512fp16-64-32-vecop-2.c: Ditto. * gcc.target/i386/pr102230.c: Ditto. --- gcc/config/i386/i386.c | 4 + gcc/config/i386/i386.h | 12 ++- gcc/config/i386/i386.md | 5 +- gcc/config/i386/mmx.md | 27 ++++--- gcc/config/i386/sse.md | 49 ++++++++++++ .../i386/avx512fp16-64-32-vecop-1.c | 30 ++++++++ .../i386/avx512fp16-64-32-vecop-2.c | 75 +++++++++++++++++++ .../gcc.target/i386/avx512fp16-floatvnhf.c | 12 +-- .../i386/avx512fp16-trunc-extendvnhf.c | 12 +-- .../gcc.target/i386/avx512fp16-truncvnhf.c | 12 +-- gcc/testsuite/gcc.target/i386/pr102230.c | 38 ++++++++++ 11 files changed, 243 insertions(+), 33 deletions(-) create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-64-32-vecop-1.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-64-32-vecop-2.c create mode 100644 gcc/testsuite/gcc.target/i386/pr102230.c diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index ba89e111d28..b3e4add4b9e 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -2462,6 +2462,8 @@ classify_argument (machine_mode mode, const_tree type, case E_V2SFmode: case E_V2SImode: case E_V4HImode: + case E_V4HFmode: + case E_V2HFmode: case E_V8QImode: classes[0] = X86_64_SSE_CLASS; return 1; @@ -2902,6 +2904,7 @@ pass_in_reg: case E_V8QImode: case E_V4HImode: + case E_V4HFmode: case E_V2SImode: case E_V2SFmode: case E_V1TImode: 
@@ -3149,6 +3152,7 @@ pass_in_reg: case E_V8QImode: case E_V4HImode: + case E_V4HFmode: case E_V2SImode: case E_V2SFmode: case E_V1TImode: diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h index 8a4251b4926..9f3cad31f96 100644 --- a/gcc/config/i386/i386.h +++ b/gcc/config/i386/i386.h @@ -1033,7 +1033,8 @@ extern const char *host_detect_local_cpu (int argc, const char **argv); || (MODE) == TImode) #define VALID_AVX512FP16_REG_MODE(MODE) \ - ((MODE) == V8HFmode || (MODE) == V16HFmode || (MODE) == V32HFmode) + ((MODE) == V8HFmode || (MODE) == V16HFmode || (MODE) == V32HFmode \ + || (MODE) == V4HFmode || (MODE) == V2HFmode) #define VALID_SSE2_REG_MODE(MODE) \ ((MODE) == V16QImode || (MODE) == V8HImode || (MODE) == V2DFmode \ @@ -1041,7 +1042,8 @@ extern const char *host_detect_local_cpu (int argc, const char **argv); || (MODE) == V2DImode || (MODE) == DFmode || (MODE) == HFmode) #define VALID_SSE2_REG_VHF_MODE(MODE) \ - (VALID_SSE2_REG_MODE (MODE) || (MODE) == V8HFmode) + (VALID_SSE2_REG_MODE (MODE) || (MODE) == V8HFmode \ + || (MODE) == V4HFmode || (MODE) == V2HFmode) #define VALID_SSE_REG_MODE(MODE) \ ((MODE) == V1TImode || (MODE) == TImode \ @@ -1054,7 +1056,8 @@ extern const char *host_detect_local_cpu (int argc, const char **argv); #define VALID_MMX_REG_MODE(MODE) \ ((MODE) == V1DImode || (MODE) == DImode \ || (MODE) == V2SImode || (MODE) == SImode \ - || (MODE) == V4HImode || (MODE) == V8QImode) + || (MODE) == V4HImode || (MODE) == V8QImode \ + || (MODE) == V4HFmode) #define VALID_MASK_REG_MODE(MODE) ((MODE) == HImode || (MODE) == QImode) @@ -1087,7 +1090,8 @@ extern const char *host_detect_local_cpu (int argc, const char **argv); || (MODE) == V4DImode || (MODE) == V8SFmode || (MODE) == V4DFmode \ || (MODE) == V2TImode || (MODE) == V8DImode || (MODE) == V64QImode \ || (MODE) == V16SImode || (MODE) == V32HImode || (MODE) == V8DFmode \ - || (MODE) == V16SFmode || VALID_AVX512FP16_REG_MODE (MODE)) + || (MODE) == V16SFmode || (MODE) == V32HFmode || (MODE) 
== V16HFmode \ + || (MODE) == V8HFmode) #define X87_FLOAT_MODE_P(MODE) \ (TARGET_80387 && ((MODE) == SFmode || (MODE) == DFmode || (MODE) == XFmode)) diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index c6279e620c9..758d7d1e3c0 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -498,7 +498,7 @@ ;; Main data type used by the insn (define_attr "mode" "unknown,none,QI,HI,SI,DI,TI,OI,XI,HF,SF,DF,XF,TF,V32HF,V16HF,V8HF, - V16SF,V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,V8DF" + V16SF,V8SF,V4DF,V4SF,V2DF,V2SF,V1DF,V8DF,V4HF,V2HF" (const_string "unknown")) ;; The CPU unit operations uses. @@ -1106,7 +1106,8 @@ (V1TI "16") (V2TI "32") (V4TI "64") (V2DF "16") (V4DF "32") (V8DF "64") (V4SF "16") (V8SF "32") (V16SF "64") - (V8HF "16") (V16HF "32") (V32HF "64")]) + (V8HF "16") (V16HF "32") (V32HF "64") + (V4HF "8") (V2HF "4")]) ;; Double word integer modes as mode attribute. (define_mode_attr DWI [(QI "HI") (HI "SI") (SI "DI") (DI "TI") (TI "OI")]) diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md index b0093778fc6..68e1c4b2dbd 100644 --- a/gcc/config/i386/mmx.md +++ b/gcc/config/i386/mmx.md @@ -48,7 +48,7 @@ (define_mode_iterator MMXMODEI8 [V8QI V4HI V2SI (V1DI "TARGET_SSE2")]) ;; All 8-byte vector modes handled by MMX -(define_mode_iterator MMXMODE [V8QI V4HI V2SI V1DI V2SF]) +(define_mode_iterator MMXMODE [V8QI V4HI V2SI V1DI V2SF V4HF]) (define_mode_iterator MMXMODE124 [V8QI V4HI V2SI V2SF]) ;; Mix-n-match @@ -57,8 +57,8 @@ (define_mode_iterator MMXMODE24 [V4HI V2SI]) (define_mode_iterator MMXMODE248 [V4HI V2SI V1DI]) -;; All 4-byte integer vector modes -(define_mode_iterator V_32 [V4QI V2HI V1SI]) +;; All 4-byte integer/float16 vector modes +(define_mode_iterator V_32 [V4QI V2HI V1SI V2HF]) ;; 4-byte integer vector modes (define_mode_iterator VI_32 [V4QI V2HI]) @@ -191,6 +191,8 @@ (eq_attr "alternative" "11,12") (cond [(match_test "mode == V2SFmode") (const_string "V4SF") + (match_test "mode == V4HFmode") + (const_string "V4SF") (ior (not 
(match_test "TARGET_SSE2")) (match_test "optimize_function_for_size_p (cfun)")) (const_string "V4SF") @@ -198,14 +200,16 @@ (const_string "TI")) (and (eq_attr "alternative" "13") - (ior (and (match_test "mode == V2SFmode") - (not (match_test "TARGET_MMX_WITH_SSE"))) - (not (match_test "TARGET_SSE2")))) + (ior (ior (and (match_test "mode == V2SFmode") + (not (match_test "TARGET_MMX_WITH_SSE"))) + (not (match_test "TARGET_SSE2"))) + (match_test "mode == V4HFmode"))) (const_string "V2SF") (and (eq_attr "alternative" "14") - (ior (match_test "mode == V2SFmode") - (not (match_test "TARGET_SSE2")))) + (ior (ior (match_test "mode == V2SFmode") + (not (match_test "TARGET_SSE2"))) + (match_test "mode == V4HFmode"))) (const_string "V2SF") ] (const_string "DI"))) @@ -289,12 +293,17 @@ (const_string "*"))) (set (attr "mode") (cond [(eq_attr "alternative" "2,3") - (cond [(match_test "TARGET_AVX") + (cond [(match_test "mode == V2HFmode") + (const_string "V4SF") + (match_test "TARGET_AVX") (const_string "TI") (match_test "optimize_function_for_size_p (cfun)") (const_string "V4SF") ] (const_string "TI")) + (and (eq_attr "alternative" "4,5") + (match_test "mode == V2HFmode")) + (const_string "SF") ] (const_string "SI"))) (set (attr "preferred_for_speed") diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index a446dedb2ec..b7832926287 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -671,6 +671,9 @@ [(V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX512VL") (V16QI "TARGET_AVX512VL")]) +(define_mode_iterator VHF_32_64 + [V4HF V2HF]) + (define_mode_attr avx512 [(V16QI "avx512vl") (V32QI "avx512vl") (V64QI "avx512bw") (V8HI "avx512vl") (V16HI "avx512vl") (V32HI "avx512bw") @@ -1313,6 +1316,36 @@ ] (symbol_ref "true")))]) +(define_insn "*movv4hf_internal_sse" + [(set (match_operand:V4HF 0 "nonimmediate_operand" + "=v,v,v,m") + (match_operand:V4HF 1 "nonimmediate_or_sse_const_operand" + " C,v,m,v"))] + "!TARGET_MMX && TARGET_SSE2 + && (register_operand 
(operands[0], V4HFmode) + || register_operand (operands[1], V4HFmode))" +{ + switch (get_attr_type (insn)) + { + case TYPE_SSELOG1: + return standard_sse_constant_opcode (insn, operands); + + case TYPE_SSEMOV: + return ix86_output_ssemov (insn, operands); + + default: + gcc_unreachable (); + } +} + [(set_attr "type" "sselog1,ssemov,ssemov,ssemov") + (set_attr "prefix" "maybe_vex") + (set (attr "mode") + (cond [(eq_attr "alternative" "1") + (const_string "V4SF")] + (const_string "V2SF")))] +) + + ;; If mem_addr points to a memory region with less than whole vector size bytes ;; of accessible memory and k is a mask that would prevent reading the inaccessible ;; bytes from mem_addr, add UNSPEC_MASKLOAD to prevent it to be transformed to vpblendd @@ -2165,6 +2198,22 @@ (set_attr "prefix" "") (set_attr "mode" "")]) +(define_insn "3" + [(set (match_operand:VHF_32_64 0 "register_operand" "=v") + (plusminusmultdiv:VHF_32_64 + (match_operand:VHF_32_64 1 "register_operand" "v") + (match_operand:VHF_32_64 2 "register_operand" "v")))] + "TARGET_AVX512FP16 && TARGET_AVX512VL" + "vph\t{%2, %1, %0|%0, %1, %2}" + [(set (attr "type") + (cond [(match_test " == MULT") + (const_string "ssemul") + (match_test " == DIV") + (const_string "ssediv")] + (const_string "sseadd"))) + (set_attr "prefix" "evex") + (set_attr "mode" "V8HF")]) + ;; Standard scalar operation patterns which preserve the rest of the ;; vector for combiner. 
(define_insn "*_vm3" diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-64-32-vecop-1.c b/gcc/testsuite/gcc.target/i386/avx512fp16-64-32-vecop-1.c new file mode 100644 index 00000000000..754e909d77b --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-64-32-vecop-1.c @@ -0,0 +1,30 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512vl" } */ + +/* { dg-final { scan-assembler-times "vaddph" 2 } } */ +/* { dg-final { scan-assembler-times "vsubph" 2 } } */ +/* { dg-final { scan-assembler-times "vmulph" 2 } } */ +/* { dg-final { scan-assembler-times "vdivph" 2 } } */ + +#define DO_PRAGMA(X) _Pragma(#X) + +#define VEC_OP_VV(size, op, name) \ +void \ +__attribute__ ((noinline, noclone, optimize("tree-slp-vectorize"))) \ +vecop_v##size##hf##name (_Float16 * restrict dst, \ + _Float16 * restrict src1, _Float16 * restrict src2) \ +{ \ + int i; \ + DO_PRAGMA (GCC unroll size) \ + for (i = 0; i < size; i++) \ + dst[i] = src1[i] op src2[i]; \ +} + +VEC_OP_VV(4, +, add) +VEC_OP_VV(2, +, add) +VEC_OP_VV(4, -, sub) +VEC_OP_VV(2, -, sub) +VEC_OP_VV(4, *, mul) +VEC_OP_VV(2, *, mul) +VEC_OP_VV(4, /, div) +VEC_OP_VV(2, /, div) diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-64-32-vecop-2.c b/gcc/testsuite/gcc.target/i386/avx512fp16-64-32-vecop-2.c new file mode 100644 index 00000000000..4dc6f9fb92e --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-64-32-vecop-2.c @@ -0,0 +1,75 @@ +/* { dg-do run { target avx512fp16 } } */ +/* { dg-options "-O2 -mavx512fp16 -mavx512vl" } */ + +static void vec_op_test (void); +#define DO_TEST vec_op_test +#define AVX512FP16 +#define AVX512VL +#include "avx512f-check.h" +#include "avx512fp16-64-32-vecop-1.c" + +_Float16 a[4], b[4], fexp[4], fref[4]; + +#define EMULATE_VEC_OP_VV(size, op, name) \ +void \ +__attribute__ ((noinline, noclone)) \ +scalar_vecop_v##size##hf##name ( \ + _Float16 * restrict dst, _Float16 * restrict src1, \ + _Float16 * restrict src2) \ +{ \ + int i; \ + for (i = 0; i < size; i++) \ + 
dst[i] = src1[i] op src2[i]; \ +} + +EMULATE_VEC_OP_VV (4, +, add) +EMULATE_VEC_OP_VV (2, +, add) +EMULATE_VEC_OP_VV (4, -, sub) +EMULATE_VEC_OP_VV (2, -, sub) +EMULATE_VEC_OP_VV (4, *, mul) +EMULATE_VEC_OP_VV (2, *, mul) +EMULATE_VEC_OP_VV (4, /, div) +EMULATE_VEC_OP_VV (2, /, div) + +void init() +{ + int i; + for (i = 0; i < 4; i++) + { + a[i] = i + 0.5; + b[i] = i * 1.5; + fexp[i] = fref[i] = 2.75 * i; + } +} + +int check_cond(void *a, void *b, int size) +{ + int i; + unsigned short *pa = (unsigned short *)a, + *pb = (unsigned short *)b; + for (i = 0; i < size; i++) + if (pa[i] != pb[i]) + return 0; + return 1; +} + +#define TEST_VEC_OP_VV(size, name) \ +{ \ + init (); \ + scalar_vecop_v##size##hf##name (a, b, fexp); \ + vecop_v##size##hf##name (a, b, fref); \ + if (!check_cond ((void *)fexp, (void *)fref, size)) \ + abort(); \ +} + +static void vec_op_test() +{ + TEST_VEC_OP_VV (4, add) + TEST_VEC_OP_VV (2, add) + TEST_VEC_OP_VV (4, sub) + TEST_VEC_OP_VV (2, sub) + TEST_VEC_OP_VV (4, mul) + TEST_VEC_OP_VV (2, mul) + TEST_VEC_OP_VV (4, div) + TEST_VEC_OP_VV (2, div) +} diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-floatvnhf.c b/gcc/testsuite/gcc.target/i386/avx512fp16-floatvnhf.c index 112ac3e74d5..8471a1d1d10 100644 --- a/gcc/testsuite/gcc.target/i386/avx512fp16-floatvnhf.c +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-floatvnhf.c @@ -43,16 +43,16 @@ FLOATHFVV(2, udi) /* { dg-final { scan-assembler-times "vcvtqq2phz\[ \\t\]+\[^\{\n\]*\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ /* { dg-final { scan-assembler-times "vcvtuqq2phz\[ \\t\]+\[^\{\n\]*\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ -/* { dg-final { scan-assembler-times "vcvtqq2phy\[ \\t\]+\[^\{\n\]*\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 { xfail *-*-* } } } */ -/* { dg-final { scan-assembler-times "vcvtuqq2phy\[ \\t\]+\[^\{\n\]*\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 { xfail *-*-* } } } */ -/* { dg-final { scan-assembler-times "vcvtqq2phx\[ 
\\t\]+\[^\{\n\]*\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 { xfail *-*-* } } } */ -/* { dg-final { scan-assembler-times "vcvtuqq2phx\[ \\t\]+\[^\{\n\]*\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 { xfail *-*-* } } } */ +/* { dg-final { scan-assembler-times "vcvtqq2phy\[ \\t\]+\[^\{\n\]*\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtuqq2phy\[ \\t\]+\[^\{\n\]*\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtqq2phx\[ \\t\]+\[^\{\n\]*\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtuqq2phx\[ \\t\]+\[^\{\n\]*\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ /* { dg-final { scan-assembler-times "vcvtdq2ph\[ \\t\]+\[^\{\n\]*\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ /* { dg-final { scan-assembler-times "vcvtudq2ph\[ \\t\]+\[^\{\n\]*\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ /* { dg-final { scan-assembler-times "vcvtdq2phy\[ \\t\]+\[^\{\n\]*\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ /* { dg-final { scan-assembler-times "vcvtudq2phy\[ \\t\]+\[^\{\n\]*\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ -/* { dg-final { scan-assembler-times "vcvtdq2phx\[ \\t\]+\[^\{\n\]*\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 { xfail *-*-* } } } */ -/* { dg-final { scan-assembler-times "vcvtudq2phx\[ \\t\]+\[^\{\n\]*\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 { xfail *-*-* } } } */ +/* { dg-final { scan-assembler-times "vcvtdq2phx\[ \\t\]+\[^\{\n\]*\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtudq2phx\[ \\t\]+\[^\{\n\]*\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ /* { dg-final { scan-assembler-times "vcvtw2ph\[ \\t\]+\[^\{\n\]*\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ /* { dg-final { scan-assembler-times "vcvtuw2ph\[ \\t\]+\[^\{\n\]*\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ /* { dg-final { scan-assembler-times "vcvtw2ph\[ \\t\]+\[^\{\n\]*\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ diff 
--git a/gcc/testsuite/gcc.target/i386/avx512fp16-trunc-extendvnhf.c b/gcc/testsuite/gcc.target/i386/avx512fp16-trunc-extendvnhf.c index 286ea9f2624..2ef901a0375 100644 --- a/gcc/testsuite/gcc.target/i386/avx512fp16-trunc-extendvnhf.c +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-trunc-extendvnhf.c @@ -41,15 +41,15 @@ EXTENDHFVV(8, sf) EXTENDHFVV(4, sf) /* { dg-final { scan-assembler-times "vcvtpd2phz\[ \\t\]+\[^\{\n\]*\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ -/* { dg-final { scan-assembler-times "vcvtpd2phy\[ \\t\]+\[^\{\n\]*\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 { xfail *-*-* } } } */ -/* { dg-final { scan-assembler-times "vcvtpd2phx\[ \\t\]+\[^\{\n\]*\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 { xfail *-*-* } } } */ +/* { dg-final { scan-assembler-times "vcvtpd2phy\[ \\t\]+\[^\{\n\]*\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtpd2phx\[ \\t\]+\[^\{\n\]*\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ /* { dg-final { scan-assembler-times "vcvtps2phx\[ \\t\]+\[^\{\n\]*\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ /* { dg-final { scan-assembler-times "vcvtps2phxy\[ \\t\]+\[^\{\n\]*\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ -/* { dg-final { scan-assembler-times "vcvtps2phxx\[ \\t\]+\[^\{\n\]*\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 { xfail *-*-* } } } */ +/* { dg-final { scan-assembler-times "vcvtps2phxx\[ \\t\]+\[^\{\n\]*\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ /* { dg-final { scan-assembler-times "vcvtph2pd\[ \\t\]+\[^\{\n\]*\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ -/* { dg-final { scan-assembler-times "vcvtph2pd\[ \\t\]+\[^\{\n\]*\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 { xfail *-*-* } } } */ -/* { dg-final { scan-assembler-times "vcvtph2pd\[ \\t\]+\[^\{\n\]*\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 { xfail *-*-* } } } */ +/* { dg-final { scan-assembler-times "vcvtph2pd\[ \\t\]+\[^\{\n\]*\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times 
"vcvtph2pd\[ \\t\]+\[^\{\n\]*\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ /* { dg-final { scan-assembler-times "vcvtph2psx\[ \\t\]+\[^\{\n\]*\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ /* { dg-final { scan-assembler-times "vcvtph2psx\[ \\t\]+\[^\{\n\]*\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ -/* { dg-final { scan-assembler-times "vcvtph2psx\[ \\t\]+\[^\{\n\]*\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 { xfail *-*-* } } } */ +/* { dg-final { scan-assembler-times "vcvtph2psx\[ \\t\]+\[^\{\n\]*\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-truncvnhf.c b/gcc/testsuite/gcc.target/i386/avx512fp16-truncvnhf.c index ee55cd12300..7a51c9dd077 100644 --- a/gcc/testsuite/gcc.target/i386/avx512fp16-truncvnhf.c +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-truncvnhf.c @@ -43,16 +43,16 @@ FIX_TRUNCHFVV(2, udi) /* { dg-final { scan-assembler-times "vcvttph2qq\[ \\t\]+\[^\{\n\]*\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ /* { dg-final { scan-assembler-times "vcvttph2uqq\[ \\t\]+\[^\{\n\]*\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ -/* { dg-final { scan-assembler-times "vcvttph2qq\[ \\t\]+\[^\{\n\]*\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 { xfail *-*-* } } } */ -/* { dg-final { scan-assembler-times "vcvttph2uqq\[ \\t\]+\[^\{\n\]*\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 { xfail *-*-* } } } */ -/* { dg-final { scan-assembler-times "vcvttph2qq\[ \\t\]+\[^\{\n\]*\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 { xfail *-*-* } } } */ -/* { dg-final { scan-assembler-times "vcvttph2uqq\[ \\t\]+\[^\{\n\]*\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 { xfail *-*-* } } } */ +/* { dg-final { scan-assembler-times "vcvttph2qq\[ \\t\]+\[^\{\n\]*\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttph2uqq\[ \\t\]+\[^\{\n\]*\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttph2qq\[ \\t\]+\[^\{\n\]*\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ 
+/* { dg-final { scan-assembler-times "vcvttph2uqq\[ \\t\]+\[^\{\n\]*\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ /* { dg-final { scan-assembler-times "vcvttph2dq\[ \\t\]+\[^\{\n\]*\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ /* { dg-final { scan-assembler-times "vcvttph2udq\[ \\t\]+\[^\{\n\]*\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ /* { dg-final { scan-assembler-times "vcvttph2dq\[ \\t\]+\[^\{\n\]*\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ /* { dg-final { scan-assembler-times "vcvttph2udq\[ \\t\]+\[^\{\n\]*\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ -/* { dg-final { scan-assembler-times "vcvttph2dq\[ \\t\]+\[^\{\n\]*\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 { xfail *-*-* } } } */ -/* { dg-final { scan-assembler-times "vcvttph2udq\[ \\t\]+\[^\{\n\]*\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 { xfail *-*-* } } } */ +/* { dg-final { scan-assembler-times "vcvttph2dq\[ \\t\]+\[^\{\n\]*\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvttph2udq\[ \\t\]+\[^\{\n\]*\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ /* { dg-final { scan-assembler-times "vcvttph2w\[ \\t\]+\[^\{\n\]*\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ /* { dg-final { scan-assembler-times "vcvttph2uw\[ \\t\]+\[^\{\n\]*\[^\n\r]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ /* { dg-final { scan-assembler-times "vcvttph2w\[ \\t\]+\[^\{\n\]*\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr102230.c b/gcc/testsuite/gcc.target/i386/pr102230.c new file mode 100644 index 00000000000..60cf1c32afe --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr102230.c @@ -0,0 +1,38 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mavx512fp16" } */ + +typedef _Float16 v4hf __attribute__ ((vector_size (8))); +typedef _Float16 v2hf __attribute__ ((vector_size (4))); + +v4hf +v4hf_abi_1 (v4hf a) +{ + return a; +} + +v4hf +v4hf_abi_3 (v4hf a, v4hf b, v4hf c) +{ + return c; +} + +/* { dg-final { scan-assembler-times 
"movq\[[\\t \]*%mm2, %mm0" 1 { target { ia32 } } } } */ +/* { dg-final { scan-assembler-times "vmovaps\[[\\t \]*%xmm2, %xmm0" 1 { target { ! ia32 } } } } */ + +v4hf +v4hf_abi_4 (v4hf a, v4hf b, v4hf c, v4hf d) +{ + return d; +} + +/* { dg-final { scan-assembler-times "movq\[[\\t \]*4\[(\]%esp\[)\], %mm0" 1 { target { ia32 } } } } */ +/* { dg-final { scan-assembler-times "vmovaps\[[\\t \]*%xmm3, %xmm0" 1 { target { ! ia32 } } } } */ + +v2hf +v2hf_test (v2hf a, v2hf b, v2hf c, v2hf d) +{ + return b; +} + +/* { dg-final { scan-assembler-times "movl\[[\\t \]*8\[(\]%esp\[)\], %eax" 1 { target { ia32 } } } } */ +/* { dg-final { scan-assembler-times "vmovaps\[[\\t \]*%xmm1, %xmm0" 1 { target { ! ia32 } } } } */