From patchwork Wed Sep 15 08:09:51 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Li, Pan2 via Gcc-patches" X-Patchwork-Id: 45012 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 3F5093858018 for ; Wed, 15 Sep 2021 08:12:31 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 3F5093858018 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1631693551; bh=LkCcVF39jOTKSFcGcvr4JJDvR7Y+oJcqDf1iFp9CVHg=; h=To:Subject:Date:In-Reply-To:References:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:Cc: From; b=gF43nglw/zE3EBAM4fUcS0s0q2uOqFPTxJUYPN9KvpYy80dKXg/ZpEt3wofESxXl0 anruwF6IdXAmfLF8PT8pc7l/6GLIaJYeuceZt6Scm9rb6bCGH5vvvaNcpkIngGy07m E+xjAwS6NVN2/0w8gtrYfDn0QVGnLNFaFib1UUNg= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by sourceware.org (Postfix) with ESMTPS id 883373857C6B for ; Wed, 15 Sep 2021 08:10:01 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 883373857C6B X-IronPort-AV: E=McAfee;i="6200,9189,10107"; a="209345048" X-IronPort-AV: E=Sophos;i="5.85,294,1624345200"; d="scan'208";a="209345048" Received: from orsmga002.jf.intel.com ([10.7.209.21]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Sep 2021 01:10:00 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.85,294,1624345200"; d="scan'208";a="452445322" Received: from scymds02.sc.intel.com ([10.82.73.244]) by orsmga002.jf.intel.com with ESMTP; 15 Sep 2021 01:10:00 -0700 Received: from shgcc10.sh.intel.com (shgcc10.sh.intel.com [10.239.154.125]) by scymds02.sc.intel.com with ESMTP id 18F89p6m023358; Wed, 15 Sep 2021 01:09:58 -0700 To: ubizjak@gmail.com Subject: [PATCH 4/4] [PATCH 4/4] x86: Add TARGET_SSE_PARTIAL_REG_[FP_]CONVERTS_DEPENDENCY Date: Wed, 15 Sep 2021 16:09:51 +0800 Message-Id: <20210915080951.10362-5-lili.cui@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20210915080951.10362-1-lili.cui@intel.com> References: <20210915080951.10362-1-lili.cui@intel.com> X-Spam-Status: No, score=-15.7 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_NONE, KAM_DMARC_STATUS, KAM_LAZY_DOMAIN_SECURITY, KAM_SHORT, SPF_HELO_NONE, SPF_NONE, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: "lili.cui--- via Gcc-patches" From: "Li, Pan2 via Gcc-patches" Reply-To: lili.cui@intel.com Cc: hongtao.liu@intel.com, gcc-patches@gcc.gnu.org Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" From: "H.J. Lu" 1. Replace TARGET_SSE_PARTIAL_REG_DEPENDENCY with TARGET_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY in SSE FP to FP splitters. 2. Replace TARGET_SSE_PARTIAL_REG_DEPENDENCY with TARGET_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY in SSE INT to FP splitters. 3. Also check TARGET_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY and TARGET_SSE_PARTIAL_REG_DEPENDENCY when handling avx_partial_xmm_update attribute. Don't convert AVX partial XMM register update if there is no partial SSE register dependency for SSE conversion. gcc/ * config/i386/i386-features.c (remove_partial_avx_dependency): Also check TARGET_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY and and TARGET_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY before generating vxorps. * config/i386/i386.h (TARGET_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY): New. (TARGET_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY): Likewise. * config/i386/i386.md (SSE FP to FP splitters): Replace TARGET_SSE_PARTIAL_REG_DEPENDENCY with TARGET_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY. (SSE INT to FP splitter): Replace TARGET_SSE_PARTIAL_REG_DEPENDENCY with TARGET_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY. * config/i386/x86-tune.def (X86_TUNE_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY): New. (X86_TUNE_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY): Likewise. gcc/testsuite/ * gcc.target/i386/avx-covert-1.c: New file. * gcc.target/i386/avx-fp-covert-1.c: Likewise. * gcc.target/i386/avx-int-covert-1.c: Likewise. * gcc.target/i386/sse-covert-1.c: Likewise. * gcc.target/i386/sse-fp-covert-1.c: Likewise. * gcc.target/i386/sse-int-covert-1.c: Likewise. --- gcc/config/i386/i386-features.c | 6 ++++-- gcc/config/i386/i386.h | 4 ++++ gcc/config/i386/i386.md | 9 ++++++--- gcc/config/i386/x86-tune.def | 15 +++++++++++++++ gcc/testsuite/gcc.target/i386/avx-covert-1.c | 19 +++++++++++++++++++ .../gcc.target/i386/avx-fp-covert-1.c | 15 +++++++++++++++ .../gcc.target/i386/avx-int-covert-1.c | 14 ++++++++++++++ gcc/testsuite/gcc.target/i386/sse-covert-1.c | 19 +++++++++++++++++++ .../gcc.target/i386/sse-fp-covert-1.c | 15 +++++++++++++++ .../gcc.target/i386/sse-int-covert-1.c | 14 ++++++++++++++ 10 files changed, 125 insertions(+), 5 deletions(-) create mode 100644 gcc/testsuite/gcc.target/i386/avx-covert-1.c create mode 100644 gcc/testsuite/gcc.target/i386/avx-fp-covert-1.c create mode 100644 gcc/testsuite/gcc.target/i386/avx-int-covert-1.c create mode 100644 gcc/testsuite/gcc.target/i386/sse-covert-1.c create mode 100644 gcc/testsuite/gcc.target/i386/sse-fp-covert-1.c create mode 100644 gcc/testsuite/gcc.target/i386/sse-int-covert-1.c diff --git a/gcc/config/i386/i386-features.c b/gcc/config/i386/i386-features.c index ae5ea02a002..91bfa06d4bf 100644 --- a/gcc/config/i386/i386-features.c +++ b/gcc/config/i386/i386-features.c @@ -2218,14 +2218,16 @@ remove_partial_avx_dependency (void) machine_mode dest_mode = GET_MODE (dest); machine_mode src_mode; - if (TARGET_USE_VECTOR_FP_CONVERTS) + if (TARGET_USE_VECTOR_FP_CONVERTS + || !TARGET_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY) { src_mode = GET_MODE (XEXP (src, 0)); if (src_mode == E_SFmode || src_mode == E_DFmode) continue; } - if (TARGET_USE_VECTOR_CONVERTS) + if (TARGET_USE_VECTOR_CONVERTS + || !TARGET_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY) { src_mode = GET_MODE (XEXP (src, 0)); if (src_mode == E_SImode || src_mode == E_DImode) diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h index e76bb55c080..ec60b89753e 100644 --- a/gcc/config/i386/i386.h +++ b/gcc/config/i386/i386.h @@ -334,6 +334,10 @@ extern unsigned char ix86_tune_features[X86_TUNE_LAST]; ix86_tune_features[X86_TUNE_PARTIAL_REG_DEPENDENCY] #define TARGET_SSE_PARTIAL_REG_DEPENDENCY \ ix86_tune_features[X86_TUNE_SSE_PARTIAL_REG_DEPENDENCY] +#define TARGET_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY \ + ix86_tune_features[X86_TUNE_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY] +#define TARGET_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY \ + ix86_tune_features[X86_TUNE_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY] #define TARGET_SSE_UNALIGNED_LOAD_OPTIMAL \ ix86_tune_features[X86_TUNE_SSE_UNALIGNED_LOAD_OPTIMAL] #define TARGET_SSE_UNALIGNED_STORE_OPTIMAL \ diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index 13f6f57cdcc..c82a9dc1f67 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -4535,7 +4535,8 @@ (float_extend:DF (match_operand:SF 1 "nonimmediate_operand")))] "!TARGET_AVX - && TARGET_SSE_PARTIAL_REG_DEPENDENCY && epilogue_completed + && TARGET_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY + && epilogue_completed && optimize_function_for_speed_p (cfun) && (!REG_P (operands[1]) || (!TARGET_AVX && REGNO (operands[0]) != REGNO (operands[1]))) @@ -4708,7 +4709,8 @@ (float_truncate:SF (match_operand:DF 1 "nonimmediate_operand")))] "!TARGET_AVX - && TARGET_SSE_PARTIAL_REG_DEPENDENCY && epilogue_completed + && TARGET_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY + && epilogue_completed && optimize_function_for_speed_p (cfun) && (!REG_P (operands[1]) || (!TARGET_AVX && REGNO (operands[0]) != REGNO (operands[1]))) @@ -5243,7 +5245,8 @@ [(set (match_operand:MODEF 0 "sse_reg_operand") (float:MODEF (match_operand:SWI48 1 "nonimmediate_operand")))] "!TARGET_AVX - && TARGET_SSE_PARTIAL_REG_DEPENDENCY && epilogue_completed + && TARGET_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY + && epilogue_completed && optimize_function_for_speed_p (cfun) && (!EXT_REX_SSE_REG_P (operands[0]) || TARGET_AVX512VL)" diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def index 088edb6c4ca..58e8ead56b4 100644 --- a/gcc/config/i386/x86-tune.def +++ b/gcc/config/i386/x86-tune.def @@ -64,6 +64,21 @@ DEF_TUNE (X86_TUNE_SSE_PARTIAL_REG_DEPENDENCY, "sse_partial_reg_dependency", m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_BONNELL | m_AMDFAM10 | m_BDVER | m_ZNVER | m_TREMONT | m_GENERIC) +/* X86_TUNE_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY: This knob avoids + partial write to the destination in scalar SSE conversion from FP + to FP. */ +DEF_TUNE (X86_TUNE_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY, + "sse_partial_reg_fp_converts_dependency", + m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_BONNELL | m_AMDFAM10 + | m_BDVER | m_ZNVER | m_GENERIC) + +/* X86_TUNE_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY: This knob avoids partial + write to the destination in scalar SSE conversion from integer to FP. */ +DEF_TUNE (X86_TUNE_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY, + "sse_partial_reg_converts_dependency", + m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_BONNELL | m_AMDFAM10 + | m_BDVER | m_ZNVER | m_GENERIC) + /* X86_TUNE_SSE_SPLIT_REGS: Set for machines where the type and dependencies are resolved on SSE register parts instead of whole registers, so we may maintain just lower part of scalar values in proper format leaving the diff --git a/gcc/testsuite/gcc.target/i386/avx-covert-1.c b/gcc/testsuite/gcc.target/i386/avx-covert-1.c new file mode 100644 index 00000000000..b6c794ecbb8 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx-covert-1.c @@ -0,0 +1,19 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=skylake -mfpmath=sse -mtune-ctrl=^sse_partial_reg_fp_converts_dependency,^sse_partial_reg_converts_dependency" } */ + +extern float f; +extern double d; +extern int i; + +void +foo (void) +{ + d = f; + f = i; +} + +/* { dg-final { scan-assembler "vcvtss2sd" } } */ +/* { dg-final { scan-assembler "vcvtsi2ssl" } } */ +/* { dg-final { scan-assembler-not "vcvtps2pd" } } */ +/* { dg-final { scan-assembler-not "vcvtdq2ps" } } */ +/* { dg-final { scan-assembler-not "vxorps" } } */ diff --git a/gcc/testsuite/gcc.target/i386/avx-fp-covert-1.c b/gcc/testsuite/gcc.target/i386/avx-fp-covert-1.c new file mode 100644 index 00000000000..c40c48b1b2d --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx-fp-covert-1.c @@ -0,0 +1,15 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=skylake -mfpmath=sse -mtune-ctrl=^sse_partial_reg_fp_converts_dependency" } */ + +extern float f; +extern double d; + +void +foo (void) +{ + d = f; +} + +/* { dg-final { scan-assembler "vcvtss2sd" } } */ +/* { dg-final { scan-assembler-not "vcvtps2pd" } } */ +/* { dg-final { scan-assembler-not "vxorps" } } */ diff --git a/gcc/testsuite/gcc.target/i386/avx-int-covert-1.c b/gcc/testsuite/gcc.target/i386/avx-int-covert-1.c new file mode 100644 index 00000000000..01bb64e66cc --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx-int-covert-1.c @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=skylake -mfpmath=sse -mtune-ctrl=^sse_partial_reg_converts_dependency" } */ + +extern float f; +extern int i; + +void +foo (void) +{ + f = i; +} + +/* { dg-final { scan-assembler "vcvtsi2ssl" } } */ +/* { dg-final { scan-assembler-not "vxorps" } } */ diff --git a/gcc/testsuite/gcc.target/i386/sse-covert-1.c b/gcc/testsuite/gcc.target/i386/sse-covert-1.c new file mode 100644 index 00000000000..c30af694505 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/sse-covert-1.c @@ -0,0 +1,19 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=x86-64 -mfpmath=sse -mtune-ctrl=^sse_partial_reg_fp_converts_dependency,^sse_partial_reg_converts_dependency" } */ + +extern float f; +extern double d; +extern int i; + +void +foo (void) +{ + d = f; + f = i; +} + +/* { dg-final { scan-assembler "cvtss2sd" } } */ +/* { dg-final { scan-assembler "cvtsi2ssl" } } */ +/* { dg-final { scan-assembler-not "cvtps2pd" } } */ +/* { dg-final { scan-assembler-not "cvtdq2ps" } } */ +/* { dg-final { scan-assembler-not "pxor" } } */ diff --git a/gcc/testsuite/gcc.target/i386/sse-fp-covert-1.c b/gcc/testsuite/gcc.target/i386/sse-fp-covert-1.c new file mode 100644 index 00000000000..b6567e60e3e --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/sse-fp-covert-1.c @@ -0,0 +1,15 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=x86-64 -mfpmath=sse -mtune-ctrl=^sse_partial_reg_fp_converts_dependency" } */ + +extern float f; +extern double d; + +void +foo (void) +{ + d = f; +} + +/* { dg-final { scan-assembler "cvtss2sd" } } */ +/* { dg-final { scan-assembler-not "cvtps2pd" } } */ +/* { dg-final { scan-assembler-not "pxor" } } */ diff --git a/gcc/testsuite/gcc.target/i386/sse-int-covert-1.c b/gcc/testsuite/gcc.target/i386/sse-int-covert-1.c new file mode 100644 index 00000000000..107f7241def --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/sse-int-covert-1.c @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=x86-64 -mfpmath=sse -mtune-ctrl=^sse_partial_reg_converts_dependency" } */ + +extern float f; +extern int i; + +void +foo (void) +{ + f = i; +} + +/* { dg-final { scan-assembler "cvtsi2ssl" } } */ +/* { dg-final { scan-assembler-not "pxor" } } */