From patchwork Mon Nov 21 02:11:50 2022
X-Patchwork-Submitter: liuhongt
X-Patchwork-Id: 60898
X-Patchwork-Original-From: liuhongt via Gcc-patches
From: liuhongt
Reply-To: liuhongt
To: gcc-patches@gcc.gnu.org
Cc: crazylht@gmail.com, hjl.tools@gmail.com, ubizjak@gmail.com
Subject: [PATCH] [x86] Some tidy up for RA related hooks.
Date: Mon, 21 Nov 2022 10:11:50 +0800
Message-Id: <20221121021150.3348406-1-hongtao.liu@intel.com>
X-Mailer: git-send-email 2.27.0
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
List-Id: Gcc-patches mailing list

While working on [1] for ix86_can_change_mode_class, I noticed some
incorrectness and missed optimizations in the current RA-related hooks.
This patch fixes and tidies them up:

1. In ix86_can_change_mode_class, also reject the mode change when the
   size of TO is less than TARGET_SSE2 ? 2 : 4 bytes, as is already
   done for the size of FROM.
2. Merge VALID_AVX512FP16_SCALAR_MODE plus BFmode into
   VALID_AVX512F_SCALAR_MODE. 16-bit data moves are supported with SSE2
   and above, so the AVX512FP16 condition is not needed for those EVEX
   SSE registers.
3. Allow DImode/HImode in SSE registers for SSE2 and above, just like
   SImode, since 16-bit data moves between SSE registers and GPRs are
   supported with SSE2 and above. This helps RA handle cases like
   (subreg:HI (reg:V8HI) 0), which it would otherwise spill. This
   enables the optimization for pieces-memset-{3,37,39}.c.
4. Guard the 64/32-bit vector move patterns with ix86_hardreg_mov_ok.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606373.html

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk?

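To illustrate item 1, here is a minimal standalone sketch of the size guard, not the real hook. The names toy_mov_size and toy_can_change_mode and the plain byte-size parameters are hypothetical stand-ins for GCC's machine_mode, GET_MODE_SIZE and register-class checks, and the other conditions in ix86_can_change_mode_class are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: smallest move size in bytes.  Mirrors
   "MAYBE_SSE_CLASS_P (regclass) && TARGET_SSE2 ? 2 : 4" from the patch,
   with both conditions passed in as plain booleans.  */
int
toy_mov_size (bool sse_class, bool target_sse2)
{
  return sse_class && target_sse2 ? 2 : 4;
}

/* Hypothetical model of the guard after the patch: the mode change is
   rejected when either FROM or TO is smaller than the move size.
   Before the patch only FROM was checked.  */
bool
toy_can_change_mode (int from_size, int to_size,
		     bool sse_class, bool target_sse2)
{
  int mov_size = toy_mov_size (sse_class, target_sse2);
  if (from_size < mov_size || to_size < mov_size)
    return false;
  return true;
}

/* A few self-checks using byte sizes: V8HI is 16, HI is 2, QI is 1.  */
void
toy_self_check (void)
{
  /* V8HI -> HI in an SSE class with SSE2: 2-byte moves are fine.  */
  assert (toy_can_change_mode (16, 2, true, true));
  /* V8HI -> QI: TO is below the 2-byte minimum, now rejected.  */
  assert (!toy_can_change_mode (16, 1, true, true));
  /* Without SSE2 the minimum is 4 bytes, so HI is rejected.  */
  assert (!toy_can_change_mode (16, 2, true, false));
}
```

The second self-check is the case the patch adds: FROM is large enough, so the old check on FROM alone would have let it through.
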
gcc/ChangeLog:

	* config/i386/i386.cc (ix86_can_change_mode_class): Also guard
	size of TO.
	(ix86_hard_regno_mode_ok): Remove VALID_AVX512FP16_SCALAR_MODE.
	* config/i386/i386.h (VALID_AVX512FP16_SCALAR_MODE): Merged to ..
	(VALID_AVX512F_SCALAR_MODE): .. this, also add HImode.
	(VALID_SSE_REG_MODE): Add DI/HImode.
	* config/i386/mmx.md (*mov<mode>_internal): Add
	ix86_hardreg_mov_ok to condition.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/pieces-memset-3.c: Remove xfail.
	* gcc.target/i386/pieces-memset-37.c: Remove xfail.
	* gcc.target/i386/pieces-memset-39.c: Remove xfail.
---
 gcc/config/i386/i386.cc                          |  9 ++-------
 gcc/config/i386/i386.h                           | 16 ++++++++--------
 gcc/config/i386/mmx.md                           |  6 ++++--
 gcc/testsuite/gcc.target/i386/pieces-memset-3.c  |  4 ++--
 gcc/testsuite/gcc.target/i386/pieces-memset-37.c |  4 ++--
 gcc/testsuite/gcc.target/i386/pieces-memset-39.c |  4 ++--
 6 files changed, 20 insertions(+), 23 deletions(-)

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 292b32c5e99..030c26965ab 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -19725,7 +19725,8 @@ ix86_can_change_mode_class (machine_mode from, machine_mode to,
          the vec_dupv4hi pattern.
          NB: SSE2 can load 16bit data to sse register via pinsrw.  */
       int mov_size = MAYBE_SSE_CLASS_P (regclass) && TARGET_SSE2 ? 2 : 4;
-      if (GET_MODE_SIZE (from) < mov_size)
+      if (GET_MODE_SIZE (from) < mov_size
+          || GET_MODE_SIZE (to) < mov_size)
         return false;
     }
 
@@ -20089,12 +20090,6 @@ ix86_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
                   || VALID_AVX512F_SCALAR_MODE (mode)))
         return true;
 
-      /* For AVX512FP16, vmovw supports movement of HImode
-         and HFmode between GPR and SSE registers.  */
-      if (TARGET_AVX512FP16
-          && VALID_AVX512FP16_SCALAR_MODE (mode))
-        return true;
-
       /* For AVX-5124FMAPS or AVX-5124VNNIW
          allow V64SF and V64SI modes for special regnos.  */
       if ((TARGET_AVX5124FMAPS || TARGET_AVX5124VNNIW)
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 3869db8f2d3..d9a1fb0e420 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -1017,11 +1017,9 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
   (VALID_AVX256_REG_MODE (MODE) || (MODE) == OImode)
 
 #define VALID_AVX512F_SCALAR_MODE(MODE)                                 \
-  ((MODE) == DImode || (MODE) == DFmode || (MODE) == SImode             \
-   || (MODE) == SFmode)
-
-#define VALID_AVX512FP16_SCALAR_MODE(MODE)                              \
-  ((MODE) == HImode || (MODE) == HFmode)
+  ((MODE) == DImode || (MODE) == DFmode                                 \
+   || (MODE) == SImode || (MODE) == SFmode                              \
+   || (MODE) == HImode || (MODE) == HFmode || (MODE) == BFmode)
 
 #define VALID_AVX512F_REG_MODE(MODE)                                    \
   ((MODE) == V8DImode || (MODE) == V8DFmode || (MODE) == V64QImode      \
@@ -1045,13 +1043,15 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
    || (MODE) == V8HFmode || (MODE) == V4HFmode || (MODE) == V2HFmode    \
    || (MODE) == V8BFmode || (MODE) == V4BFmode || (MODE) == V2BFmode    \
    || (MODE) == V4QImode || (MODE) == V2HImode || (MODE) == V1SImode    \
-   || (MODE) == V2DImode || (MODE) == V2QImode || (MODE) == DFmode      \
-   || (MODE) == HFmode || (MODE) == BFmode)
+   || (MODE) == V2DImode || (MODE) == V2QImode                          \
+   || (MODE) == DFmode || (MODE) == DImode                              \
+   || (MODE) == HFmode || (MODE) == BFmode || (MODE) == HImode)
 
 #define VALID_SSE_REG_MODE(MODE)                                        \
   ((MODE) == V1TImode || (MODE) == TImode                               \
    || (MODE) == V4SFmode || (MODE) == V4SImode                          \
-   || (MODE) == SFmode || (MODE) == TFmode || (MODE) == TDmode)
+   || (MODE) == SFmode || (MODE) == SImode                              \
+   || (MODE) == TFmode || (MODE) == TDmode)
 
 #define VALID_MMX_REG_MODE_3DNOW(MODE) \
   ((MODE) == V2SFmode || (MODE) == SFmode)
diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index d5134cc351e..63aff287795 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -133,7 +133,8 @@ (define_insn "*mov<mode>_internal"
         (match_operand:MMXMODE 1 "nonimm_or_0_operand"
     "rCo,rC,C,rm,rC,C ,!y,m ,?!y,?!y,r ,C,v,m,v,v,r,*x,!y"))]
   "(TARGET_MMX || TARGET_MMX_WITH_SSE)
-   && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
+   && !(MEM_P (operands[0]) && MEM_P (operands[1]))
+   && ix86_hardreg_mov_ok (operands[0], operands[1])"
 {
   switch (get_attr_type (insn))
     {
@@ -286,7 +287,8 @@ (define_insn "*mov<mode>_internal"
     "=r ,m ,v,v,v,m,r,v")
         (match_operand:V_32 1 "general_operand"
     "rmC,rC,C,v,m,v,v,r"))]
-  "!(MEM_P (operands[0]) && MEM_P (operands[1]))"
+  "!(MEM_P (operands[0]) && MEM_P (operands[1]))
+   && ix86_hardreg_mov_ok (operands[0], operands[1])"
 {
   switch (get_attr_type (insn))
     {
diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-3.c b/gcc/testsuite/gcc.target/i386/pieces-memset-3.c
index 765441a7c4a..4f105f58b26 100644
--- a/gcc/testsuite/gcc.target/i386/pieces-memset-3.c
+++ b/gcc/testsuite/gcc.target/i386/pieces-memset-3.c
@@ -13,6 +13,6 @@ foo (int x)
 /* { dg-final { scan-assembler-times "vinserti64x4\[ \\t\]+\[^\n\]*%zmm" 1 } } */
 /* { dg-final { scan-assembler-times "vmovdqu64\[ \\t\]+\[^\n\]*%zmm" 1 } } */
 /* No need to dynamically realign the stack here.  */
-/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */
 /* Nor use a frame pointer.  */
-/* { dg-final { scan-assembler-not "%\[re\]bp" { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-not "%\[re\]bp" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-37.c b/gcc/testsuite/gcc.target/i386/pieces-memset-37.c
index 0c5056be54d..fd09bd153ce 100644
--- a/gcc/testsuite/gcc.target/i386/pieces-memset-37.c
+++ b/gcc/testsuite/gcc.target/i386/pieces-memset-37.c
@@ -10,6 +10,6 @@ foo (int a1, int a2, int a3, int a4, int a5, int a6, int x, char *dst)
 /* { dg-final { scan-assembler-times "vpbroadcastb\[ \\t\]+\[^\n\]*%ymm" 1 } } */
 /* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%ymm" 2 } } */
 /* No need to dynamically realign the stack here.  */
-/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */
 /* Nor use a frame pointer.  */
-/* { dg-final { scan-assembler-not "%\[re\]bp" { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-not "%\[re\]bp" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-39.c b/gcc/testsuite/gcc.target/i386/pieces-memset-39.c
index e33644c2f10..0ed88b274bd 100644
--- a/gcc/testsuite/gcc.target/i386/pieces-memset-39.c
+++ b/gcc/testsuite/gcc.target/i386/pieces-memset-39.c
@@ -11,6 +11,6 @@ foo (int a1, int a2, int a3, int a4, int a5, int a6, int x, char *dst)
 /* { dg-final { scan-assembler-not "vinserti64x4" } } */
 /* { dg-final { scan-assembler-times "vmovdqu8\[ \\t\]+\[^\n\]*%zmm" 1 } } */
 /* No need to dynamically realign the stack here.  */
-/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */
 /* Nor use a frame pointer.  */
-/* { dg-final { scan-assembler-not "%\[re\]bp" { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-not "%\[re\]bp" } } */
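
As a side note on the i386.h change to VALID_AVX512F_SCALAR_MODE, the following standalone sketch models the macro before and after the merge. The enum toy_mode and the toy_* functions are hypothetical and only mirror the mode lists visible in the diff; they are not GCC code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the handful of machine modes involved.  */
enum toy_mode { TOY_DI, TOY_DF, TOY_SI, TOY_SF,
		TOY_HI, TOY_HF, TOY_BF, TOY_QI, TOY_NUM_MODES };

/* Model of VALID_AVX512F_SCALAR_MODE before the patch.  */
bool
toy_old_avx512f_scalar (enum toy_mode m)
{
  return m == TOY_DI || m == TOY_DF || m == TOY_SI || m == TOY_SF;
}

/* Model of the removed VALID_AVX512FP16_SCALAR_MODE.  */
bool
toy_old_avx512fp16_scalar (enum toy_mode m)
{
  return m == TOY_HI || m == TOY_HF;
}

/* Model of VALID_AVX512F_SCALAR_MODE after the patch.  */
bool
toy_new_avx512f_scalar (enum toy_mode m)
{
  return m == TOY_DI || m == TOY_DF
	 || m == TOY_SI || m == TOY_SF
	 || m == TOY_HI || m == TOY_HF || m == TOY_BF;
}

/* The merged predicate is the union of the two old ones plus BFmode.  */
void
toy_check_merge (void)
{
  for (int i = 0; i < TOY_NUM_MODES; i++)
    {
      enum toy_mode m = (enum toy_mode) i;
      bool expected = toy_old_avx512f_scalar (m)
		      || toy_old_avx512fp16_scalar (m)
		      || m == TOY_BF;
      assert (toy_new_avx512f_scalar (m) == expected);
    }
}
```

With the merged macro, the 16-bit modes go through the same VALID_AVX512F_SCALAR_MODE check as the 32/64-bit ones, without the separate TARGET_AVX512FP16 test that the patch removes.
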