From patchwork Thu Jun 30 18:42:06 2022
From: "Roger Sayle" <roger@nextmovesoftware.com>
To: <gcc-patches@gcc.gnu.org>
Subject: [x86 PATCH] UNSPEC_PALIGNR optimizations and clean-ups.
Date: Thu, 30 Jun 2022 19:42:06 +0100
Message-ID: <006901d88cb1$1a06e870$4e14b950$@nextmovesoftware.com>

This patch is a follow-up to Hongtao's fix for PR target/105854.
That fix is perfectly correct, but what caught my eye was why the compiler was
generating a shift by zero at all.  Digging deeper, it turns out that we can
easily optimize __builtin_ia32_palignr for alignments of 0 and 64, which
simplify to a move from the lowpart or the highpart of the concatenated
operands respectively.

After adding optimizations to simplify the 64-bit DImode palignr, I started to
add the corresponding optimizations for vpalignr (i.e. 128-bit).  The first
oddity is that sse.md uses TImode and a special SSESCALARMODE iterator, rather
than V1TImode, and indeed the comment above SSESCALARMODE hints that it should
be "dropped in favor of VIMAX_AVX2_AVX512BW".  Hence this patch includes the
migration of <ssse3_avx2>_palignr to use VIMAX_AVX2_AVX512BW, basically using
V1TImode instead of TImode for 128-bit palignr.

But it was only after I'd implemented this clean-up that I stumbled across the
strange semantics of 128-bit [v]palignr.  According to
https://www.felixcloutier.com/x86/palignr, the semantics are subtly different
depending upon how the instruction is encoded: PALIGNR leaves the high part of
the destination register unmodified, whilst the VEX.128-encoded VPALIGNR
clears it, and (unless I'm mistaken) it looks like GCC currently uses the
exact same RTL/templates for both, treating one as an alternative for the
other.  Hence I thought I'd post what I have so far (part optimization and
part clean-up), and ask the x86 experts for their opinions.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap and
make -k check, both with and without --target_board=unix{-m32}, with no new
failures.  Ok for mainline?

2022-06-30  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
        * config/i386/i386-builtin.def (__builtin_ia32_palignr128): Change
        CODE_FOR_ssse3_palignrti to CODE_FOR_ssse3_palignrv1ti.
        * config/i386/i386-expand.cc (expand_vec_perm_palignr): Use V1TImode
        and gen_ssse3_palignrv1ti instead of TImode.
        * config/i386/sse.md (SSESCALARMODE): Delete.
        (define_mode_attr ssse3_avx2): Handle V1TImode instead of TImode.
        (<ssse3_avx2>_palignr): Use VIMAX_AVX2_AVX512BW as a mode iterator
        instead of SSESCALARMODE.
        (ssse3_palignrdi): Optimize cases when operands[3] is 0 or 64,
        using a single move instruction (if required).
        (define_split): Likewise split UNSPEC_PALIGNR $0 into a move.
        (define_split): Likewise split UNSPEC_PALIGNR $64 into a move.

gcc/testsuite/ChangeLog
        * gcc.target/i386/ssse3-palignr-2.c: New test case.

Thanks in advance,
Roger
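
As a concrete illustration of the degenerate alignments discussed above, here
is a minimal sketch (for exposition only, not part of the patch; the helper
names take_low and take_high are invented).  The builtin takes its shift count
in bits, so shifting the concatenation x:y right by 0 selects y and shifting
by 64 selects x; with this patch each function should reduce to a single move
when compiled with -O2 -mssse3:

typedef long long __attribute__ ((__vector_size__ (8))) v1di;

v1di take_low (v1di x, v1di y)
{
  /* (x:y) >> 0, low 64 bits: just y.  */
  return __builtin_ia32_palignr (x, y, 0);
}

v1di take_high (v1di x, v1di y)
{
  /* (x:y) >> 64, low 64 bits: just x.  */
  return __builtin_ia32_palignr (x, y, 64);
}

The new ssse3-palignr-2.c test at the end of the patch checks this by
verifying that no punpcklqdq, pshufd or psrldq instructions are emitted for
these two cases.
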
diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index e6daad4..fd16093 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -900,7 +900,7 @@ BDESC (OPTION_MASK_ISA_SSSE3, 0, CODE_FOR_ssse3_psignv4si3, "__builtin_ia32_psig
 BDESC (OPTION_MASK_ISA_SSSE3 | OPTION_MASK_ISA_MMX, 0, CODE_FOR_ssse3_psignv2si3, "__builtin_ia32_psignd", IX86_BUILTIN_PSIGND, UNKNOWN, (int) V2SI_FTYPE_V2SI_V2SI)
 
 /* SSSE3.  */
-BDESC (OPTION_MASK_ISA_SSSE3, 0, CODE_FOR_ssse3_palignrti, "__builtin_ia32_palignr128", IX86_BUILTIN_PALIGNR128, UNKNOWN, (int) V2DI_FTYPE_V2DI_V2DI_INT_CONVERT)
+BDESC (OPTION_MASK_ISA_SSSE3, 0, CODE_FOR_ssse3_palignrv1ti, "__builtin_ia32_palignr128", IX86_BUILTIN_PALIGNR128, UNKNOWN, (int) V2DI_FTYPE_V2DI_V2DI_INT_CONVERT)
 BDESC (OPTION_MASK_ISA_SSSE3 | OPTION_MASK_ISA_MMX, 0, CODE_FOR_ssse3_palignrdi, "__builtin_ia32_palignr", IX86_BUILTIN_PALIGNR, UNKNOWN, (int) V1DI_FTYPE_V1DI_V1DI_INT_CONVERT)
 
 /* SSE4.1 */
diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 8bc5430..6a3fcde 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -19548,9 +19548,11 @@ expand_vec_perm_palignr (struct expand_vec_perm_d *d, bool single_insn_only_p)
   shift = GEN_INT (min * GET_MODE_UNIT_BITSIZE (d->vmode));
   if (GET_MODE_SIZE (d->vmode) == 16)
     {
-      target = gen_reg_rtx (TImode);
-      emit_insn (gen_ssse3_palignrti (target, gen_lowpart (TImode, dcopy.op1),
-                                      gen_lowpart (TImode, dcopy.op0), shift));
+      target = gen_reg_rtx (V1TImode);
+      emit_insn (gen_ssse3_palignrv1ti (target,
+                                        gen_lowpart (V1TImode, dcopy.op1),
+                                        gen_lowpart (V1TImode, dcopy.op0),
+                                        shift));
     }
   else
     {
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 8cd0f61..974deca 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -575,10 +575,6 @@
 (define_mode_iterator VIMAX_AVX2
   [(V2TI "TARGET_AVX2") V1TI])
 
-;; ??? This should probably be dropped in favor of VIMAX_AVX2_AVX512BW.
-(define_mode_iterator SSESCALARMODE
-  [(V4TI "TARGET_AVX512BW") (V2TI "TARGET_AVX2") TI])
-
 (define_mode_iterator VI12_AVX2
   [(V32QI "TARGET_AVX2") V16QI
    (V16HI "TARGET_AVX2") V8HI])
@@ -712,7 +708,7 @@
    (V4HI "ssse3") (V8HI "ssse3") (V16HI "avx2") (V32HI "avx512bw")
    (V4SI "ssse3") (V8SI "avx2")
    (V2DI "ssse3") (V4DI "avx2")
-   (TI "ssse3") (V2TI "avx2") (V4TI "avx512bw")])
+   (V1TI "ssse3") (V2TI "avx2") (V4TI "avx512bw")])
 
 (define_mode_attr sse4_1_avx2
   [(V16QI "sse4_1") (V32QI "avx2") (V64QI "avx512bw")
@@ -21092,10 +21088,10 @@
    (set_attr "mode" "")])
 
 (define_insn "_palignr"
-  [(set (match_operand:SSESCALARMODE 0 "register_operand" "=x,")
-        (unspec:SSESCALARMODE
-          [(match_operand:SSESCALARMODE 1 "register_operand" "0,")
-           (match_operand:SSESCALARMODE 2 "vector_operand" "xBm,m")
+  [(set (match_operand:VIMAX_AVX2_AVX512BW 0 "register_operand" "=x,")
+        (unspec:VIMAX_AVX2_AVX512BW
+          [(match_operand:VIMAX_AVX2_AVX512BW 1 "register_operand" "0,")
+           (match_operand:VIMAX_AVX2_AVX512BW 2 "vector_operand" "xBm,m")
            (match_operand:SI 3 "const_0_to_255_mul_8_operand")]
           UNSPEC_PALIGNR))]
   "TARGET_SSSE3"
@@ -21146,6 +21142,23 @@
   [(set (match_dup 0)
         (lshiftrt:V1TI (match_dup 0) (match_dup 3)))]
 {
+  if (operands[3] == const0_rtx)
+    {
+      if (!rtx_equal_p (operands[0], operands[2]))
+        emit_move_insn (operands[0], operands[2]);
+      else
+        emit_note (NOTE_INSN_DELETED);
+      DONE;
+    }
+  else if (INTVAL (operands[3]) == 64)
+    {
+      if (!rtx_equal_p (operands[0], operands[1]))
+        emit_move_insn (operands[0], operands[1]);
+      else
+        emit_note (NOTE_INSN_DELETED);
+      DONE;
+    }
+
   /* Emulate MMX palignrdi with SSE psrldq.  */
   rtx op0 = lowpart_subreg (V2DImode, operands[0],
                             GET_MODE (operands[0]));
@@ -21175,6 +21188,24 @@
    (set (attr "prefix_rex") (symbol_ref "x86_extended_reg_mentioned_p (insn)"))
    (set_attr "mode" "DI,TI,TI")])
 
+(define_split
+  [(set (match_operand:DI 0 "register_operand")
+        (unspec:DI [(match_operand:DI 1 "register_operand")
+                    (match_operand:DI 2 "register_mmxmem_operand")
+                    (const_int 0)]
+                   UNSPEC_PALIGNR))]
+  ""
+  [(set (match_dup 0) (match_dup 2))])
+
+(define_split
+  [(set (match_operand:DI 0 "register_operand")
+        (unspec:DI [(match_operand:DI 1 "register_operand")
+                    (match_operand:DI 2 "register_mmxmem_operand")
+                    (const_int 64)]
+                   UNSPEC_PALIGNR))]
+  ""
+  [(set (match_dup 0) (match_dup 1))])
+
 ;; Mode iterator to handle singularity w/ absence of V2DI and V4DI
 ;; modes for abs instruction on pre AVX-512 targets.
 (define_mode_iterator VI1248_AVX512VL_AVX512BW
diff --git a/gcc/testsuite/gcc.target/i386/ssse3-palignr-2.c b/gcc/testsuite/gcc.target/i386/ssse3-palignr-2.c
new file mode 100644
index 0000000..791222d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/ssse3-palignr-2.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mssse3" } */
+
+typedef long long __attribute__ ((__vector_size__ (8))) T;
+
+T x;
+T y;
+T z;
+
+void foo()
+{
+  z = __builtin_ia32_palignr (x, y, 0);
+}
+
+void bar()
+{
+  z = __builtin_ia32_palignr (x, y, 64);
+}
+/* { dg-final { scan-assembler-not "punpcklqdq" } } */
+/* { dg-final { scan-assembler-not "pshufd" } } */
+/* { dg-final { scan-assembler-not "psrldq" } } */