From patchwork Fri Nov  4 00:04:32 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Hongyu Wang <hongyu.wang@intel.com>
X-Patchwork-Id: 59871
Return-Path: <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>
X-Original-To: patchwork@sourceware.org
Delivered-To: patchwork@sourceware.org
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id ABE943858C2D
	for <patchwork@sourceware.org>; Fri,  4 Nov 2022 00:07:09 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org ABE943858C2D
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1667520429;
	bh=5k+QUm8GCx7c7pVIunHmdB+16+Uk0/IbHQ9pzJgqOIE=;
	h=To:Subject:Date:List-Id:List-Unsubscribe:List-Archive:List-Post:
	 List-Help:List-Subscribe:From:Reply-To:Cc:From;
	b=Y3Q+RKWAtvoG9ir+6ybSnDI46Z43jgonhb9AEVBZziJY/efm5e0ufqg31vHIC1j8Q
	 uVUO4+sSUseCrD8IwfCrva9FtfTsnalVKZq3aqmiqJ4Yx5ufyIncXP+8ct7U3QCUbO
	 BKu6YcuFGjfxh+03KcEX/VtkTx4etE/7jJdZJWRw=
X-Original-To: gcc-patches@gcc.gnu.org
Delivered-To: gcc-patches@gcc.gnu.org
Received: from mga14.intel.com (mga14.intel.com [192.55.52.115])
 by sourceware.org (Postfix) with ESMTPS id 2F2813858D3C
 for <gcc-patches@gcc.gnu.org>; Fri,  4 Nov 2022 00:06:37 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 2F2813858D3C
X-IronPort-AV: E=McAfee;i="6500,9779,10520"; a="309830879"
X-IronPort-AV: E=Sophos;i="5.96,135,1665471600"; d="scan'208";a="309830879"
Received: from fmsmga004.fm.intel.com ([10.253.24.48])
 by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Nov 2022 17:06:35 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=McAfee;i="6500,9779,10520"; a="703881785"
X-IronPort-AV: E=Sophos;i="5.96,135,1665471600"; d="scan'208";a="703881785"
Received: from shvmail03.sh.intel.com ([10.239.245.20])
 by fmsmga004.fm.intel.com with ESMTP; 03 Nov 2022 17:06:33 -0700
Received: from shliclel320.sh.intel.com (shliclel320.sh.intel.com
 [10.239.240.127])
 by shvmail03.sh.intel.com (Postfix) with ESMTP id 0B0451005705;
 Fri,  4 Nov 2022 08:06:33 +0800 (CST)
To: gcc-patches@gcc.gnu.org
Subject: [PATCH] Optimize VEC_PERM_EXPR with same permutation index and
 operation [PR98167]
Date: Fri,  4 Nov 2022 08:04:32 +0800
Message-Id: <20221104000432.15254-1-hongyu.wang@intel.com>
X-Mailer: git-send-email 2.18.1
X-Spam-Status: No, score=-11.0 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_AU, FREEMAIL_ENVFROM_END_DIGIT,
 FREEMAIL_FORGED_FROMDOMAIN, FREEMAIL_FROM, GIT_PATCH_0,
 HEADER_FROM_DIFFERENT_DOMAINS, KAM_SHORT, SPF_HELO_NONE, SPF_SOFTFAIL,
 TXREP autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-Patchwork-Original-From: Hongyu Wang via Gcc-patches
 <gcc-patches@gcc.gnu.org>
From: Hongyu Wang <hongyu.wang@intel.com>
Reply-To: Hongyu Wang <hongyu.wang@intel.com>
Cc: hongtao.liu@intel.com
Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org
Sender: "Gcc-patches"
 <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>

Hi,

This is a follow-up patch for PR98167.

The sequence
     c1 = VEC_PERM_EXPR (a, a, mask)
     c2 = VEC_PERM_EXPR (b, b, mask)
     c3 = c1 op c2
can be optimized to
     c = a op b
     c3 = VEC_PERM_EXPR (c, c, mask)
for all integer vector operations except division, and for
floating-point plus, minus and mult when the permutation is full,
i.e. the mask selects every vector element.
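
To illustrate why the two forms agree, here is a minimal scalar sketch in
plain C. It is not part of the patch: the names perm_same, perm_then_add,
add_then_perm, is_full_perm and example_forms_agree are hypothetical, the
vector is modelled as an int array of N = 4 lanes, and addition stands in
for any lane-wise op. is_full_perm is a linear-time variant of the coverage
check, assuming selector indices are reduced modulo the lane count as they
are for a single-input permute.

```c
#include <assert.h>
#include <string.h>

enum { N = 4 };  /* lanes per vector; illustrative only */

/* Scalar model of VEC_PERM_EXPR (v, v, mask): both inputs are the same
   vector, so a selector index k in [0, 2N) picks v[k % N].  */
static void perm_same(const int v[N], const unsigned mask[N], int out[N])
{
  for (int i = 0; i < N; i++) {
    assert(mask[i] < 2u * N);
    out[i] = v[mask[i] % N];
  }
}

/* Original sequence: c3 = perm (a) + perm (b).  */
static void perm_then_add(const int a[N], const int b[N],
                          const unsigned mask[N], int c3[N])
{
  int c1[N], c2[N];
  perm_same(a, mask, c1);
  perm_same(b, mask, c2);
  for (int i = 0; i < N; i++)
    c3[i] = c1[i] + c2[i];
}

/* Rewritten sequence: c = a + b; c3 = perm (c).  */
static void add_then_perm(const int a[N], const int b[N],
                          const unsigned mask[N], int c3[N])
{
  int c[N];
  for (int i = 0; i < N; i++)
    c[i] = a[i] + b[i];
  perm_same(c, mask, c3);
}

/* Return 1 when every lane of the input is selected at least once.  */
static int is_full_perm(const unsigned mask[N])
{
  int seen[N] = {0};
  int count = 0;
  for (int i = 0; i < N; i++) {
    unsigned k = mask[i] % N;
    if (!seen[k]) {
      seen[k] = 1;
      count++;
    }
  }
  return count == N;
}

/* Compare both sequences on the VEC_PERM_4 mask used by the new test.  */
static int example_forms_agree(void)
{
  const int a[N] = {1, 2, 3, 4}, b[N] = {10, 20, 30, 40};
  const unsigned mask[N] = {2, 3, 1, 0};
  int x[N], y[N];
  perm_then_add(a, b, mask, x);
  add_then_perm(a, b, mask, y);
  return memcmp(x, y, sizeof x) == 0;
}
```

The lane-wise results match for any mask, since lane i of either form is
a[mask[i]] + b[mask[i]]. The full-permutation condition for float is
presumably there because the rewritten form also computes a op b on lanes
the mask never selects, which could raise floating-point exceptions that
the original sequence would not.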

Bootstrapped & regrtested on x86_64-pc-linux-gnu.

Ok for trunk?

gcc/ChangeLog:

	PR target/98167
	* match.pd: New perm + vector op patterns for int and fp vector.

gcc/testsuite/ChangeLog:

	PR target/98167
	* gcc.target/i386/pr98167.c: New test.
---
 gcc/match.pd                            | 49 +++++++++++++++++++++++++
 gcc/testsuite/gcc.target/i386/pr98167.c | 44 ++++++++++++++++++++++
 2 files changed, 93 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr98167.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 194ba8f5188..b85ad34f609 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -8189,3 +8189,52 @@ and,
  (bit_and (negate @0) integer_onep@1)
  (if (!TYPE_OVERFLOW_SANITIZED (type))
   (bit_and @0 @1)))
+
+/* Optimize
+   c1 = VEC_PERM_EXPR (a, a, mask)
+   c2 = VEC_PERM_EXPR (b, b, mask)
+   c3 = c1 op c2
+   -->
+   c = a op b
+   c3 = VEC_PERM_EXPR (c, c, mask)
+   For all integer non-div operations.  */
+(for op (plus minus mult bit_and bit_ior bit_xor
+	 lshift rshift)
+ (simplify
+  (op (vec_perm @0 @0 VECTOR_CST@2) (vec_perm @1 @1 VECTOR_CST@2))
+    (if (VECTOR_INTEGER_TYPE_P (type))
+     (vec_perm (op @0 @1) (op @0 @1) @2))))
+
+/* Similar for float arithmetic when permutation constant covers
+   all vector elements.  */
+(for op (plus minus mult)
+ (simplify
+  (op (vec_perm @0 @0 VECTOR_CST@2) (vec_perm @1 @1 VECTOR_CST@2))
+    (if (VECTOR_FLOAT_TYPE_P (type))
+     (with
+      {
+	tree perm_cst = @2;
+	vec_perm_builder builder;
+	bool full_perm_p = false;
+	if (tree_to_vec_perm_builder (&builder, perm_cst))
+	  {
+	    /* Create a vec_perm_indices for the permutation constant.  */
+	    int nelts = TYPE_VECTOR_SUBPARTS (type).to_constant ();
+	    vec_perm_indices sel (builder, 1, nelts);
+
+	    /* Check whether the perm indices cover all vector elements.  */
+	    int count = 0, i, j;
+	    for (i = 0; i < nelts; i++)
+	      for (j = 0; j < nelts; j++)
+		{
+		  if (sel[j].to_constant () == i)
+		    {
+		      count++;
+		      break;
+		    }
+		}
+	    full_perm_p = count == nelts;
+	  }
+       }
+       (if (full_perm_p)
+	(vec_perm (op @0 @1) (op @0 @1) @2))))))
diff --git a/gcc/testsuite/gcc.target/i386/pr98167.c b/gcc/testsuite/gcc.target/i386/pr98167.c
new file mode 100644
index 00000000000..40e0ac11332
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr98167.c
@@ -0,0 +1,44 @@
+/* PR target/98167 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx2" } */
+
+/* { dg-final { scan-assembler-times "vpshufd\t" 8 } } */
+/* { dg-final { scan-assembler-times "vpermilps\t" 3 } } */
+
+#define VEC_PERM_4 \
+  2, 3, 1, 0
+#define VEC_PERM_8 \
+  4, 5, 6, 7, 3, 2, 1, 0
+#define VEC_PERM_16 \
+  8, 9, 10, 11, 12, 13, 14, 15, 7, 6, 5, 4, 3, 2, 1, 0
+
+#define TYPE_PERM_OP(type, size, op, name) \
+  typedef type v##size##s##type __attribute__ ((vector_size(4*size))); \
+  v##size##s##type type##foo##size##i_##name (v##size##s##type a, \
+					      v##size##s##type b) \
+  { \
+    v##size##s##type a1 = __builtin_shufflevector (a, a, \
+						   VEC_PERM_##size); \
+    v##size##s##type b1 = __builtin_shufflevector (b, b, \
+						   VEC_PERM_##size); \
+    return a1 op b1; \
+  }
+
+#define INT_PERMS(op, name) \
+  TYPE_PERM_OP (int, 4, op, name) \
+
+#define FP_PERMS(op, name) \
+  TYPE_PERM_OP (float, 4, op, name) \
+
+INT_PERMS (+, add)
+INT_PERMS (-, sub)
+INT_PERMS (*, mul)
+INT_PERMS (|, ior)
+INT_PERMS (^, xor)
+INT_PERMS (&, and)
+INT_PERMS (<<, shl)
+INT_PERMS (>>, shr)
+FP_PERMS (+, add)
+FP_PERMS (-, sub)
+FP_PERMS (*, mul)
+