From patchwork Tue Jan 18 06:26:42 2022
X-Patchwork-Submitter: liuhongt
X-Patchwork-Id: 50134
From: liuhongt
To: gcc-patches@gcc.gnu.org
Subject: [PATCH] Enhance vec_pack_trunc for integral mode mask.
Date: Tue, 18 Jan 2022 14:26:42 +0800
Message-Id: <20220118062642.48846-1-hongtao.liu@intel.com>

For the testcase in the PR, the patch supports a multi-step QI:4 -> HI:16
pack: first pack QI:4 -> QI:8 through vec_pack_sbool_trunc_qi, then pack
QI:8 -> HI:16 through vec_pack_trunc_hi. The same applies to QI:2 -> HI:16,
which is test4 in mask-pack-prefer128.c.

Bootstrapped both with and w/o '--with-arch=native --with-cpu=native' on CLX.
Regtested for x86_64-pc-linux-gnu{-m32,} and
x86_64-pc-linux-gnu{-m32\ -march=cascadelake,\ -march=cascadelake} on CLX.

This patch partially fixes the regression in the PR, since it enables
vectorization of the epilogue. I'm also working on a separate patch to
prevent mask pack for the testcase in the PR.

Ok for trunk?

gcc/ChangeLog:

	PR target/103771
	* tree-vect-stmts.c (supportable_narrowing_operation): Enhance
	integral mode mask pack by multi steps which takes
	vec_pack_sbool_trunc_optab as start when elements number is
	less than BITS_PER_UNIT.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/mask-pack-prefer128.c: New test.
	* gcc.target/i386/mask-pack-prefer256.c: New test.
	* gcc.target/i386/pr103771.c: New test.
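
For readers unfamiliar with integer-mode masks, the two-step pack described
above can be modelled in plain C. This is only an illustrative sketch and not
GCC code: the helper names are hypothetical, and it assumes that each pack
step concatenates the valid low bits of its two input masks, with the first
operand ending up in the low half.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the QI:4 -> QI:8 step: each input is a byte-sized
   mask in which only the low 4 bits (one per vector element) are valid.  */
static uint8_t
pack_qi4_to_qi8 (uint8_t lo, uint8_t hi)
{
  assert ((lo | hi) < 16);	/* Only 4 elements per input mask.  */
  return (uint8_t) (lo | (hi << 4));
}

/* Hypothetical model of the QI:8 -> HI:16 step: concatenate two full
   8-element masks into one 16-element mask.  */
static uint16_t
pack_qi8_to_hi16 (uint8_t lo, uint8_t hi)
{
  return (uint16_t) (lo | ((uint16_t) hi << 8));
}

/* Multi-step QI:4 -> HI:16: four 4-element masks become one 16-element
   mask by applying the two steps above in sequence.  */
static uint16_t
pack_qi4_to_hi16 (const uint8_t m[4])
{
  uint8_t lo = pack_qi4_to_qi8 (m[0], m[1]);
  uint8_t hi = pack_qi4_to_qi8 (m[2], m[3]);
  return pack_qi8_to_hi16 (lo, hi);
}
```

With the input masks { 0x1, 0x2, 0x4, 0x8 }, the intermediate QI:8 masks in
this model are 0x21 and 0x84, and the final HI:16 mask is 0x8421.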
---
 .../gcc.target/i386/mask-pack-prefer128.c     |  8 ++++++++
 .../gcc.target/i386/mask-pack-prefer256.c     |  8 ++++++++
 gcc/testsuite/gcc.target/i386/pr103771.c      | 18 ++++++++++++++++++
 gcc/tree-vect-stmts.c                         | 11 ++++++++---
 4 files changed, 42 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/mask-pack-prefer128.c
 create mode 100644 gcc/testsuite/gcc.target/i386/mask-pack-prefer256.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr103771.c

diff --git a/gcc/testsuite/gcc.target/i386/mask-pack-prefer128.c b/gcc/testsuite/gcc.target/i386/mask-pack-prefer128.c
new file mode 100644
index 00000000000..c9ea37c7ed3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/mask-pack-prefer128.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-march=skylake-avx512 -O3 -fopenmp-simd -fdump-tree-vect-details -mprefer-vector-width=128" } */
+/* Disabling epilogues until we find a better way to deal with scans.  */
+/* { dg-additional-options "--param vect-epilogues-nomask=0" } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 10 "vect" } } */
+/* { dg-final { scan-assembler-not "maskmov" } } */
+
+#include "mask-pack.c"
diff --git a/gcc/testsuite/gcc.target/i386/mask-pack-prefer256.c b/gcc/testsuite/gcc.target/i386/mask-pack-prefer256.c
new file mode 100644
index 00000000000..841f51b4041
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/mask-pack-prefer256.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-march=skylake-avx512 -O3 -fopenmp-simd -fdump-tree-vect-details -mprefer-vector-width=256" } */
+/* Disabling epilogues until we find a better way to deal with scans.  */
+/* { dg-additional-options "--param vect-epilogues-nomask=0" } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 10 "vect" } } */
+/* { dg-final { scan-assembler-not "maskmov" } } */
+
+#include "mask-pack.c"
diff --git a/gcc/testsuite/gcc.target/i386/pr103771.c b/gcc/testsuite/gcc.target/i386/pr103771.c
new file mode 100644
index 00000000000..a1a9952b6a8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr103771.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-march=cascadelake -O3 -fdump-tree-vect-details -mprefer-vector-width=128" } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+
+typedef unsigned char uint8_t;
+
+static uint8_t x264_clip_uint8 (int x)
+{
+  return x & (~255) ? (-x) >> 31 : x;
+}
+
+void
+mc_weight (uint8_t* __restrict dst, uint8_t* __restrict src,
+	   int i_width,int i_scale)
+{
+  for(int x = 0; x < i_width; x++)
+    dst[x] = x264_clip_uint8 (src[x] * i_scale);
+}
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index f2625a2ff40..4baf5e36127 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -12111,6 +12111,7 @@ supportable_narrowing_operation (enum tree_code code,
   tree intermediate_type, prev_type;
   machine_mode intermediate_mode, prev_mode;
   int i;
+  unsigned HOST_WIDE_INT n_elts;
   bool uns;
 
   *multi_step_cvt = 0;
@@ -12120,8 +12121,10 @@ supportable_narrowing_operation (enum tree_code code,
       c1 = VEC_PACK_TRUNC_EXPR;
       if (VECTOR_BOOLEAN_TYPE_P (narrow_vectype)
 	  && VECTOR_BOOLEAN_TYPE_P (vectype)
-	  && TYPE_MODE (narrow_vectype) == TYPE_MODE (vectype)
-	  && SCALAR_INT_MODE_P (TYPE_MODE (vectype)))
+	  && SCALAR_INT_MODE_P (TYPE_MODE (vectype))
+	  && (TYPE_MODE (narrow_vectype) == TYPE_MODE (vectype)
+	      || (TYPE_VECTOR_SUBPARTS (vectype).is_constant (&n_elts)
+		  && n_elts < BITS_PER_UNIT)))
 	optab1 = vec_pack_sbool_trunc_optab;
       else
 	optab1 = optab_for_tree_code (c1, vectype, optab_default);
@@ -12213,7 +12216,9 @@ supportable_narrowing_operation (enum tree_code code,
       if (VECTOR_BOOLEAN_TYPE_P (intermediate_type)
 	  && VECTOR_BOOLEAN_TYPE_P (prev_type)
 	  && intermediate_mode == prev_mode
-	  && SCALAR_INT_MODE_P (prev_mode))
+	  && SCALAR_INT_MODE_P (prev_mode)
+	  && (TYPE_VECTOR_SUBPARTS (intermediate_type).is_constant (&n_elts)
+	      && n_elts < BITS_PER_UNIT))
 	interm_optab = vec_pack_sbool_trunc_optab;
       else
 	interm_optab