From patchwork Mon Nov 29 18:14:08 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Roger Sayle X-Patchwork-Id: 48260 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 126E8385AC2C for ; Mon, 29 Nov 2021 18:14:29 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from server.nextmovesoftware.com (server.nextmovesoftware.com [162.254.253.69]) by sourceware.org (Postfix) with ESMTPS id D5B29385840B for ; Mon, 29 Nov 2021 18:14:11 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org D5B29385840B Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=nextmovesoftware.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=nextmovesoftware.com DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=nextmovesoftware.com; s=default; h=Content-Type:MIME-Version:Message-ID: Date:Subject:Cc:To:From:Sender:Reply-To:Content-Transfer-Encoding:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:In-Reply-To:References:List-Id:List-Help:List-Unsubscribe: List-Subscribe:List-Post:List-Owner:List-Archive; bh=4vciGAfgEoKuHlA1mn8JWsa3piokKJl8Po4XWN7999g=; b=hqhvq68s/P7yfIQ/b2phE8cZs8 M2LFPtey8YRrRmdN4I/OXYzqNpEix56bxdmL2VUb7pdPZXr9mQwlTJ9HU2U62X/GtSpdBR8IBppe6 i0w/1aFUKTIkIiQ3rWDnyeVsjzOqjHgfayDbQTj9mvloW2qesdXSGN2bEULvYNlopk/57Hu1aDRar QCyr1i3LkXMxrbfUE21Pls0/DHJvxVLvqR9VRUbosfqsz5tcEiDXwgVCsbIQ5Wi6uA1k+f92ooZpd ASnJ4Cq68nh6euV/zeIO/5QjbJXXkdLIM4k+QRiP8m/IrAvJl14z34ULx5HikB9H0zV6TYcnMy1DC DH7wlyzA==; Received: from [185.62.158.67] (port=60520 helo=Dell) by server.nextmovesoftware.com with esmtpsa (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1mrlAJ-0000Pf-9B; Mon, 29 Nov 2021 13:14:11 -0500 From: "Roger Sayle" To: "'Uros Bizjak'" Subject: [PATCH take #2] x86_64: PR target/100711: Splitters for pandn Date: Mon, 29 Nov 2021 18:14:08 -0000 Message-ID: <01a001d7e54c$e86e7ff0$b94b7fd0$@nextmovesoftware.com> MIME-Version: 1.0 X-Mailer: Microsoft Outlook 16.0 Thread-Index: AdflSt9MvuebHRHaRCux0KjXj6EhXQ== Content-Language: en-gb X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - server.nextmovesoftware.com X-AntiAbuse: Original Domain - gcc.gnu.org X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12] X-AntiAbuse: Sender Address Domain - nextmovesoftware.com X-Get-Message-Sender-Via: server.nextmovesoftware.com: authenticated_id: roger@nextmovesoftware.com X-Authenticated-Sender: server.nextmovesoftware.com: roger@nextmovesoftware.com X-Source: X-Source-Args: X-Source-Dir: X-Spam-Status: No, score=-12.2 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: 'GCC Patches' Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" Hi Uros, Many thanks for the review. Here's a revised version of the patch incorporating all of your suggestions. This has been (re)tested on x86_64-pc-linux-gnu with make bootstrap and make -k check, both with and without --target_board=unix{-m32}, with no new failures. Ok for mainline? 2021-11-29 Roger Sayle Uroš Bizjak gcc/ChangeLog PR target/100711 * config/i386/sse.md (define_split): New splitters to simplify not;vec_duplicate;and as vec_duplicate;andn. gcc/testsuite/ChangeLog PR target/100711 * gcc.target/i386/pr100711-1.c: New test case. * gcc.target/i386/pr100711-2.c: New test case. Thanks in advance, Roger --- > -----Original Message----- > From: Uros Bizjak > Sent: 28 November 2021 19:45 > To: Roger Sayle > Cc: GCC Patches > Subject: Re: [PATCH] x86_64: PR target/100711: Splitters for pandn > > On Sun, Nov 28, 2021 at 2:25 PM Roger Sayle > wrote: > > > > > > This patch addresses PR target/100711 by introducing define_split > > patterns so that not/broadcast/pand may be simplified (by combine) to > > broadcast/pandn. This introduces two splitters one for optimizing > > pandn on TARGET_SSE for V4SI and V2DI, and another for vpandn on > > TARGET_AVX2 for V16QI, V8HI, V32QI, V16HI and V8SI. Each splitter has > > its own new testcase. > > > > I've also confirmed that not/broadcast/pandn is already getting > > simplified to broadcast/pand by the middle-end optimizers. > > > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap > > and make -k check with no new failures. Ok for mainline? > > > > > > 2021-11-28 Roger Sayle > > > > gcc/ChangeLog > > PR target/100711 > > * config/i386/sse.md (define_split): New splitters to simplify > > not;vec_duplicate;and as vec_duplicate;andn. > > > > gcc/testsuite/ChangeLog > > PR target/100711 > > * gcc.target/i386/pr100711-1.c: New test case. > > * gcc.target/i386/pr100711-2.c: New test case. > > > +;; PR target/100711: Split notl; vpbroadcastd; vpand as vpbroadcastd; > +vpandn (define_split > + [(set (match_operand:VI48_128 0 "register_operand") > + (and:VI48_128 > + (vec_duplicate:VI48_128 > + (not: > + (match_operand: 1 "register_operand"))) > + (match_operand:VI48_128 2 "register_operand")))] > > You can use "vector_operand" here, the resulting PANDN can handle these. > > + "TARGET_SSE && can_create_pseudo_p ()" > > This is a combine splitter, so can_create_pseudo_p () is not needed, because it > runs only during the combine phase. > > FYI, the combine splitter is somehow different than normal splitter, the > important part from the documentation is, that the insn is *not* matched by > some define_insn pattern, and the split results in exactly two patterns: > > The insn combiner phase also splits putative insns. If three insns are merged into > one insn with a complex expression that cannot be matched by some > 'define_insn' pattern, the combiner phase attempts to split the complex pattern > into two insns that are recognized. Usually it can break the complex pattern into > two patterns by splitting out some subexpression. However, in some other > cases, such as performing an addition of a large constant in two insns on a RISC > machine, the way to split the addition into two insns is machine-dependent. > > + [(set (match_dup 3) > + (vec_duplicate:VI48_128 (match_dup 1))) > + (set (match_dup 0) > + (and:VI48_128 (not:VI48_128 (match_dup 3)) > + (match_dup 2)))] > + "operands[3] = gen_reg_rtx (mode);") > + > +;; PR target/100711: Split notl; vpbroadcastd; vpand as vpbroadcastd; > +vpandn (define_split > + [(set (match_operand:VI124_AVX2 0 "register_operand") > + (and:VI124_AVX2 > + (vec_duplicate:VI124_AVX2 > + (not: > + (match_operand: 1 "register_operand"))) > + (match_operand:VI124_AVX2 2 "register_operand")))] > + "TARGET_AVX2 && can_create_pseudo_p ()" > + [(set (match_dup 3) > + (vec_duplicate:VI124_AVX2 (match_dup 1))) > + (set (match_dup 0) > + (and:VI124_AVX2 (not:VI124_AVX2 (match_dup 3)) (match_dup 2)))] > + "operands[3] = gen_reg_rtx (mode);") > > Same here as above. > > +/* { dg-do compile } */ > +/* { dg-options "-O2" } */ > > Please add -msse2 here, 32bit targets do not enable SSE by default, and please > check if they handle DImode long long at all. > > Also, please run tests for x86_64 and i386 targets. The testsuite should be ran > with: > > make -k check RUNTESTFLAGS="--target_board=unix\{,-m32\}" > > (Eventually, you can use check-gcc instead of check and/or add i386.exp after -- > target-board.) > > Uros. > > +typedef int v4si __attribute__((vector_size (16))); typedef long long > +v2di __attribute__((vector_size (16))); > + > +v4si foo (int a, v4si b) > +{ > + return (__extension__ (v4si) {~a, ~a, ~a, ~a}) & b; } > + > +v2di bar (long long a, v2di b) > +{ > + return (__extension__ (v2di) {~a, ~a}) & b; } > > > > > Thanks in advance, > > Roger > > -- diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index b109c2a..b791900 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -16323,6 +16323,38 @@ ] (const_string "")))]) +;; PR target/100711: Split notl; vpbroadcastd; vpand as vpbroadcastd; vpandn +(define_split + [(set (match_operand:VI48_128 0 "register_operand") + (and:VI48_128 + (vec_duplicate:VI48_128 + (not: + (match_operand: 1 "register_operand"))) + (match_operand:VI48_128 2 "vector_operand")))] + "TARGET_SSE" + [(set (match_dup 3) + (vec_duplicate:VI48_128 (match_dup 1))) + (set (match_dup 0) + (and:VI48_128 (not:VI48_128 (match_dup 3)) + (match_dup 2)))] + "operands[3] = gen_reg_rtx (mode);") + +;; PR target/100711: Split notl; vpbroadcastd; vpand as vpbroadcastd; vpandn +(define_split + [(set (match_operand:VI124_AVX2 0 "register_operand") + (and:VI124_AVX2 + (vec_duplicate:VI124_AVX2 + (not: + (match_operand: 1 "register_operand"))) + (match_operand:VI124_AVX2 2 "vector_operand")))] + "TARGET_AVX2" + [(set (match_dup 3) + (vec_duplicate:VI124_AVX2 (match_dup 1))) + (set (match_dup 0) + (and:VI124_AVX2 (not:VI124_AVX2 (match_dup 3)) + (match_dup 2)))] + "operands[3] = gen_reg_rtx (mode);") + (define_insn "*andnot3_mask" [(set (match_operand:VI48_AVX512VL 0 "register_operand" "=v") (vec_merge:VI48_AVX512VL diff --git a/gcc/testsuite/gcc.target/i386/pr100711-1.c b/gcc/testsuite/gcc.target/i386/pr100711-1.c new file mode 100644 index 0000000..c6928f2 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr100711-1.c @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -msse2" } */ + +typedef int v4si __attribute__((vector_size (16))); +typedef long long v2di __attribute__((vector_size (16))); + +v4si foo (int a, v4si b) +{ + return (__extension__ (v4si) {~a, ~a, ~a, ~a}) & b; +} + +v2di bar (long long a, v2di b) +{ + return (__extension__ (v2di) {~a, ~a}) & b; +} + +/* { dg-final { scan-assembler-times "pandn" 2 } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr100711-2.c b/gcc/testsuite/gcc.target/i386/pr100711-2.c new file mode 100644 index 0000000..ccaf168 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr100711-2.c @@ -0,0 +1,47 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mavx2" } */ + +typedef char v16qi __attribute__ ((vector_size (16))); +typedef short v8hi __attribute__ ((vector_size (16))); +typedef int v4si __attribute__ ((vector_size (16))); + +typedef char v32qi __attribute__ ((vector_size (32))); +typedef short v16hi __attribute__ ((vector_size (32))); +typedef int v8si __attribute__ ((vector_size (32))); + +v16qi foo_v16qi (char a, v16qi b) +{ + return (__extension__ (v16qi) {~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a, + ~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a}) & b; +} + +v8hi foo_v8hi (short a, v8hi b) +{ + return (__extension__ (v8hi) {~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a,}) & b; +} + +v4si foo_v4si (int a, v4si b) +{ + return (__extension__ (v4si) {~a, ~a, ~a, ~a}) & b; +} + +v32qi foo_v32qi (char a, v32qi b) +{ + return (__extension__ (v32qi) {~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a, + ~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a, + ~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a, + ~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a}) & b; +} + +v16hi foo_v16hi (short a, v16hi b) +{ + return (__extension__ (v16hi) {~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a, + ~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a}) & b; +} + +v8si foo_v8si (int a, v8si b) +{ + return (__extension__ (v8si) {~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a,}) & b; +} + +/* { dg-final { scan-assembler-times "vpandn" 6 } } */