From patchwork Tue Jan  4 18:46:50 2022
X-Patchwork-Submitter: Uros Bizjak
X-Patchwork-Id: 49536
From: Uros Bizjak
Date: Tue, 4 Jan 2022 19:46:50 +0100
Subject: [PATCH] i386: Introduce V2QImode vectorized logic [PR103861]
To: "gcc-patches@gcc.gnu.org"
List-Id: Gcc-patches mailing list

Add V2QImode logic operations with SSE and GP registers and split
them to V4QImode SSE instructions or SImode GP instructions.

The patch also fixes PR target/103900.

2022-01-04  Uroš Bizjak

gcc/ChangeLog:

        PR target/103861
        * config/i386/mmx.md (one_cmplv2qi2): New insn pattern.
        (one_cmplv2qi2 splitters): New post-reload splitters.
        (*andnotv2qi3): New insn pattern.
        (andnotv2qi3 splitters): New post-reload splitters.
        (<code>v2qi3): New insn pattern.
        (<code>v2qi3 splitters): New post-reload splitters.

gcc/testsuite/ChangeLog:

        PR target/103861
        * gcc.target/i386/warn-vect-op-2.c: Adjust warnings.
        * gcc.target/i386/pr103900.c: New test.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Pushed to master.

Uros.
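For readers who want to see the source-level operations these patterns
cover, here is a minimal sketch using the GNU C vector extension. It is
not part of the patch. The type name v2qi and every function name below
are made up for illustration; only the two-byte vector shape is taken
from pr103900.c. The si_*_low16 helpers are a scalar model of the
GP-register splitters: apply the operation on a 32-bit value and keep
only the low 16 bits.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Two-byte vector of unsigned char, the same shape as type T in
   pr103900.c.  The name v2qi is chosen here for illustration only.  */
typedef unsigned char v2qi __attribute__ ((__vector_size__ (2)));

/* Element-wise logic on two-byte vectors (GNU C vector extension).  */
v2qi v2qi_not (v2qi a) { return ~a; }
v2qi v2qi_and (v2qi a, v2qi b) { return a & b; }
v2qi v2qi_ior (v2qi a, v2qi b) { return a | b; }
v2qi v2qi_xor (v2qi a, v2qi b) { return a ^ b; }
v2qi v2qi_andnot (v2qi a, v2qi b) { return ~a & b; }

/* Hypothetical helper: view the two bytes as one 16-bit integer.  */
uint16_t
v2qi_bits (v2qi v)
{
  uint16_t r;
  memcpy (&r, &v, sizeof r);
  return r;
}

/* Scalar model of the GP-register splitters: widen to 32 bits, apply
   the SImode operation, keep only the low 16 bits.  */
uint16_t
si_not_low16 (uint16_t a)
{
  return (uint16_t) ~(uint32_t) a;
}

uint16_t
si_and_low16 (uint16_t a, uint16_t b)
{
  return (uint16_t) ((uint32_t) a & (uint32_t) b);
}

/* Check that the vector results agree with the scalar model.  */
void
v2qi_check (void)
{
  v2qi a = { 0x0f, 0xf0 };
  v2qi b = { 0x3c, 0xff };

  assert (v2qi_bits (v2qi_not (a)) == si_not_low16 (v2qi_bits (a)));
  assert (v2qi_bits (v2qi_and (a, b))
          == si_and_low16 (v2qi_bits (a), v2qi_bits (b)));
}
```

Calling v2qi_check () from a test driver exercises both comparisons.
The upper 16 bits of the 32-bit container never influence the result
because bitwise operations do not carry between bit positions, which is
why the splitters can rewrite the operands with gen_lowpart (SImode, ...).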
diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 5b33d3cfc1c..fc8ec5e4d49 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -2745,6 +2745,45 @@
   "TARGET_SSE2"
   "operands[2] = force_reg (<MODE>mode, CONSTM1_RTX (<MODE>mode));")
 
+(define_insn "one_cmplv2qi2"
+  [(set (match_operand:V2QI 0 "register_operand" "=r,&x,&v")
+        (not:V2QI
+          (match_operand:V2QI 1 "register_operand" "0,x,v")))]
+  ""
+  "#"
+  [(set_attr "isa" "*,sse2,avx512vl")
+   (set_attr "type" "negnot,sselog,sselog")
+   (set_attr "mode" "SI,TI,TI")])
+
+(define_split
+  [(set (match_operand:V2QI 0 "general_reg_operand")
+        (not:V2QI
+          (match_operand:V2QI 1 "general_reg_operand")))]
+  "reload_completed"
+  [(set (match_dup 0)
+        (not:SI (match_dup 1)))]
+{
+  operands[1] = gen_lowpart (SImode, operands[1]);
+  operands[0] = gen_lowpart (SImode, operands[0]);
+})
+
+(define_split
+  [(set (match_operand:V2QI 0 "sse_reg_operand")
+        (not:V2QI
+          (match_operand:V2QI 1 "sse_reg_operand")))]
+  "TARGET_SSE2 && reload_completed"
+  [(set (match_dup 0)
+        (xor:V4QI
+          (match_dup 0) (match_dup 1)))]
+{
+  emit_insn
+    (gen_rtx_SET (gen_rtx_REG (V16QImode, REGNO (operands[0])),
+                  CONSTM1_RTX (V16QImode)));
+
+  operands[1] = gen_lowpart (V4QImode, operands[1]);
+  operands[0] = gen_lowpart (V4QImode, operands[0]);
+})
+
 (define_insn "mmx_andnot<mode>3"
   [(set (match_operand:MMXMODEI 0 "register_operand" "=y,x,x,v")
         (and:MMXMODEI
@@ -2775,6 +2814,69 @@
    (set_attr "type" "sselog")
    (set_attr "mode" "TI")])
 
+(define_insn "*andnotv2qi3"
+  [(set (match_operand:V2QI 0 "register_operand" "=&r,r,x,x,v")
+        (and:V2QI
+          (not:V2QI (match_operand:V2QI 1 "register_operand" "0,r,0,x,v"))
+          (match_operand:V2QI 2 "register_operand" "r,r,x,x,v")))
+   (clobber (reg:CC FLAGS_REG))]
+  ""
+  "#"
+  [(set_attr "isa" "*,bmi,sse2_noavx,avx,avx512vl")
+   (set_attr "type" "alu,bitmanip,sselog,sselog,sselog")
+   (set_attr "mode" "SI,SI,TI,TI,TI")])
+
+(define_split
+  [(set (match_operand:V2QI 0 "general_reg_operand")
+        (and:V2QI
+          (not:V2QI (match_operand:V2QI 1 "general_reg_operand"))
+          (match_operand:V2QI 2 "general_reg_operand")))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_BMI && reload_completed"
+  [(parallel
+     [(set (match_dup 0)
+           (and:SI (not:SI (match_dup 1)) (match_dup 2)))
+      (clobber (reg:CC FLAGS_REG))])]
+{
+  operands[2] = gen_lowpart (SImode, operands[2]);
+  operands[1] = gen_lowpart (SImode, operands[1]);
+  operands[0] = gen_lowpart (SImode, operands[0]);
+})
+
+(define_split
+  [(set (match_operand:V2QI 0 "general_reg_operand")
+        (and:V2QI
+          (not:V2QI (match_operand:V2QI 1 "general_reg_operand"))
+          (match_operand:V2QI 2 "general_reg_operand")))
+   (clobber (reg:CC FLAGS_REG))]
+  "!TARGET_BMI && reload_completed"
+  [(set (match_dup 0)
+        (not:SI (match_dup 1)))
+   (parallel
+     [(set (match_dup 0)
+           (and:SI (match_dup 0) (match_dup 2)))
+      (clobber (reg:CC FLAGS_REG))])]
+{
+  operands[2] = gen_lowpart (SImode, operands[2]);
+  operands[1] = gen_lowpart (SImode, operands[1]);
+  operands[0] = gen_lowpart (SImode, operands[0]);
+})
+
+(define_split
+  [(set (match_operand:V2QI 0 "sse_reg_operand")
+        (and:V2QI
+          (not:V2QI (match_operand:V2QI 1 "sse_reg_operand"))
+          (match_operand:V2QI 2 "sse_reg_operand")))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_SSE2 && reload_completed"
+  [(set (match_dup 0)
+        (and:V4QI (not:V4QI (match_dup 1)) (match_dup 2)))]
+{
+  operands[2] = gen_lowpart (V4QImode, operands[2]);
+  operands[1] = gen_lowpart (V4QImode, operands[1]);
+  operands[0] = gen_lowpart (V4QImode, operands[0]);
+})
+
 (define_expand "mmx_<code><mode>3"
   [(set (match_operand:MMXMODEI 0 "register_operand")
         (any_logic:MMXMODEI
@@ -2821,6 +2923,50 @@
    (set_attr "type" "sselog")
    (set_attr "mode" "TI")])
 
+(define_insn "<code>v2qi3"
+  [(set (match_operand:V2QI 0 "register_operand" "=r,x,x,v")
+        (any_logic:V2QI
+          (match_operand:V2QI 1 "register_operand" "%0,0,x,v")
+          (match_operand:V2QI 2 "register_operand" "r,x,x,v")))
+   (clobber (reg:CC FLAGS_REG))]
+  ""
+  "#"
+  [(set_attr "isa" "*,sse2_noavx,avx,avx512vl")
+   (set_attr "type" "alu,sselog,sselog,sselog")
+   (set_attr "mode" "SI,TI,TI,TI")])
+
+(define_split
+  [(set (match_operand:V2QI 0 "general_reg_operand")
+        (any_logic:V2QI
+          (match_operand:V2QI 1 "general_reg_operand")
+          (match_operand:V2QI 2 "general_reg_operand")))
+   (clobber (reg:CC FLAGS_REG))]
+  "reload_completed"
+  [(parallel
+     [(set (match_dup 0)
+           (any_logic:SI (match_dup 1) (match_dup 2)))
+      (clobber (reg:CC FLAGS_REG))])]
+{
+  operands[2] = gen_lowpart (SImode, operands[2]);
+  operands[1] = gen_lowpart (SImode, operands[1]);
+  operands[0] = gen_lowpart (SImode, operands[0]);
+})
+
+(define_split
+  [(set (match_operand:V2QI 0 "sse_reg_operand")
+        (any_logic:V2QI
+          (match_operand:V2QI 1 "sse_reg_operand")
+          (match_operand:V2QI 2 "sse_reg_operand")))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_SSE2 && reload_completed"
+  [(set (match_dup 0)
+        (any_logic:V4QI (match_dup 1) (match_dup 2)))]
+{
+  operands[2] = gen_lowpart (V4QImode, operands[2]);
+  operands[1] = gen_lowpart (V4QImode, operands[1]);
+  operands[0] = gen_lowpart (V4QImode, operands[0]);
+})
+
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ;;
 ;; Parallel integral element swizzling
diff --git a/gcc/testsuite/gcc.target/i386/pr103900.c b/gcc/testsuite/gcc.target/i386/pr103900.c
new file mode 100644
index 00000000000..8793b492a05
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr103900.c
@@ -0,0 +1,25 @@
+/* PR target/103900 */
+/* { dg-do compile } */
+/* { dg-options "-O -fno-tree-dce -fno-tree-dse" } */
+
+typedef unsigned char __attribute__((__vector_size__(2))) T;
+typedef unsigned char __attribute__((__vector_size__(32))) U;
+typedef int __attribute__((__vector_size__(64))) V;
+typedef int __attribute__((__vector_size__(32))) W;
+T foo0_v128u8_0;
+U foo0_v256u8_0;
+T foo0_v16u16_0;
+int foo0_v128u64_0, foo0_v512u64_0;
+
+void
+foo0() {
+  V v512u128_0;
+  T v16u8_0;
+  foo0_v128u64_0 += (short)v16u8_0;
+  T v16u8_1 = ~__builtin_shufflevector(foo0_v128u8_0, foo0_v256u8_0, 0, 5);
+  W v256u128_1;
+  V v512u8_r =
+    foo0_v512u64_0 + v512u128_0;
+  (union {U b;}){}.b + (U)v256u128_1;
+  T v16u8_r = v16u8_0 + v16u8_1 + foo0_v16u16_0;
+}
diff --git a/gcc/testsuite/gcc.target/i386/warn-vect-op-2.c b/gcc/testsuite/gcc.target/i386/warn-vect-op-2.c
index 5e378b6bd04..4560f7070bb 100644
--- a/gcc/testsuite/gcc.target/i386/warn-vect-op-2.c
+++ b/gcc/testsuite/gcc.target/i386/warn-vect-op-2.c
@@ -14,7 +14,7 @@ int main (int argc, char *argv[])
     v0 + v1,         /* { dg-warning "expanded piecewise" } */
     v0 - v1,         /* { dg-warning "expanded piecewise" } */
     v0 > v1,         /* { dg-warning "expanded piecewise" } */
-    v0 & v1,         /* { dg-warning "expanded in parallel" } */
+    v0 & v1,         /* { dg-warning "expanded piecewise" } */
     __builtin_shuffle (v0, v1),        /* { dg-warning "expanded piecewise" } */
     __builtin_shuffle (v0, v1, v1)     /* { dg-warning "expanded piecewise" } */
   };
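
A note on the !TARGET_BMI andnot splitter in the mmx.md hunk, added as
a reading aid and not part of the patch. The splitter emits a NOT into
operand 0 followed by an AND of operand 0 with operand 2. The sketch
below models that two-step sequence in plain C, assuming operand 1 is
tied to operand 0 as in the first constraint alternative ("=&r" / "0" /
"r"). All function names are hypothetical. It also shows what would go
wrong if operand 2 lived in the destination register, which is
presumably what the earlyclobber on that alternative guards against.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the two-instruction sequence emitted by the
   !TARGET_BMI splitter, with operand 1 tied to operand 0.  */
static void
andnot_split_nobmi (uint32_t *op0, const uint32_t *op2)
{
  *op0 = ~*op0;         /* (set op0 (not:SI op1)), op1 == op0  */
  *op0 = *op0 & *op2;   /* (set op0 (and:SI op0 op2))          */
}

/* ~a & b for a V2QI value held in the low 16 bits of a 32-bit
   register; the upper bits are discarded at the end.  */
uint16_t
andnot_v2qi_model (uint16_t a, uint16_t b)
{
  uint32_t r0 = a, r2 = b;
  andnot_split_nobmi (&r0, &r2);
  return (uint16_t) r0;
}

/* Same sequence, but with operand 2 sharing the destination register.
   The NOT overwrites operand 2 before the AND reads it, so the result
   is ~a instead of the expected ~a & a == 0.  */
uint16_t
andnot_v2qi_aliased (uint16_t a)
{
  uint32_t r0 = a;
  andnot_split_nobmi (&r0, &r0);
  return (uint16_t) r0;
}
```

With distinct registers the model yields ~a & b in the low 16 bits.
With a shared register, andnot_v2qi_aliased (0x00ff) yields 0xff00
where 0 would be correct.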