From patchwork Wed Jan 5 22:26:46 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Uros Bizjak
X-Patchwork-Id: 49610
Date: Wed, 5 Jan 2022 23:26:46 +0100
Subject: [PATCH] i386: Introduce V2QImode minmax, abs and uavgv2qi3_ceil [PR103861]
To: "gcc-patches@gcc.gnu.org"
List-Id: Gcc-patches mailing list
X-Patchwork-Original-From: Uros Bizjak via Gcc-patches
From: Uros Bizjak
Reply-To: Uros Bizjak
Cc: Richard Biener

Add V2QImode minmax, abs and uavgv2qi3_ceil operations with SSE registers.

2022-01-05  Uroš Bizjak

gcc/ChangeLog:

    PR target/103861
    * config/i386/mmx.md (VI_16_32): New mode iterator.
    (VI1_16_32): Ditto.
    (mmxvecsize): Handle V2QI mode.
    (<smaxmin:code><mode>3): Rename from <smaxmin:code>v4qi3.
    Use VI1_16_32 mode iterator.
    (<umaxmin:code><mode>3): Rename from <umaxmin:code>v4qi3.
    Use VI1_16_32 mode iterator.
    (abs<mode>2): Use VI_16_32 mode iterator.
    (uavgv2qi3_ceil): New insn pattern.

gcc/testsuite/ChangeLog:

    PR target/103861
    * gcc.target/i386/pr103861-3.c: New test.
    * g++.dg/vect/slp-pr98855.cc (dg-final): Check that no
    vectorization using SLP was performed.

I have changed the scan-tree-dump patterns in g++.dg/vect/slp-pr98855.cc
to check that no SLP vectorization was performed.
The existing scan-tree-dump-times was too fragile, since the message
was output for every vectorization mode.

Another issue is missing vectorization for uavgv2qi3_ceil. I will open
a PR for that.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Pushed to master.

Uros.

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 96d85a54e10..a409bb7c6c6 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -63,6 +63,12 @@
 ;; 4-byte integer vector modes
 (define_mode_iterator VI_32 [V4QI V2HI])
 
+;; 4-byte and 2-byte integer vector modes
+(define_mode_iterator VI_16_32 [V4QI V2QI V2HI])
+
+;; 4-byte and 2-byte QImode vector modes
+(define_mode_iterator VI1_16_32 [V4QI V2QI])
+
 ;; V2S* modes
 (define_mode_iterator V2FI [V2SF V2SI])
 
@@ -71,7 +77,8 @@
 
 ;; Mapping from integer vector mode to mnemonic suffix
 (define_mode_attr mmxvecsize
-  [(V8QI "b") (V4QI "b") (V4HI "w") (V2HI "w") (V2SI "d") (V1DI "q")])
+  [(V8QI "b") (V4QI "b") (V2QI "b")
+   (V4HI "w") (V2HI "w") (V2SI "d") (V1DI "q")])
 
 (define_mode_attr mmxdoublemode
   [(V8QI "V8HI") (V4HI "V4SI")])
@@ -2140,11 +2147,11 @@
           (match_operand:V4HI 2 "register_operand")))]
   "TARGET_MMX_WITH_SSE")
 
-(define_insn "<code>v4qi3"
-  [(set (match_operand:V4QI 0 "register_operand" "=Yr,*x,Yv")
-        (smaxmin:V4QI
-          (match_operand:V4QI 1 "register_operand" "%0,0,Yv")
-          (match_operand:V4QI 2 "register_operand" "Yr,*x,Yv")))]
+(define_insn "<code><mode>3"
+  [(set (match_operand:VI1_16_32 0 "register_operand" "=Yr,*x,Yv")
+        (smaxmin:VI1_16_32
+          (match_operand:VI1_16_32 1 "register_operand" "%0,0,Yv")
+          (match_operand:VI1_16_32 2 "register_operand" "Yr,*x,Yv")))]
   "TARGET_SSE4_1"
   "@
    p<maxmin_int>b\t{%2, %0|%0, %2}
@@ -2218,11 +2225,11 @@
           (match_operand:V8QI 2 "register_operand")))]
   "TARGET_MMX_WITH_SSE")
 
-(define_insn "<code>v4qi3"
-  [(set (match_operand:V4QI 0 "register_operand" "=x,Yw")
-        (umaxmin:V4QI
-          (match_operand:V4QI 1 "register_operand" "%0,Yw")
-          (match_operand:V4QI 2 "register_operand" "x,Yw")))]
+(define_insn "<code><mode>3"
+  [(set (match_operand:VI1_16_32 0 "register_operand" "=x,Yw")
+        (umaxmin:VI1_16_32
+          (match_operand:VI1_16_32 1 "register_operand" "%0,Yw")
+          (match_operand:VI1_16_32 2 "register_operand" "x,Yw")))]
   "TARGET_SSE2"
   "@
    p<maxmin_int>b\t{%2, %0|%0, %2}
@@ -2269,9 +2276,9 @@
   "TARGET_SSSE3 && TARGET_MMX_WITH_SSE")
 
 (define_insn "abs<mode>2"
-  [(set (match_operand:VI_32 0 "register_operand" "=Yv")
-        (abs:VI_32
-          (match_operand:VI_32 1 "register_operand" "Yv")))]
+  [(set (match_operand:VI_16_32 0 "register_operand" "=Yv")
+        (abs:VI_16_32
+          (match_operand:VI_16_32 1 "register_operand" "Yv")))]
   "TARGET_SSSE3"
   "%vpabs<mmxvecsize>\t{%1, %0|%0, %1}"
   [(set_attr "type" "sselog1")
@@ -4351,6 +4358,26 @@
    (set_attr "type" "sseiadd")
    (set_attr "mode" "TI")])
 
+(define_insn "uavgv2qi3_ceil"
+  [(set (match_operand:V2QI 0 "register_operand" "=x,Yw")
+        (truncate:V2QI
+          (lshiftrt:V2HI
+            (plus:V2HI
+              (plus:V2HI
+                (zero_extend:V2HI
+                  (match_operand:V2QI 1 "register_operand" "%0,Yw"))
+                (zero_extend:V2HI
+                  (match_operand:V2QI 2 "register_operand" "x,Yw")))
+              (const_vector:V2HI [(const_int 1) (const_int 1)]))
+            (const_int 1))))]
+  "TARGET_SSE2"
+  "@
+   pavgb\t{%2, %0|%0, %2}
+   vpavgb\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "type" "sseiadd")
+   (set_attr "mode" "TI")])
+
 (define_insn "uavgv2hi3_ceil"
   [(set (match_operand:V2HI 0 "register_operand" "=x,Yw")
         (truncate:V2HI
diff --git a/gcc/testsuite/g++.dg/vect/slp-pr98855.cc b/gcc/testsuite/g++.dg/vect/slp-pr98855.cc
index b1010326698..ff59eb95aca 100644
--- a/gcc/testsuite/g++.dg/vect/slp-pr98855.cc
+++ b/gcc/testsuite/g++.dg/vect/slp-pr98855.cc
@@ -81,6 +81,5 @@ void encrypt_n(const uint8_t in[], uint8_t out[], size_t blocks, uint32_t *EK)
     }
 }
 
-// This used to work on { target x86_64-*-* i?86-*-* } but a fix in SLP
-// discovery makes us trip over the threshold again.
-// { dg-final { scan-tree-dump-times "not vectorized: vectorization is not profitable" 2 "slp1" { xfail *-*-* } } }
+// { dg-final { scan-tree-dump "not vectorized: vectorization is not profitable" "slp1" } }
+// { dg-final { scan-tree-dump-not "vectorizing stmts using SLP" "slp1" } }
diff --git a/gcc/testsuite/gcc.target/i386/pr103861-3.c b/gcc/testsuite/gcc.target/i386/pr103861-3.c
new file mode 100644
index 00000000000..e5099ea0a83
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr103861-3.c
@@ -0,0 +1,66 @@
+/* PR target/103861 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -msse4" } */
+
+char r[2], a[2], b[2];
+unsigned char ur[2], ua[2], ub[2];
+
+void maxs (void)
+{
+  int i;
+
+  for (i = 0; i < 2; i++)
+    r[i] = a[i] > b[i] ? a[i] : b[i];
+}
+
+/* { dg-final { scan-assembler "pmaxsb" } } */
+
+void maxu (void)
+{
+  int i;
+
+  for (i = 0; i < 2; i++)
+    ur[i] = ua[i] > ub[i] ? ua[i] : ub[i];
+}
+
+/* { dg-final { scan-assembler "pmaxub" } } */
+
+void mins (void)
+{
+  int i;
+
+  for (i = 0; i < 2; i++)
+    r[i] = a[i] < b[i] ? a[i] : b[i];
+}
+
+/* { dg-final { scan-assembler "pminsb" } } */
+
+void minu (void)
+{
+  int i;
+
+  for (i = 0; i < 2; i++)
+    ur[i] = ua[i] < ub[i] ? ua[i] : ub[i];
+}
+
+/* { dg-final { scan-assembler "pminub" } } */
+
+void _abs (void)
+{
+  int i;
+
+  for (i = 0; i < 2; i++)
+    r[i] = a[i] < 0 ? -a[i] : a[i];
+}
+
+/* { dg-final { scan-assembler "pabsb" } } */
+
+void avgu (void)
+{
+  int i;
+
+  for (i = 0; i < 2; i++)
+    ur[i] = (ua[i] + ub[i] + 1) >> 1;
+}
+
+/* { dg-final { scan-assembler "pavgb" { xfail *-*-* } } } */
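For reference, the arithmetic described by the uavgv2qi3_ceil RTL above (zero-extend both operands to 16 bits, add them plus 1, shift right by one, truncate back to 8 bits) can be modelled in scalar C as in the sketch below. This is only an illustration of the pattern's semantics, not code from the patch; the function names uavg_ceil_u8 and uavg_ceil_v2qi are hypothetical.

```c
#include <stdint.h>

/* Scalar model of one lane (hypothetical helper, not part of the patch):
   widen to 16 bits so that a + b + 1 cannot overflow, shift right by one,
   then truncate back to 8 bits.  */
static inline uint8_t
uavg_ceil_u8 (uint8_t a, uint8_t b)
{
  uint16_t wide = (uint16_t) ((uint16_t) a + (uint16_t) b + 1u);
  return (uint8_t) (wide >> 1);
}

/* Two-lane model matching the V2QI shape of the pattern (hypothetical).  */
static void
uavg_ceil_v2qi (uint8_t r[2], const uint8_t a[2], const uint8_t b[2])
{
  for (int i = 0; i < 2; i++)
    r[i] = uavg_ceil_u8 (a[i], b[i]);
}
```

The widening step is the point of the zero_extend in the pattern: with 8-bit arithmetic alone, 255 + 255 + 1 would wrap, whereas the 16-bit intermediate keeps the carry, so the rounded-up average of 255 and 255 is still 255. This is the same expression the avgu function in the new test uses, where C integer promotion supplies the wider intermediate.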