From: Uros Bizjak
Date: Thu, 13 Jan 2022 20:50:47 +0100
Subject: [PATCH] i386: Introduce V2QImode vectorized shifts [PR103861]
To: "gcc-patches@gcc.gnu.org"
X-Patchwork-Submitter: Uros Bizjak
X-Patchwork-Id: 49999

Add V2QImode shift operations and split them into a pair of synthesized
QImode operations on the high and low bytes of an integer register.

Also robustify the arithmetic split patterns: their split conditions now
repeat the !TARGET_PARTIAL_REG_STALL || optimize_function_for_size_p (cfun)
check, and the SSE negation split additionally checks TARGET_SSE2.

2022-01-13  Uroš Bizjak

gcc/ChangeLog:

	PR target/103861
	* config/i386/i386.md (*ashlqi_ext_2): New insn pattern.
	(*qi_ext_2): Ditto.
	* config/i386/mmx.md (v2qi): New insn_and_split pattern.

gcc/testsuite/ChangeLog:

	PR target/103861
	* gcc.target/i386/pr103861.c (shl,ashr,lshr): New tests.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Pushed to master.

Uros.
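For readers who want the intended semantics spelled out, the following is a
minimal sketch in plain C of what the high/low split computes. It is only a
model, not GCC code: the function names are hypothetical, a V2QI value is
assumed to be packed into a 16-bit integer with lane 0 in the low byte and
lane 1 in the high byte, and the shift count is assumed to be in the range
0..7.

```c
#include <stdint.h>

/* Hypothetical model, not part of the patch: a two-byte vector is held in
   a 16-bit integer, lane 0 in the low byte and lane 1 in the high byte.
   All helpers assume 0 <= count <= 7.  */

/* Arithmetic right shift of one 8-bit lane, written without relying on
   implementation-defined signed shifts.  */
static uint8_t model_sar8(uint8_t x, unsigned count)
{
    uint8_t r = (uint8_t)(x >> count);
    if (x & 0x80)
        r |= (uint8_t)(0xff << (8 - count));  /* replicate the sign bit */
    return r;
}

/* Lane-wise left shift: the high byte and the low byte are shifted
   separately, so no bit moves from lane 0 into lane 1.  */
static uint16_t model_shl_v2qi(uint16_t v, unsigned count)
{
    uint8_t hi = (uint8_t)((v >> 8) << count);
    uint8_t lo = (uint8_t)((v & 0xff) << count);
    return (uint16_t)((hi << 8) | lo);
}

/* Lane-wise logical right shift (unsigned lanes).  */
static uint16_t model_lshr_v2qu(uint16_t v, unsigned count)
{
    uint8_t hi = (uint8_t)((v >> 8) >> count);
    uint8_t lo = (uint8_t)((v & 0xff) >> count);
    return (uint16_t)((hi << 8) | lo);
}

/* Lane-wise arithmetic right shift (signed lanes).  */
static uint16_t model_ashr_v2qi(uint16_t v, unsigned count)
{
    uint8_t hi = model_sar8((uint8_t)(v >> 8), count);
    uint8_t lo = model_sar8((uint8_t)(v & 0xff), count);
    return (uint16_t)((hi << 8) | lo);
}
```

For example, model_shl_v2qi (0x81ff, 1) yields 0x02fe, whereas a plain 16-bit
shift of the same value would yield 0x03fe because the top bit of the low
byte leaks into the high byte. Keeping the lanes independent is why the
operation is split into one high-byte and one low-byte QImode shift.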
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index bcaaa4993b1..c2acb1dbd90 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -12413,6 +12413,54 @@
        (const_string "*")))
    (set_attr "mode" "<MODE>")])
 
+(define_insn "*ashlqi_ext_2"
+  [(set (zero_extract:SWI248
+          (match_operand:SWI248 0 "register_operand" "+Q")
+          (const_int 8)
+          (const_int 8))
+        (subreg:SWI248
+          (ashift:QI
+            (subreg:QI
+              (zero_extract:SWI248
+                (match_operand:SWI248 1 "register_operand" "0")
+                (const_int 8)
+                (const_int 8)) 0)
+            (match_operand:QI 2 "nonmemory_operand" "cI")) 0))
+   (clobber (reg:CC FLAGS_REG))]
+  "/* FIXME: without this LRA can't reload this pattern, see PR82524.  */
+   rtx_equal_p (operands[0], operands[1])"
+{
+  switch (get_attr_type (insn))
+    {
+    case TYPE_ALU:
+      gcc_assert (operands[2] == const1_rtx);
+      return "add{b}\t%h0, %h0";
+
+    default:
+      if (operands[2] == const1_rtx
+          && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun)))
+        return "sal{b}\t%h0";
+      else
+        return "sal{b}\t{%2, %h0|%h0, %2}";
+    }
+}
+  [(set (attr "type")
+     (cond [(and (match_test "TARGET_DOUBLE_WITH_ADD")
+                 (match_operand 2 "const1_operand"))
+              (const_string "alu")
+           ]
+           (const_string "ishift")))
+   (set (attr "length_immediate")
+     (if_then_else
+       (ior (eq_attr "type" "alu")
+            (and (eq_attr "type" "ishift")
+                 (and (match_operand 2 "const1_operand")
+                      (ior (match_test "TARGET_SHIFT1")
+                           (match_test "optimize_function_for_size_p (cfun)")))))
+       (const_string "0")
+       (const_string "*")))
+   (set_attr "mode" "QI")])
+
 ;; See comment above `ashl<mode>3' about how this works.
 
 (define_expand "<insn><mode>3"
@@ -13143,6 +13191,39 @@
        (const_string "0")
        (const_string "*")))
    (set_attr "mode" "<MODE>")])
+
+(define_insn "*qi_ext_2"
+  [(set (zero_extract:SWI248
+          (match_operand:SWI248 0 "register_operand" "+Q")
+          (const_int 8)
+          (const_int 8))
+        (subreg:SWI248
+          (any_shiftrt:QI
+            (subreg:QI
+              (zero_extract:SWI248
+                (match_operand:SWI248 1 "register_operand" "0")
+                (const_int 8)
+                (const_int 8)) 0)
+            (match_operand:QI 2 "nonmemory_operand" "cI")) 0))
+   (clobber (reg:CC FLAGS_REG))]
+  "/* FIXME: without this LRA can't reload this pattern, see PR82524.  */
+   rtx_equal_p (operands[0], operands[1])"
+{
+  if (operands[2] == const1_rtx
+      && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun)))
+    return "<shift>{b}\t%h0";
+  else
+    return "<shift>{b}\t{%2, %h0|%h0, %2}";
+}
+  [(set_attr "type" "ishift")
+   (set (attr "length_immediate")
+     (if_then_else
+       (and (match_operand 2 "const1_operand")
+            (ior (match_test "TARGET_SHIFT1")
+                 (match_test "optimize_function_for_size_p (cfun)")))
+       (const_string "0")
+       (const_string "*")))
+   (set_attr "mode" "QI")])
 
 ;; Rotate instructions
 
diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 3d99a5e851b..782da220f98 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -1657,7 +1657,8 @@
         (neg:V2QI
           (match_operand:V2QI 1 "general_reg_operand")))
    (clobber (reg:CC FLAGS_REG))]
-  "reload_completed"
+  "(!TARGET_PARTIAL_REG_STALL || optimize_function_for_size_p (cfun))
+   && reload_completed"
   [(parallel
      [(set (strict_low_part (match_dup 0))
            (neg:QI (match_dup 1)))
@@ -1683,7 +1684,8 @@
         (neg:V2QI
           (match_operand:V2QI 1 "sse_reg_operand")))
    (clobber (reg:CC FLAGS_REG))]
-  "reload_completed"
+  "(!TARGET_PARTIAL_REG_STALL || optimize_function_for_size_p (cfun))
+   && TARGET_SSE2 && reload_completed"
   [(set (match_dup 0) (match_dup 2))
    (set (match_dup 0)
         (minus:V16QI (match_dup 0) (match_dup 1)))]
@@ -1757,7 +1759,8 @@
           (match_operand:V2QI 1 "general_reg_operand")
           (match_operand:V2QI 2 "general_reg_operand")))
    (clobber (reg:CC FLAGS_REG))]
-  "reload_completed"
+  "(!TARGET_PARTIAL_REG_STALL || optimize_function_for_size_p (cfun))
+   && reload_completed"
   [(parallel
      [(set (strict_low_part (match_dup 0))
            (plusminus:QI (match_dup 1) (match_dup 2)))
@@ -1790,7 +1793,8 @@
           (match_operand:V2QI 1 "sse_reg_operand")
           (match_operand:V2QI 2 "sse_reg_operand")))
    (clobber (reg:CC FLAGS_REG))]
-  "TARGET_SSE2 && reload_completed"
+  "(!TARGET_PARTIAL_REG_STALL || optimize_function_for_size_p (cfun))
+   && TARGET_SSE2 && reload_completed"
   [(set (match_dup 0)
         (plusminus:V16QI (match_dup 1) (match_dup 2)))]
 {
@@ -2387,6 +2391,38 @@
        (const_string "0")))
    (set_attr "mode" "TI")])
 
+(define_insn_and_split "<insn>v2qi3"
+  [(set (match_operand:V2QI 0 "register_operand" "=Q")
+        (any_shift:V2QI
+          (match_operand:V2QI 1 "register_operand" "0")
+          (match_operand:QI 2 "nonmemory_operand" "cI")))
+   (clobber (reg:CC FLAGS_REG))]
+  "!TARGET_PARTIAL_REG_STALL || optimize_function_for_size_p (cfun)"
+  "#"
+  "&& reload_completed"
+  [(parallel
+     [(set (zero_extract:HI (match_dup 3) (const_int 8) (const_int 8))
+           (subreg:HI
+             (any_shift:QI
+               (subreg:QI
+                 (zero_extract:HI (match_dup 4)
+                                  (const_int 8)
+                                  (const_int 8)) 0)
+               (match_dup 2)) 0))
+      (clobber (reg:CC FLAGS_REG))])
+   (parallel
+     [(set (strict_low_part (match_dup 0))
+           (any_shift:QI (match_dup 1) (match_dup 2)))
+      (clobber (reg:CC FLAGS_REG))])]
+{
+  operands[4] = lowpart_subreg (HImode, operands[1], V2QImode);
+  operands[3] = lowpart_subreg (HImode, operands[0], V2QImode);
+  operands[1] = lowpart_subreg (QImode, operands[1], V2QImode);
+  operands[0] = lowpart_subreg (QImode, operands[0], V2QImode);
+}
+  [(set_attr "type" "multi")
+   (set_attr "mode" "QI")])
+
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ;;
 ;; Parallel integral comparisons
diff --git a/gcc/testsuite/gcc.target/i386/pr103861.c b/gcc/testsuite/gcc.target/i386/pr103861.c
index 158717645b6..064b617774b 100644
--- a/gcc/testsuite/gcc.target/i386/pr103861.c
+++ b/gcc/testsuite/gcc.target/i386/pr103861.c
@@ -3,6 +3,7 @@
 /* { dg-options "-O2 -dp" } */
 
 typedef char __v2qi __attribute__ ((__vector_size__ (2)));
+typedef unsigned char __v2qu __attribute__ ((__vector_size__ (2)));
 
 __v2qi and (__v2qi a, __v2qi b) { return a & b; };
 
@@ -20,4 +21,10 @@ __v2qi minus (__v2qi a, __v2qi b) { return a - b; };
 
 __v2qi neg (__v2qi a) { return -a; };
 
+__v2qi shl (__v2qi a, int b) { return a << b; };
+
+__v2qi ashr (__v2qi a, int b) { return a >> b; };
+
+__v2qu lshr (__v2qu a, int b) { return a >> b; };
+
 /* { dg-final { scan-assembler-not "insvhi" } } */