[1/1,fwprop] : Add the support of forwarding the vec_duplicate rtx

From: Lehua Ding <lehua.ding@rivai.ai>


Hi,

While adding the new RISC-V auto-vectorization support, I found that converting `vector-reg1 vop vector-reg2` into `scalar-reg3 vop vector-reg2` is not easy to handle when `vector-reg1` is the result of a vec_duplicate_expr. For example, take the gimple IR below:

```gimple
<bb 2>
vect_cst__51 = [vec_duplicate_expr] z_14(D);

<bb 3>
vect_iftmp.13_53 = .LEN_COND_ADD(mask__40.9_47, vect__6.12_50, vect_cst__51, { 0.0, ... }, curr_cnt_60);
```
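To make the intended transformation concrete, here is a plain-C model of the two forms. It is a sketch under stated assumptions: `.LEN_COND_ADD (mask, a, b, else, len)` is modeled as computing `a[i] + b[i]` in lanes where `i < len` and `mask[i]` is set, and taking `else[i]` everywhere else. The treatment of lanes at or beyond `len`, the fixed lane count `VL` and the function names `len_cond_add_vv` / `len_cond_add_vs` are all hypothetical and not part of GCC.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fixed lane count; real RVV/SVE vectors are length-agnostic.  */
#define VL 8

/* vec_duplicate: broadcast one scalar into every lane.  */
static void
vec_duplicate (float out[VL], float s)
{
  for (size_t i = 0; i < VL; i++)
    out[i] = s;
}

/* Vector-vector form: both addends are full vectors.  Lanes that are
   inactive (mask clear) or at/after LEN take the ELSE value (assumed).  */
static void
len_cond_add_vv (float out[VL], const int mask[VL], const float a[VL],
                 const float b[VL], const float els[VL], size_t len)
{
  assert (len <= VL);
  for (size_t i = 0; i < VL; i++)
    out[i] = (i < len && mask[i]) ? a[i] + b[i] : els[i];
}

/* Vector-scalar form: the scalar S is used directly in every active
   lane, so no broadcast register is needed.  */
static void
len_cond_add_vs (float out[VL], const int mask[VL], const float a[VL],
                 float s, const float els[VL], size_t len)
{
  assert (len <= VL);
  for (size_t i = 0; i < VL; i++)
    out[i] = (i < len && mask[i]) ? a[i] + s : els[i];
}
```

Calling `vec_duplicate (dup, z)` followed by `len_cond_add_vv (out, m, a, dup, e, n)` yields the same lanes as `len_cond_add_vs (out, m, a, z, e, n)`, which is why the broadcast can be folded into the operation itself.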

I first considered adding corresponding internal functions to gimple IR, such as .LEN_COND_ADD_VS, and converting .LEN_COND_ADD to .LEN_COND_ADD_VS in match.pd. That approach works, but it would add too many near-duplicate internal functions to gimple IR, which does not seem necessary. I then tried to combine the instructions in the combine pass, but that failed. Finally, I decided to teach the fwprop pass to forward `(vec_duplicate reg)`, which is what this patch does.
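The forwarding idea can be illustrated with a toy model. This is not GCC's RTL or fwprop API: `struct expr`, `struct insn`, `single_dup_def`, `forward_dups` and the register numbers are hypothetical. The sketch replaces a register use whose only definition is `(vec_duplicate (reg))` by that definition, so `(plus r102 r101)` with `r101 = (vec_duplicate r100)` becomes `(plus r102 (vec_duplicate r100))`. It assumes the scalar register is not redefined between the definition and the use, which a real implementation has to check.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy expression kinds, loosely mirroring RTL codes (hypothetical).  */
enum kind { E_REG, E_VEC_DUPLICATE, E_PLUS };

struct expr
{
  enum kind kind;
  int regno;                 /* Valid when kind == E_REG.  */
  struct expr *op0, *op1;    /* Operands, or NULL.  */
};

/* Toy instruction: (set (reg DEST) SRC).  */
struct insn
{
  int dest;
  struct expr *src;
};

static struct expr *
mk (enum kind k, int regno, struct expr *op0, struct expr *op1)
{
  struct expr *e = malloc (sizeof *e);
  assert (e != NULL);
  e->kind = k;
  e->regno = regno;
  e->op0 = op0;
  e->op1 = op1;
  return e;
}

/* Return the source of the only insn that sets REGNO if that source is
   (vec_duplicate (reg ...)); otherwise return NULL.  */
static struct expr *
single_dup_def (const struct insn *insns, int n, int regno)
{
  struct expr *src = NULL;
  int defs = 0;
  for (int i = 0; i < n; i++)
    if (insns[i].dest == regno)
      {
        defs++;
        src = insns[i].src;
      }
  if (defs != 1 || src->kind != E_VEC_DUPLICATE || src->op0->kind != E_REG)
    return NULL;
  return src;
}

/* Replace each register use in E that has a single vec_duplicate
   definition by that definition.  The definition is shared, not copied.  */
static struct expr *
forward_dups (struct expr *e, const struct insn *insns, int n)
{
  if (e->kind == E_REG)
    {
      struct expr *dup = single_dup_def (insns, n, e->regno);
      return dup ? dup : e;
    }
  if (e->op0)
    e->op0 = forward_dups (e->op0, insns, n);
  if (e->op1)
    e->op1 = forward_dups (e->op1, insns, n);
  return e;
}
```

In real fwprop the rewritten instruction must still be recognized by the target afterwards (in this patch, via the new `@aarch64_sel_dup<mode>_vs` pattern); the toy skips that validation step.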

Because upstream GCC does not yet support RISC-V auto-vectorization, I found an SVE example that can also be optimized and tried the patch on it. For the float type, the function needs one fewer instruction; see the C code below. The old assembly first broadcasts the scalar into a vector register outside the loop (`mov z0.s, s0`) and then uses a `sel` instruction inside the loop. The new assembly instead uses a predicated `mov` to move the scalar directly into the vector register, so the broadcast disappears and the whole function has one instruction fewer. I also noticed that in the new assembly some instructions are placed ahead of the `ble .L1` instruction. While debugging, I found that this change is made by the ce1 pass, which considers moving them up to be more beneficial for performance.

For the int type, the new assembly code contains one more instruction than for the float type, `fmov s2, w2`, so I cannot judge whether its performance is better than the old code. My main development work is on RISC-V.

This patch is exploratory and has not been tested extensively. I mainly want your feedback on whether this approach is feasible and what potential problems it may have.

```c
/* Compiler options: -O3 -march=armv8.2-a+sve -S */
void test1 (int *pred, float *x, float z, int n)
{
  for (int i = 0; i < n; i += 1)
    {
      x[i] = pred[i] != 1 ? x[i] : z;
    }
}
```
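As a plain-C check of what both assembly listings must compute, the sketch below models the vectorized loop in chunks of `VL` lanes. `model_old` mirrors the old shape (broadcast `z` once before the loop, then select lane by lane), while `model_new` uses the scalar directly in each select. `VL`, `model_old` and `model_new` are hypothetical names, and the chunking only approximates the `whilelo` predicate through the `i + l < n` bound; the point is simply that both shapes must agree with `test1`.

```c
#include <stddef.h>

#define VL 4   /* Hypothetical lane count for the model.  */

/* The source loop from above.  */
void
test1 (int *pred, float *x, float z, int n)
{
  for (int i = 0; i < n; i += 1)
    x[i] = pred[i] != 1 ? x[i] : z;
}

/* Old shape: broadcast Z once before the loop, then select lane-wise
   between X and the broadcast vector (cf. mov z0.s, s0 plus sel).  */
static void
model_old (const int *pred, float *x, float z, int n)
{
  float dup[VL];
  for (int l = 0; l < VL; l++)
    dup[l] = z;
  for (int i = 0; i < n; i += VL)
    for (int l = 0; l < VL && i + l < n; l++)
      {
        int active = pred[i + l] != 1;   /* cmpne ..., #1 */
        x[i + l] = active ? x[i + l] : dup[l];
      }
}

/* New shape: no broadcast register; the scalar Z is used directly
   in the lane-wise select.  */
static void
model_new (const int *pred, float *x, float z, int n)
{
  for (int i = 0; i < n; i += VL)
    for (int l = 0; l < VL && i + l < n; l++)
      {
        int active = pred[i + l] != 1;
        x[i + l] = active ? x[i + l] : z;
      }
}
```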

The old assembly code looks like this (Compiler Explorer link: https://godbolt.org/z/hxTnEhaqY):

```asm
test1:
	cmp w2, 0
	ble .L1
	mov x3, 0
	cntw x4
	mov z0.s, s0
	whilelo p0.s, wzr, w2
	ptrue p2.b, all
.L3:
	ld1w z2.s, p0/z, [x0, x3, lsl 2]
	ld1w z1.s, p0/z, [x1, x3, lsl 2]
	cmpne p1.s, p2/z, z2.s, #1
	sel z1.s, p1, z1.s, z0.s
	st1w z1.s, p0, [x1, x3, lsl 2]
	add x3, x3, x4
	whilelo p0.s, w3, w2
	b.any .L3
.L1:
	ret
```

The new assembly code looks like this:

```asm
test1:
	whilelo p0.s, wzr, w2
	mov x3, 0
	cntw x4
	ptrue p2.b, all
	cmp w2, 0
	ble .L1
.L3:
	ld1w z2.s, p0/z, [x0, x3, lsl 2]
	ld1w z1.s, p0/z, [x1, x3, lsl 2]
	cmpne p1.s, p2/z, z2.s, #1
	mov z1.s, p1/m, s0
	st1w z1.s, p0, [x1, x3, lsl 2]
	add x3, x3, x4
	whilelo p0.s, w3, w2
	b.any .L3
.L1:
	ret
```

gcc/ChangeLog:

	* config/aarch64/aarch64-sve.md (@aarch64_sel_dup<mode>_vs): Add new
	pattern to capture the new operand order.
	* fwprop.cc (fwprop_propagation::profitable_p): Add new check.
	(reg_single_def_for_src_p): Add new function for src rtx.
	(forward_propagate_into): Call the new function.

---
 gcc/config/aarch64/aarch64-sve.md | 20 ++++++++++++++++++++
 gcc/fwprop.cc                     | 16 +++++++++++++++-
 2 files changed, 35 insertions(+), 1 deletion(-)

Message ID	20230113091450.72599-1-lehua.ding@rivai.ai
State	New
Headers	From: lehua.ding@rivai.ai; To: gcc-patches@gcc.gnu.org; Cc: richard.sandiford@arm.com, juzhe.zhong@rivai.ai; Subject: [PATCH 1/1] [fwprop]: Add the support of forwarding the vec_duplicate rtx; Date: Fri, 13 Jan 2023 17:14:50 +0800
Series	[1/1,fwprop] : Add the support of forwarding the vec_duplicate rtx
