RISC-V: Optimize zbb ins sext.b and sext.h in rv64

Message ID 20230324015324.13616-1-wangfeng@eswincomputing.com
State Changes Requested, archived
Delegated to: Jeff Law

Commit Message

Feng Wang March 24, 2023, 1:53 a.m. UTC
This patch optimizes how combine handles sext.b/sext.h on rv64.
Consider the following test case:
int sextb32(int x)
{ return (x << 24) >> 24; }

The RTL before combine is as follows:
(insn 6 3 7 2 (set (reg:SI 138)
        (ashift:SI (subreg/s/u:SI (reg/v:DI 136 [ xD.2271 ]) 0)
            (const_int 24 [0x18]))) "sextb.c":2:13 195 {ashlsi3}
     (expr_list:REG_DEAD (reg/v:DI 136 [ xD.2271 ])
        (nil)))
(insn 7 6 8 2 (set (reg:SI 137)
        (ashiftrt:SI (reg:SI 138)
            (const_int 24 [0x18]))) "sextb.c":2:20 196 {ashrsi3}
     (expr_list:REG_DEAD (reg:SI 138)
        (nil)))

During the combine pass, the two insns are combined into:
(set (reg:SI 137)
    (ashiftrt:SI (subreg:SI (ashift:DI (reg:DI 140)
                (const_int 24 [0x18])) 0)
        (const_int 24 [0x18])))

The optimal combine result is
(set (reg:SI 137)
    (sign_extend:SI (subreg:QI (reg:DI 140) 0)))
This form can be matched to the sext.b instruction.

Because of the intervening subreg, extract_left_shift currently
cannot find the left-shift count. It needs to look through one
more layer of RTL to obtain it.

gcc/ChangeLog:

        * combine.cc (extract_left_shift): Add SUBREG case.

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/zbb-sext-rv64.c: New test.
---
 gcc/combine.cc                                 |  5 +++++
 gcc/testsuite/gcc.target/riscv/zbb-sext-rv64.c | 12 ++++++++++++
 2 files changed, 17 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-sext-rv64.c
  

Comments

Jeff Law April 22, 2023, 12:08 a.m. UTC | #1
On 3/23/23 19:53, Feng Wang wrote:
> This patch optimize the combine processing for sext.b/h in rv64.
> Please refer to the following test case,
> int sextb32(int x)
> { return (x << 24) >> 24; }
> 
> The rtl expression is as follows,
> (insn 6 3 7 2 (set (reg:SI 138)
>          (ashift:SI (subreg/s/u:SI (reg/v:DI 136 [ xD.2271 ]) 0)
>              (const_int 24 [0x18]))) "sextb.c":2:13 195 {ashlsi3}
>       (expr_list:REG_DEAD (reg/v:DI 136 [ xD.2271 ])
>          (nil)))
> (insn 7 6 8 2 (set (reg:SI 137)
>          (ashiftrt:SI (reg:SI 138)
>              (const_int 24 [0x18]))) "sextb.c":2:20 196 {ashrsi3}
>       (expr_list:REG_DEAD (reg:SI 138)
>          (nil)))
> 
> During the combine phase, they will combine into
> (set (reg:SI 137)
>      (ashiftrt:SI (subreg:SI (ashift:DI (reg:DI 140)
>                  (const_int 24 [0x18])) 0)
>          (const_int 24 [0x18])))
> 
> The optimal combine result is
> (set (reg:SI 137)
>      (sign_extend:SI (subreg:QI (reg:DI 140) 0)))
> This can be converted to the sext ins.
> 
> Due to the influence of subreg,the current processing
> can't obtain the imm of left shifts. Need to peel off
> another layer of rtl to obtain it.
> 
> gcc/ChangeLog:
> 
>          * combine.cc (extract_left_shift): Add SUBREG case.
> 
> gcc/testsuite/ChangeLog:
> 
>          * gcc.target/riscv/zbb-sext-rv64.c: New test.
SUBREGs have painful semantics and we should be very careful about just
stripping them.

For example, you might have a subreg that extracts the *high* part.  Or 
you might have (subreg (mem)) or a paradoxical subreg, etc.

At the *least* this case would need verification that you're getting the
lowpart.  However, I suspect there are other conditions that need to be
checked to make this valid.
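
To make the lowpart concern concrete, here is a minimal C sketch (not GCC code) that models a DImode value as uint64_t and an SImode subreg as either its low or high 32 bits. All helper names are hypothetical and exist only for illustration. It suggests that moving a left shift across the subreg preserves the value for the lowpart but not, in general, for the highpart.

```c
#include <stdint.h>

/* Hypothetical models of SImode pieces of a DImode value. */
static uint32_t lowpart32(uint64_t v)  { return (uint32_t)v; }
static uint32_t highpart32(uint64_t v) { return (uint32_t)(v >> 32); }

/* Models (subreg:SI (ashift:DI x c) lowpart); assumes 0 <= c < 32. */
static uint32_t shift_then_lowpart(uint64_t x, unsigned c)
{
    return lowpart32(x << c);
}

/* Models (ashift:SI (subreg:SI x lowpart) c); assumes 0 <= c < 32. */
static uint32_t lowpart_then_shift(uint64_t x, unsigned c)
{
    return lowpart32(x) << c;
}

/* Same two orderings, but taking the high 32 bits instead. */
static uint32_t shift_then_highpart(uint64_t x, unsigned c)
{
    return highpart32(x << c);
}

static uint32_t highpart_then_shift(uint64_t x, unsigned c)
{
    return highpart32(x) << c;
}
```

With x = 0xABCDEF and a count of 24, the two lowpart orderings agree, while the highpart orderings differ because bits shifted out of the low word land in the high word. This only models the high/low distinction; it says nothing about (subreg (mem)) or paradoxical subregs.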

But I would suggest we look elsewhere.  It could be that combine is
reassociating the subreg in ways that are undesirable and that
ultimately make our job harder.  Additionally, if we can fix this in a
generic simplification/folder routine, then multiple passes can benefit.

For example in simplify_context::simplify_binary_operation we get a form 
more amenable to optimization.

> #0  simplify_context::simplify_binary_operation (this=0x7fffffffda68, code=ASHIFTRT, mode=E_SImode, 
>     op0=0x7fffea11eb40, op1=0x7fffea009610) at /home/jlaw/riscv-persist/ventana/gcc/gcc/simplify-rtx.cc:2558
> 2558      gcc_assert (GET_RTX_CLASS (code) != RTX_COMPARE);
> (gdb) p code
> $24 = ASHIFTRT
> (gdb) p mode
> $25 = E_SImode
> (gdb) p debug_rtx (op0)
> (ashift:SI (subreg/s/u:SI (reg/v:DI 74 [ x ]) 0)
>     (const_int 24 [0x18]))
> $26 = void
> (gdb) p debug_rtx (op1)
> (const_int 24 [0x18])
> $27 = void

So that's (ashiftrt (ashift (object) 24) 24), i.e., sign extension.

That is, we really don't have to think about the fact that the underlying
object is a SUBREG because the outer operations are very clearly a sign
extension regardless of the object they're operating on.
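
The equivalence can be checked at the C level with a small sketch. The function names below are hypothetical. The sketch assumes GCC's behavior for two implementation-defined points: right-shifting a negative signed value is arithmetic, and converting an out-of-range value to a narrower signed type wraps.

```c
#include <stdint.h>

/* Sign-extend the low `bits` bits of x using a left/right shift pair,
   as in the test case.  Assumes 1 <= bits <= 32.  The left shift is
   done on an unsigned value to avoid signed-overflow undefined
   behavior; the arithmetic right shift relies on GCC's behavior.  */
static int32_t sext_by_shifts(int32_t x, unsigned bits)
{
    unsigned count = 32 - bits;
    return (int32_t)((uint32_t)x << count) >> count;
}

/* The same results expressed as narrowing conversions, which is what
   sext.b and sext.h compute for 8 and 16 bits respectively.  */
static int32_t sext_b(int32_t x) { return (int8_t)x; }
static int32_t sext_h(int32_t x) { return (int16_t)x; }
```

For example, sext_by_shifts(0x180, 8) and sext_b(0x180) both yield -128, since the low byte 0x80 has its top bit set.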

With that in mind I would suggest you look at adding a case to detect
zero/sign extension in simplify_context::simplify_binary_operation_1.
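
As a rough illustration of the kind of check meant here, the following is a toy model in plain C, not GCC's RTL API. The enum, struct, and function names are all hypothetical and only mimic RTL code names. It recognizes (ashiftrt (ashift X C) C) and describes the result as a sign extension of the low (width - C) bits of X, without inspecting what X is.

```c
#include <stddef.h>

/* Toy expression codes; they only mimic the names of RTL codes. */
enum toy_code {
    TOY_OBJECT, TOY_CONST, TOY_ASHIFT, TOY_ASHIFTRT, TOY_SIGN_EXTEND
};

struct toy_expr {
    enum toy_code code;
    int width;     /* result width in bits */
    long value;    /* TOY_CONST: the constant;
                      TOY_SIGN_EXTEND: width of the extended field */
    struct toy_expr *op0, *op1;
};

/* If e is (ashiftrt (ashift X C) C) with 0 < C < width, fill *out
   with a sign extension of the low (width - C) bits of X and return 1.
   Otherwise leave *out untouched and return 0.  X itself is never
   examined, so it may be a register, a subreg, or anything else.  */
static int toy_match_sign_extend(const struct toy_expr *e,
                                 struct toy_expr *out)
{
    if (e->code != TOY_ASHIFTRT || e->op0 == NULL || e->op1 == NULL)
        return 0;

    const struct toy_expr *inner = e->op0;
    if (inner->code != TOY_ASHIFT
        || inner->op0 == NULL || inner->op1 == NULL)
        return 0;

    if (e->op1->code != TOY_CONST || inner->op1->code != TOY_CONST)
        return 0;

    long count = e->op1->value;
    if (count != inner->op1->value || count <= 0 || count >= e->width)
        return 0;

    out->code = TOY_SIGN_EXTEND;
    out->width = e->width;
    out->value = e->width - count;
    out->op0 = inner->op0;
    out->op1 = NULL;
    return 1;
}
```

A real implementation would also have to confirm that the narrower mode exists and handle the logical-shift (zero extension) case; this sketch covers only the pattern match.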

Thanks,
Jeff
  
Jeff Law April 22, 2023, 12:13 a.m. UTC | #2
On 3/23/23 19:53, Feng Wang wrote:
> This patch optimize the combine processing for sext.b/h in rv64.
> Please refer to the following test case,
[ ... ]
I've opened BZ109592 to track this problem.

jeff
  
Feng Wang April 23, 2023, 12:24 a.m. UTC | #3
On 2023-04-22 08:08  Jeff Law<jeffreyalaw@gmail.com> wrote:
>
>
>
>On 3/23/23 19:53, Feng Wang wrote:
>> This patch optimize the combine processing for sext.b/h in rv64.
>> Please refer to the following test case,
>> int sextb32(int x)
>> { return (x << 24) >> 24; }
>>
>> The rtl expression is as follows,
>> (insn 6 3 7 2 (set (reg:SI 138)
>>          (ashift:SI (subreg/s/u:SI (reg/v:DI 136 [ xD.2271 ]) 0)
>>              (const_int 24 [0x18]))) "sextb.c":2:13 195 {ashlsi3}
>>       (expr_list:REG_DEAD (reg/v:DI 136 [ xD.2271 ])
>>          (nil)))
>> (insn 7 6 8 2 (set (reg:SI 137)
>>          (ashiftrt:SI (reg:SI 138)
>>              (const_int 24 [0x18]))) "sextb.c":2:20 196 {ashrsi3}
>>       (expr_list:REG_DEAD (reg:SI 138)
>>          (nil)))
>>
>> During the combine phase, they will combine into
>> (set (reg:SI 137)
>>      (ashiftrt:SI (subreg:SI (ashift:DI (reg:DI 140)
>>                  (const_int 24 [0x18])) 0)
>>          (const_int 24 [0x18])))
>>
>> The optimal combine result is
>> (set (reg:SI 137)
>>      (sign_extend:SI (subreg:QI (reg:DI 140) 0)))
>> This can be converted to the sext ins.
>>
>> Due to the influence of subreg,the current processing
>> can't obtain the imm of left shifts. Need to peel off
>> another layer of rtl to obtain it.
>>
>> gcc/ChangeLog:
>>
>>          * combine.cc (extract_left_shift): Add SUBREG case.
>>
>> gcc/testsuite/ChangeLog:
>>
>>          * gcc.target/riscv/zbb-sext-rv64.c: New test.
>SUBREGs have painful semantics and we should be very careful just
>stripping them.
>
>For example, you might have a subreg that extracts the *high* part.  Or
>you might have (subreg (mem)) or a paradoxical subreg, etc.
>
>At the *least* this case would need verification that you're getting the
>lowpart.  However, I suspect there's other conditions that need to be
>checked to make this valid.
>
>But I would suggest we look elsewhere.  It could be that combine is
>reassociating the subreg in ways that are undesirable and which
>ultimately makes our job harder. Additionally if we can fix this in a
>generic simplification/folder routine, then multiple passes can benefit.
>
>For example in simplify_context::simplify_binary_operation we get a form
>more amenable to optimization.
>
>> #0  simplify_context::simplify_binary_operation (this=0x7fffffffda68, code=ASHIFTRT, mode=E_SImode,
>>     op0=0x7fffea11eb40, op1=0x7fffea009610) at /home/jlaw/riscv-persist/ventana/gcc/gcc/simplify-rtx.cc:2558
>> 2558      gcc_assert (GET_RTX_CLASS (code) != RTX_COMPARE);
>> (gdb) p code
>> $24 = ASHIFTRT
>> (gdb) p mode
>> $25 = E_SImode
>> (gdb) p debug_rtx (op0)
>> (ashift:SI (subreg/s/u:SI (reg/v:DI 74 [ x ]) 0)
>>     (const_int 24 [0x18]))
>> $26 = void
>> (gdb) p debug_rtx (op1)
>> (const_int 24 [0x18])
>> $27 = void
>
>So that's (ashiftrt (ashift (object) 24) 24), ie sign extension.
>
>ie, we really don't have to think about the fact that the underlying
>object is a SUBREG because the outer operations are very clearly a sign
>extension regardless of the object they're operating on.
>
>With that in mind I would suggest you look at adding a case for detect
>zero/sign extension in simplify_context::simplify_binary_operation_1.
>
>Thanks,
>Jeff 
You are right; I will rework the patch according to your suggestion.
Thanks.
Feng Wang
  

Patch

diff --git a/gcc/combine.cc b/gcc/combine.cc
index 053879500b7..fb396a3d974 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -7915,6 +7915,11 @@  extract_left_shift (scalar_int_mode mode, rtx x, int count)
 
   switch (code)
     {
+    case SUBREG:
+      x = XEXP (x, 0);
+      if (GET_CODE(x) != ASHIFT)
+        break;
+
     case ASHIFT:
       /* This is the shift itself.  If it is wide enough, we will return
 	 either the value being shifted if the shift count is equal to
diff --git a/gcc/testsuite/gcc.target/riscv/zbb-sext-rv64.c b/gcc/testsuite/gcc.target/riscv/zbb-sext-rv64.c
new file mode 100644
index 00000000000..4086ee56f57
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zbb-sext-rv64.c
@@ -0,0 +1,12 @@ 
+/* { dg-do compile } */
+/* { dg-options "-march=rv64g_zbb -mabi=lp64d -O2" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" } } */
+
+int sextb32(int x)
+{ return (x << 24) >> 24; }
+
+int sexth32(int x)
+{ return (x << 16) >> 16; }
+
+/* { dg-final { scan-assembler "sext.b" } } */
+/* { dg-final { scan-assembler "sext.h" } } */
\ No newline at end of file