[v1,3/3] RISC-V: Replace zero_extendsidi2_shifted with generalized split

Message ID 20220524214703.4022737-4-philipp.tomsich@vrull.eu
State: Deferred, archived
Series: RISC-V: Improve sequences with shifted zero-extended operands

Commit Message

Philipp Tomsich May 24, 2022, 9:47 p.m. UTC
  The current method of treating shifts of extended values on RISC-V
frequently causes sequences of 3 shifts, despite the presence of the
'zero_extendsidi2_shifted' pattern.

Consider:
    unsigned long f(unsigned int a, unsigned long b)
    {
            a = a << 1;
            unsigned long c = (unsigned long) a;
            c = b + (c<<4);
            return c;
    }
which will present at combine-time as:
    Trying 7, 8 -> 9:
        7: r78:SI=r81:DI#0<<0x1
          REG_DEAD r81:DI
        8: r79:DI=zero_extend(r78:SI)
          REG_DEAD r78:SI
        9: r72:DI=r79:DI<<0x4
          REG_DEAD r79:DI
    Failed to match this instruction:
    (set (reg:DI 72 [ _1 ])
        (and:DI (ashift:DI (reg:DI 81)
                (const_int 5 [0x5]))
        (const_int 68719476704 [0xfffffffe0])))
and produce the following (optimized) assembly:
    f:
	slliw	a5,a0,1
	slli	a5,a5,32
	srli	a5,a5,28
	add	a0,a5,a1
	ret

The current way of handling this (in 'zero_extendsidi2_shifted')
doesn't apply for two reasons:
- this is seen before reload (the existing pattern only splits after
  reload), and
- (more importantly) the constant mask is not 0xfffffffful: the
  zero-extension of (a << 1) contributes the mask 0xfffffffe, which
  the outer shift by 4 turns into 0xfffffffe0.

To address this, we introduce a generalized version of shifting
zero-extended values that supports any mask of consecutive ones as
long as the number of trailing zeros matches the inner shift amount.

With this new split, we generate the following assembly for the
aforementioned function:
    f:
	slli	a0,a0,33
	srli	a0,a0,28
	add	a0,a0,a1
	ret
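
To make the arithmetic behind the split concrete: for the mask
0xfffffffe0 above, the trailing-zero count is 5 (matching the inner
shift amount) and the leading-zero count is 28, which yields exactly
the slli-by-33/srli-by-28 pair shown.  A standalone sketch (purely
illustrative, not part of the patch; it assumes a 64-bit unsigned
long long in place of HOST_WIDE_INT):

    /* Illustration of the split's shift-amount computation.  */
    #include <stdio.h>

    int main (void)
    {
      unsigned long long mask = 0xfffffffe0ULL; /* consecutive ones  */
      int inner_shift = 5;                      /* from the ashift   */
      int trailing = __builtin_ctzll (mask);    /* 5                 */
      int leading  = __builtin_clzll (mask);    /* 28                */

      if (trailing == inner_shift)              /* the split applies */
        printf ("slli %d; srli %d\n", leading + trailing, leading);
      return 0;
    }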

gcc/ChangeLog:

	* config/riscv/riscv.md (zero_extendsidi2_shifted): Replace
	  with a generalized split that requires no clobber, runs
	  before reload and works for smaller masks.

Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
---

 gcc/config/riscv/riscv.md | 37 ++++++++++++++++++++-----------------
 1 file changed, 20 insertions(+), 17 deletions(-)
  

Comments

Kito Cheng June 7, 2022, 10:24 a.m. UTC | #1
On Wed, May 25, 2022 at 5:47 AM Philipp Tomsich
<philipp.tomsich@vrull.eu> wrote:
>
> The current method of treating shifts of extended values on RISC-V
> frequently causes sequences of 3 shifts, despite the presence of the
> 'zero_extendsidi2_shifted' pattern.
>
> Consider:
>     unsigned long f(unsigned int a, unsigned long b)
>     {
>             a = a << 1;
>             unsigned long c = (unsigned long) a;
>             c = b + (c<<4);
>             return c;
>     }
> which will present at combine-time as:
>     Trying 7, 8 -> 9:
>         7: r78:SI=r81:DI#0<<0x1
>           REG_DEAD r81:DI
>         8: r79:DI=zero_extend(r78:SI)
>           REG_DEAD r78:SI
>         9: r72:DI=r79:DI<<0x4
>           REG_DEAD r79:DI
>     Failed to match this instruction:
>     (set (reg:DI 72 [ _1 ])
>         (and:DI (ashift:DI (reg:DI 81)
>                 (const_int 5 [0x5]))
>         (const_int 68719476704 [0xfffffffe0])))
> and produce the following (optimized) assembly:
>     f:
>         slliw   a5,a0,1
>         slli    a5,a5,32
>         srli    a5,a5,28
>         add     a0,a5,a1
>         ret
>
> The current way of handling this (in 'zero_extendsidi2_shifted')
> doesn't apply for two reasons:
> - this is seen before reload, and
> - (more importantly) the constant mask is not 0xfffffffful.
>
> To address this, we introduce a generalized version of shifting
> zero-extended values that supports any mask of consecutive ones as
> long as the number of trailing zeros matches the inner shift amount.
>
> With this new split, we generate the following assembly for the
> aforementioned function:
>     f:
>         slli    a0,a0,33
>         srli    a0,a0,28
>         add     a0,a0,a1
>         ret
>
> gcc/ChangeLog:
>
>         * config/riscv/riscv.md (zero_extendsidi2_shifted): Replace
>           with a generalized split that requires no clobber, runs
>           before reload and works for smaller masks.
>
> Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
> ---
>
>  gcc/config/riscv/riscv.md | 37 ++++++++++++++++++++-----------------
>  1 file changed, 20 insertions(+), 17 deletions(-)
>
> diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
> index b8ab0cf169a..cc10cd90a74 100644
> --- a/gcc/config/riscv/riscv.md
> +++ b/gcc/config/riscv/riscv.md
> @@ -2119,23 +2119,26 @@ (define_split
>  ;; occur when unsigned int is used for array indexing.  Split this into two
>  ;; shifts.  Otherwise we can get 3 shifts.
>
> -(define_insn_and_split "zero_extendsidi2_shifted"
> -  [(set (match_operand:DI 0 "register_operand" "=r")
> -       (and:DI (ashift:DI (match_operand:DI 1 "register_operand" "r")
> -                          (match_operand:QI 2 "immediate_operand" "I"))
> -               (match_operand 3 "immediate_operand" "")))
> -   (clobber (match_scratch:DI 4 "=&r"))]
> -  "TARGET_64BIT && !TARGET_ZBA
> -   && ((INTVAL (operands[3]) >> INTVAL (operands[2])) == 0xffffffff)"
> -  "#"
> -  "&& reload_completed"
> -  [(set (match_dup 4)
> -       (ashift:DI (match_dup 1) (const_int 32)))
> -   (set (match_dup 0)
> -       (lshiftrt:DI (match_dup 4) (match_dup 5)))]
> -  "operands[5] = GEN_INT (32 - (INTVAL (operands [2])));"
> -  [(set_attr "type" "shift")
> -   (set_attr "mode" "DI")])
> +(define_split
> +  [(set (match_operand:DI 0 "register_operand")
> +       (and:DI (ashift:DI (match_operand:DI 1 "register_operand")
> +                          (match_operand:QI 2 "immediate_operand"))
> +               (match_operand:DI 3 "consecutive_bits_operand")))]
> +  "TARGET_64BIT"
> +  [(set (match_dup 0) (ashift:DI (match_dup 1) (match_dup 4)))
> +   (set (match_dup 0) (lshiftrt:DI (match_dup 0) (match_dup 5)))]

I would prefer to keep using another register if possible:

like this:
+  [(set (match_dup 6) (ashift:DI (match_dup 1) (match_dup 4)))
+   (set (match_dup 0) (lshiftrt:DI (match_dup 6) (match_dup 5)))]

if (can_create_pseudo_p ())
  operands[6] = gen_reg_rtx (DImode);
else
  operands[6] = operands[0];


> +{
> +       unsigned HOST_WIDE_INT mask = UINTVAL (operands[3]);
> +       int leading = clz_hwi (mask);
> +       int trailing = ctz_hwi (mask);
> +
> +       /* The shift amount must match the number of trailing zeros.  */
> +       if (trailing != UINTVAL (operands[2]))
> +          FAIL;
> +
> +       operands[4] = GEN_INT (leading + trailing);
> +       operands[5] = GEN_INT (leading);
> +})
>
>  ;;
>  ;;  ....................
> --
> 2.34.1
>
  
Philipp Tomsich June 7, 2022, 10:50 a.m. UTC | #2
On Tue, 7 Jun 2022 at 12:24, Kito Cheng <kito.cheng@gmail.com> wrote:
>
> On Wed, May 25, 2022 at 5:47 AM Philipp Tomsich
> <philipp.tomsich@vrull.eu> wrote:
> >
> > The current method of treating shifts of extended values on RISC-V
> > frequently causes sequences of 3 shifts, despite the presence of the
> > 'zero_extendsidi2_shifted' pattern.
> >
> > Consider:
> >     unsigned long f(unsigned int a, unsigned long b)
> >     {
> >             a = a << 1;
> >             unsigned long c = (unsigned long) a;
> >             c = b + (c<<4);
> >             return c;
> >     }
> > which will present at combine-time as:
> >     Trying 7, 8 -> 9:
> >         7: r78:SI=r81:DI#0<<0x1
> >           REG_DEAD r81:DI
> >         8: r79:DI=zero_extend(r78:SI)
> >           REG_DEAD r78:SI
> >         9: r72:DI=r79:DI<<0x4
> >           REG_DEAD r79:DI
> >     Failed to match this instruction:
> >     (set (reg:DI 72 [ _1 ])
> >         (and:DI (ashift:DI (reg:DI 81)
> >                 (const_int 5 [0x5]))
> >         (const_int 68719476704 [0xfffffffe0])))
> > and produce the following (optimized) assembly:
> >     f:
> >         slliw   a5,a0,1
> >         slli    a5,a5,32
> >         srli    a5,a5,28
> >         add     a0,a5,a1
> >         ret
> >
> > The current way of handling this (in 'zero_extendsidi2_shifted')
> > doesn't apply for two reasons:
> > - this is seen before reload, and
> > - (more importantly) the constant mask is not 0xfffffffful.
> >
> > To address this, we introduce a generalized version of shifting
> > zero-extended values that supports any mask of consecutive ones as
> > long as the number of trailing zeros matches the inner shift amount.
> >
> > With this new split, we generate the following assembly for the
> > aforementioned function:
> >     f:
> >         slli    a0,a0,33
> >         srli    a0,a0,28
> >         add     a0,a0,a1
> >         ret
> >
> > gcc/ChangeLog:
> >
> >         * config/riscv/riscv.md (zero_extendsidi2_shifted): Replace
> >           with a generalized split that requires no clobber, runs
> >           before reload and works for smaller masks.
> >
> > Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
> > ---
> >
> >  gcc/config/riscv/riscv.md | 37 ++++++++++++++++++++-----------------
> >  1 file changed, 20 insertions(+), 17 deletions(-)
> >
> > diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
> > index b8ab0cf169a..cc10cd90a74 100644
> > --- a/gcc/config/riscv/riscv.md
> > +++ b/gcc/config/riscv/riscv.md
> > @@ -2119,23 +2119,26 @@ (define_split
> >  ;; occur when unsigned int is used for array indexing.  Split this into two
> >  ;; shifts.  Otherwise we can get 3 shifts.
> >
> > -(define_insn_and_split "zero_extendsidi2_shifted"
> > -  [(set (match_operand:DI 0 "register_operand" "=r")
> > -       (and:DI (ashift:DI (match_operand:DI 1 "register_operand" "r")
> > -                          (match_operand:QI 2 "immediate_operand" "I"))
> > -               (match_operand 3 "immediate_operand" "")))
> > -   (clobber (match_scratch:DI 4 "=&r"))]
> > -  "TARGET_64BIT && !TARGET_ZBA
> > -   && ((INTVAL (operands[3]) >> INTVAL (operands[2])) == 0xffffffff)"
> > -  "#"
> > -  "&& reload_completed"
> > -  [(set (match_dup 4)
> > -       (ashift:DI (match_dup 1) (const_int 32)))
> > -   (set (match_dup 0)
> > -       (lshiftrt:DI (match_dup 4) (match_dup 5)))]
> > -  "operands[5] = GEN_INT (32 - (INTVAL (operands [2])));"
> > -  [(set_attr "type" "shift")
> > -   (set_attr "mode" "DI")])
> > +(define_split
> > +  [(set (match_operand:DI 0 "register_operand")
> > +       (and:DI (ashift:DI (match_operand:DI 1 "register_operand")
> > +                          (match_operand:QI 2 "immediate_operand"))
> > +               (match_operand:DI 3 "consecutive_bits_operand")))]
> > +  "TARGET_64BIT"
> > +  [(set (match_dup 0) (ashift:DI (match_dup 1) (match_dup 4)))
> > +   (set (match_dup 0) (lshiftrt:DI (match_dup 0) (match_dup 5)))]
>
> I would prefer to keep using another register if possible:
>
> like this:
> +  [(set (match_dup 6) (ashift:DI (match_dup 1) (match_dup 4)))
> +   (set (match_dup 0) (lshiftrt:DI (match_dup 6) (match_dup 5)))]
>
> if (can_create_pseudo_p ())
>   operands[6] = gen_reg_rtx (DImode);
> else
>   operands[6] = operands[0];

I don't see the benefit to this (unless you expect opportunities for
CSE), as there will be a linear dependency chain anyway.  I'd like to
understand your reasoning behind this a bit better, as our style
currently generally tries not to introduce temporaries where
avoidable.

Thanks,
Philipp.

>
> > +{
> > +       unsigned HOST_WIDE_INT mask = UINTVAL (operands[3]);
> > +       int leading = clz_hwi (mask);
> > +       int trailing = ctz_hwi (mask);
> > +
> > +       /* The shift amount must match the number of trailing zeros.  */
> > +       if (trailing != UINTVAL (operands[2]))
> > +          FAIL;
> > +
> > +       operands[4] = GEN_INT (leading + trailing);
> > +       operands[5] = GEN_INT (leading);
> > +})
> >
> >  ;;
> >  ;;  ....................
> > --
> > 2.34.1
> >
  
Kito Cheng June 7, 2022, 1:18 p.m. UTC | #3
Using the same pseudo register creates one long live range instead of
two shorter live ranges, which is bad when the instruction scheduler
tries to separate those two instructions, and the register allocator
has more complete knowledge to decide which way is better (the same
register or different ones), so I prefer to use another pseudo here
if possible.

That's also what the AArch64/ARM/x86 ports do: use a new pseudo as a
temporary if possible.
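
Folded together, the suggested variant would look roughly like this
(an untested sketch; the predicate, condition, and the other operands
stay as in the patch):

  [(set (match_dup 6) (ashift:DI (match_dup 1) (match_dup 4)))
   (set (match_dup 0) (lshiftrt:DI (match_dup 6) (match_dup 5)))]
{
  ...
  /* Use a fresh pseudo for the intermediate result when allowed, so
     the register allocator sees two short live ranges instead of one
     long one.  */
  if (can_create_pseudo_p ())
    operands[6] = gen_reg_rtx (DImode);
  else
    operands[6] = operands[0];
}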


On Tue, Jun 7, 2022 at 6:50 PM Philipp Tomsich <philipp.tomsich@vrull.eu> wrote:
>
> On Tue, 7 Jun 2022 at 12:24, Kito Cheng <kito.cheng@gmail.com> wrote:
> >
> > On Wed, May 25, 2022 at 5:47 AM Philipp Tomsich
> > <philipp.tomsich@vrull.eu> wrote:
> > >
> > > The current method of treating shifts of extended values on RISC-V
> > > frequently causes sequences of 3 shifts, despite the presence of the
> > > 'zero_extendsidi2_shifted' pattern.
> > >
> > > Consider:
> > >     unsigned long f(unsigned int a, unsigned long b)
> > >     {
> > >             a = a << 1;
> > >             unsigned long c = (unsigned long) a;
> > >             c = b + (c<<4);
> > >             return c;
> > >     }
> > > which will present at combine-time as:
> > >     Trying 7, 8 -> 9:
> > >         7: r78:SI=r81:DI#0<<0x1
> > >           REG_DEAD r81:DI
> > >         8: r79:DI=zero_extend(r78:SI)
> > >           REG_DEAD r78:SI
> > >         9: r72:DI=r79:DI<<0x4
> > >           REG_DEAD r79:DI
> > >     Failed to match this instruction:
> > >     (set (reg:DI 72 [ _1 ])
> > >         (and:DI (ashift:DI (reg:DI 81)
> > >                 (const_int 5 [0x5]))
> > >         (const_int 68719476704 [0xfffffffe0])))
> > > and produce the following (optimized) assembly:
> > >     f:
> > >         slliw   a5,a0,1
> > >         slli    a5,a5,32
> > >         srli    a5,a5,28
> > >         add     a0,a5,a1
> > >         ret
> > >
> > > The current way of handling this (in 'zero_extendsidi2_shifted')
> > > doesn't apply for two reasons:
> > > - this is seen before reload, and
> > > - (more importantly) the constant mask is not 0xfffffffful.
> > >
> > > To address this, we introduce a generalized version of shifting
> > > zero-extended values that supports any mask of consecutive ones as
> > > long as the number of trailing zeros matches the inner shift amount.
> > >
> > > With this new split, we generate the following assembly for the
> > > aforementioned function:
> > >     f:
> > >         slli    a0,a0,33
> > >         srli    a0,a0,28
> > >         add     a0,a0,a1
> > >         ret
> > >
> > > gcc/ChangeLog:
> > >
> > >         * config/riscv/riscv.md (zero_extendsidi2_shifted): Replace
> > >           with a generalized split that requires no clobber, runs
> > >           before reload and works for smaller masks.
> > >
> > > Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
> > > ---
> > >
> > >  gcc/config/riscv/riscv.md | 37 ++++++++++++++++++++-----------------
> > >  1 file changed, 20 insertions(+), 17 deletions(-)
> > >
> > > diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
> > > index b8ab0cf169a..cc10cd90a74 100644
> > > --- a/gcc/config/riscv/riscv.md
> > > +++ b/gcc/config/riscv/riscv.md
> > > @@ -2119,23 +2119,26 @@ (define_split
> > >  ;; occur when unsigned int is used for array indexing.  Split this into two
> > >  ;; shifts.  Otherwise we can get 3 shifts.
> > >
> > > -(define_insn_and_split "zero_extendsidi2_shifted"
> > > -  [(set (match_operand:DI 0 "register_operand" "=r")
> > > -       (and:DI (ashift:DI (match_operand:DI 1 "register_operand" "r")
> > > -                          (match_operand:QI 2 "immediate_operand" "I"))
> > > -               (match_operand 3 "immediate_operand" "")))
> > > -   (clobber (match_scratch:DI 4 "=&r"))]
> > > -  "TARGET_64BIT && !TARGET_ZBA
> > > -   && ((INTVAL (operands[3]) >> INTVAL (operands[2])) == 0xffffffff)"
> > > -  "#"
> > > -  "&& reload_completed"
> > > -  [(set (match_dup 4)
> > > -       (ashift:DI (match_dup 1) (const_int 32)))
> > > -   (set (match_dup 0)
> > > -       (lshiftrt:DI (match_dup 4) (match_dup 5)))]
> > > -  "operands[5] = GEN_INT (32 - (INTVAL (operands [2])));"
> > > -  [(set_attr "type" "shift")
> > > -   (set_attr "mode" "DI")])
> > > +(define_split
> > > +  [(set (match_operand:DI 0 "register_operand")
> > > +       (and:DI (ashift:DI (match_operand:DI 1 "register_operand")
> > > +                          (match_operand:QI 2 "immediate_operand"))
> > > +               (match_operand:DI 3 "consecutive_bits_operand")))]
> > > +  "TARGET_64BIT"
> > > +  [(set (match_dup 0) (ashift:DI (match_dup 1) (match_dup 4)))
> > > +   (set (match_dup 0) (lshiftrt:DI (match_dup 0) (match_dup 5)))]
> >
> > I would prefer to keep using another register if possible:
> >
> > like this:
> > +  [(set (match_dup 6) (ashift:DI (match_dup 1) (match_dup 4)))
> > +   (set (match_dup 0) (lshiftrt:DI (match_dup 6) (match_dup 5)))]
> >
> > if (can_create_pseudo_p ())
> >   operands[6] = gen_reg_rtx (DImode);
> > else
> >   operands[6] = operands[0];
>
> I don't see the benefit to this (unless you expect opportunities for
> CSE), as there will be a linear dependency chain anyway.  I'd like to
> understand your reasoning behind this a bit better, as our style
> currently generally tries not to introduce temporaries where
> avoidable.
>
> Thanks,
> Philipp.
>
> >
> > > +{
> > > +       unsigned HOST_WIDE_INT mask = UINTVAL (operands[3]);
> > > +       int leading = clz_hwi (mask);
> > > +       int trailing = ctz_hwi (mask);
> > > +
> > > +       /* The shift amount must match the number of trailing zeros.  */
> > > +       if (trailing != UINTVAL (operands[2]))
> > > +          FAIL;
> > > +
> > > +       operands[4] = GEN_INT (leading + trailing);
> > > +       operands[5] = GEN_INT (leading);
> > > +})
> > >
> > >  ;;
> > >  ;;  ....................
> > > --
> > > 2.34.1
> > >
  

Patch

diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index b8ab0cf169a..cc10cd90a74 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -2119,23 +2119,26 @@  (define_split
 ;; occur when unsigned int is used for array indexing.  Split this into two
 ;; shifts.  Otherwise we can get 3 shifts.
 
-(define_insn_and_split "zero_extendsidi2_shifted"
-  [(set (match_operand:DI 0 "register_operand" "=r")
-	(and:DI (ashift:DI (match_operand:DI 1 "register_operand" "r")
-			   (match_operand:QI 2 "immediate_operand" "I"))
-		(match_operand 3 "immediate_operand" "")))
-   (clobber (match_scratch:DI 4 "=&r"))]
-  "TARGET_64BIT && !TARGET_ZBA
-   && ((INTVAL (operands[3]) >> INTVAL (operands[2])) == 0xffffffff)"
-  "#"
-  "&& reload_completed"
-  [(set (match_dup 4)
-	(ashift:DI (match_dup 1) (const_int 32)))
-   (set (match_dup 0)
-	(lshiftrt:DI (match_dup 4) (match_dup 5)))]
-  "operands[5] = GEN_INT (32 - (INTVAL (operands [2])));"
-  [(set_attr "type" "shift")
-   (set_attr "mode" "DI")])
+(define_split
+  [(set (match_operand:DI 0 "register_operand")
+	(and:DI (ashift:DI (match_operand:DI 1 "register_operand")
+			   (match_operand:QI 2 "immediate_operand"))
+	        (match_operand:DI 3 "consecutive_bits_operand")))]
+  "TARGET_64BIT"
+  [(set (match_dup 0) (ashift:DI (match_dup 1) (match_dup 4)))
+   (set (match_dup 0) (lshiftrt:DI (match_dup 0) (match_dup 5)))]
+{
+	unsigned HOST_WIDE_INT mask = UINTVAL (operands[3]);
+	int leading = clz_hwi (mask);
+	int trailing = ctz_hwi (mask);
+
+	/* The shift amount must match the number of trailing zeros.  */
+	if (trailing != UINTVAL (operands[2]))
+	   FAIL;
+
+	operands[4] = GEN_INT (leading + trailing);
+	operands[5] = GEN_INT (leading);
+})
 
 ;;
 ;;  ....................
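
Note: the 'consecutive_bits_operand' predicate used above is defined
elsewhere in this series.  A minimal sketch of such a predicate (an
assumption for illustration, not necessarily the exact definition
from the series) could look like:

    (define_predicate "consecutive_bits_operand"
      (match_code "const_int")
    {
      /* Accept a non-zero constant whose set bits form a single
         contiguous run, e.g. 0xfffffffe0.  */
      unsigned HOST_WIDE_INT val = UINTVAL (op);
      if (val == 0)
        return false;
      /* Strip the trailing zeros, then add one: a contiguous run of
         ones becomes a power of two.  (The all-ones value wraps to
         zero and is rejected, which is harmless here.)  */
      return exact_log2 ((val >> ctz_hwi (val)) + 1) >= 0;
    })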