[3/7] AArch64 Add pattern for sshr to cmlt

Message ID 20210929162001.GA31867@arm.com
State Committed
Headers
Series AArch64 Optimize truncation, shifts and bitmask comparisons |

Commit Message

Tamar Christina Sept. 29, 2021, 4:20 p.m. UTC
  Hi All,

This optimizes a signed right shift by BITSIZE-1 into a cmlt operation, which is
generally better because compares typically have higher throughput than shifts.

On AArch64 the result of such a shift is either -1 or 0, which is exactly the
result of the compare.

i.e.

void e (int * restrict a, int *b, int n)
{
    for (int i = 0; i < n; i++)
      b[i] = a[i] >> 31;
}

now generates:

.L4:
        ldr     q0, [x0, x3]
        cmlt    v0.4s, v0.4s, #0
        str     q0, [x1, x3]
        add     x3, x3, 16
        cmp     x4, x3
        bne     .L4

instead of:

.L4:
        ldr     q0, [x0, x3]
        sshr    v0.4s, v0.4s, 31
        str     q0, [x1, x3]
        add     x3, x3, 16
        cmp     x4, x3
        bne     .L4

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* config/aarch64/aarch64-simd.md (aarch64_simd_ashr<mode>): Add cmlt
	case.
	* config/aarch64/constraints.md (D1): New.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/shl-combine-2.c: New test.


Comments

Kyrylo Tkachov Sept. 30, 2021, 9:27 a.m. UTC | #1
> -----Original Message-----
> From: Tamar Christina <Tamar.Christina@arm.com>
> Sent: Wednesday, September 29, 2021 5:20 PM
> To: gcc-patches@gcc.gnu.org
> Cc: nd <nd@arm.com>; Richard Earnshaw <Richard.Earnshaw@arm.com>;
> Marcus Shawcroft <Marcus.Shawcroft@arm.com>; Kyrylo Tkachov
> <Kyrylo.Tkachov@arm.com>; Richard Sandiford
> <Richard.Sandiford@arm.com>
> Subject: [PATCH 3/7]AArch64 Add pattern for sshr to cmlt
> 
> [quoted patch description trimmed]
> Ok for master?

This should be okay (either a win or neutral) for Arm Cortex and Neoverse cores, so I'm tempted not to ask for a CPU-specific tunable to guard it, to keep the code clean.
Andrew, would this change be okay from a Thunder X line perspective?
Thanks,
Kyrill

> [remainder of quoted message trimmed]
Andrew Pinski Oct. 11, 2021, 7:56 p.m. UTC | #2
On Thu, Sep 30, 2021 at 2:28 AM Kyrylo Tkachov via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
> > -----Original Message-----
> > From: Tamar Christina <Tamar.Christina@arm.com>
> > Sent: Wednesday, September 29, 2021 5:20 PM
> > To: gcc-patches@gcc.gnu.org
> > Cc: nd <nd@arm.com>; Richard Earnshaw <Richard.Earnshaw@arm.com>;
> > Marcus Shawcroft <Marcus.Shawcroft@arm.com>; Kyrylo Tkachov
> > <Kyrylo.Tkachov@arm.com>; Richard Sandiford
> > <Richard.Sandiford@arm.com>
> > Subject: [PATCH 3/7]AArch64 Add pattern for sshr to cmlt
> >
> > [quoted patch description trimmed]
>
> This should be okay (either a win or neutral) for Arm Cortex and Neoverse cores so I'm tempted to not ask for a CPU-specific tunable to guard it to keep the code clean.
> Andrew, would this change be okay from a Thunder X line perspective?

I don't know about ThunderX2 but here are the details for ThunderX1
(and OcteonX1) and OcteonX2:
The sshr and cmlt are handled the same in the pipeline as far as I can tell.

Thanks,
Andrew



> [remainder of quoted thread trimmed]
Kyrylo Tkachov Oct. 12, 2021, 12:19 p.m. UTC | #3
> -----Original Message-----
> From: Andrew Pinski <pinskia@gmail.com>
> Sent: Monday, October 11, 2021 8:56 PM
> To: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
> Cc: Tamar Christina <Tamar.Christina@arm.com>; gcc-patches@gcc.gnu.org;
> apinski@marvell.com; Richard Earnshaw <Richard.Earnshaw@arm.com>; nd
> <nd@arm.com>; Marcus Shawcroft <Marcus.Shawcroft@arm.com>; Richard
> Sandiford <Richard.Sandiford@arm.com>
> Subject: Re: [PATCH 3/7]AArch64 Add pattern for sshr to cmlt
> 
> On Thu, Sep 30, 2021 at 2:28 AM Kyrylo Tkachov via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> > > -----Original Message-----
> > > From: Tamar Christina <Tamar.Christina@arm.com>
> > > Sent: Wednesday, September 29, 2021 5:20 PM
> > > To: gcc-patches@gcc.gnu.org
> > > Cc: nd <nd@arm.com>; Richard Earnshaw
> <Richard.Earnshaw@arm.com>;
> > > Marcus Shawcroft <Marcus.Shawcroft@arm.com>; Kyrylo Tkachov
> > > <Kyrylo.Tkachov@arm.com>; Richard Sandiford
> > > <Richard.Sandiford@arm.com>
> > > Subject: [PATCH 3/7]AArch64 Add pattern for sshr to cmlt
> > >
> > > [quoted patch description trimmed]
> >
> > This should be okay (either a win or neutral) for Arm Cortex and Neoverse
> cores so I'm tempted to not ask for a CPU-specific tunable to guard it to keep
> the code clean.
> > Andrew, would this change be okay from a Thunder X line perspective?
> 
> I don't know about ThunderX2 but here are the details for ThunderX1
> (and OcteonX1) and OcteonX2:
> The sshr and cmlt are handled the same in the pipeline as far as I can tell.
> 

Thanks for the info.
This patch is ok.
Kyrill

> [remainder of quoted thread trimmed]
Tamar Christina Oct. 12, 2021, 4:20 p.m. UTC | #4
Thanks,

Just archiving a version with more tests as requested.

I will assume the OK still stands.

Regards,
Tamar

> [quoted thread trimmed]

Patch

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 300bf001b59ca7fa197c580b10adb7f70f20d1e0..19b2d0ad4dab4d5742698297ded861228ee22007 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1127,12 +1127,14 @@  (define_insn "aarch64_simd_lshr<mode>"
 )
 
 (define_insn "aarch64_simd_ashr<mode>"
- [(set (match_operand:VDQ_I 0 "register_operand" "=w")
-       (ashiftrt:VDQ_I (match_operand:VDQ_I 1 "register_operand" "w")
-		     (match_operand:VDQ_I  2 "aarch64_simd_rshift_imm" "Dr")))]
+ [(set (match_operand:VDQ_I 0 "register_operand" "=w,w")
+       (ashiftrt:VDQ_I (match_operand:VDQ_I 1 "register_operand" "w,w")
+		     (match_operand:VDQ_I  2 "aarch64_simd_rshift_imm" "D1,Dr")))]
  "TARGET_SIMD"
- "sshr\t%0.<Vtype>, %1.<Vtype>, %2"
-  [(set_attr "type" "neon_shift_imm<q>")]
+ "@
+  cmlt\t%0.<Vtype>, %1.<Vtype>, #0
+  sshr\t%0.<Vtype>, %1.<Vtype>, %2"
+  [(set_attr "type" "neon_compare<q>,neon_shift_imm<q>")]
 )
 
 (define_insn "*aarch64_simd_sra<mode>"
diff --git a/gcc/config/aarch64/constraints.md b/gcc/config/aarch64/constraints.md
index 3b49b452119c49320020fa9183314d9a25b92491..18630815ffc13f2168300a899db69fd428dfb0d6 100644
--- a/gcc/config/aarch64/constraints.md
+++ b/gcc/config/aarch64/constraints.md
@@ -437,6 +437,14 @@  (define_constraint "Dl"
       (match_test "aarch64_simd_shift_imm_p (op, GET_MODE (op),
 						 true)")))
 
+(define_constraint "D1"
+  "@internal
+ A constraint that matches vector of immediates that is bits(mode)-1."
+ (and (match_code "const,const_vector")
+      (match_test "aarch64_const_vec_all_same_in_range_p (op,
+			GET_MODE_UNIT_BITSIZE (mode) - 1,
+			GET_MODE_UNIT_BITSIZE (mode) - 1)")))
+
 (define_constraint "Dr"
   "@internal
  A constraint that matches vector of immediates for right shifts."
diff --git a/gcc/testsuite/gcc.target/aarch64/shl-combine-2.c b/gcc/testsuite/gcc.target/aarch64/shl-combine-2.c
new file mode 100644
index 0000000000000000000000000000000000000000..bdfe35d09ffccc7928947c9e57f1034f7ca2c798
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/shl-combine-2.c
@@ -0,0 +1,12 @@ 
+/* { dg-do assemble } */
+/* { dg-options "-O3 --save-temps --param=vect-epilogues-nomask=0" } */
+
+void e (int * restrict a, int *b, int n)
+{
+    for (int i = 0; i < n; i++)
+      b[i] = a[i] >> 31;
+}
+
+/* { dg-final { scan-assembler-times {\tcmlt\t} 1 } } */
+/* { dg-final { scan-assembler-not {\tsshr\t} } } */
+