[1/2] middle-end Fold BIT_FIELD_REF and Shifts into BIT_FIELD_REFs alone

Message ID patch-15776-tamar@arm.com
State Deferred
Series [1/2] middle-end Fold BIT_FIELD_REF and Shifts into BIT_FIELD_REFs alone

Commit Message

Tamar Christina Sept. 23, 2022, 11:42 a.m. UTC
  Hi All,

This adds a match.pd rule that can fold a right shift of a bit_field_ref of an
integer into just a bit_field_ref by adjusting the offset and the size of the
extract, adding an extension back to the previous size.

Concretely turns:

#include <arm_neon.h>

unsigned int foor (uint32x4_t x)
{
    return x[1] >> 16;
}

which used to generate:

  _1 = BIT_FIELD_REF <x_2(D), 32, 32>;
  _3 = _1 >> 16;

into

  _4 = BIT_FIELD_REF <x_1(D), 16, 48>;
  _2 = (unsigned int) _4;

I currently limit the rewrite to cases where the resulting extract is in a mode
the target supports, i.e. it won't rewrite it to extract, say, 13 bits, because
I worry that for targets that don't have a bit-field extract instruction this
may be a de-optimization.
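
For illustration, a shift count that would need an odd-width extract might
look like the following (the function name is made up and the 13-bit,
offset-51 numbers assume a little-endian lane layout); with the restriction
above it is left alone:

#include <arm_neon.h>

/* 32 - 19 = 13 bits would have to be extracted at bit offset 32 + 19 = 51,
   i.e. a would-be BIT_FIELD_REF <x, 13, 51>, which has no scalar integer
   mode, so the rewrite is skipped.  */
unsigned int foor_odd (uint32x4_t x)
{
    return x[1] >> 19;
}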

Bootstrapped and regtested on aarch64-none-linux-gnu and x86_64-pc-linux-gnu
with no issues.

Testcases are added in patch 2/2.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* match.pd: Add bitfield and shift folding.

--- inline copy of patch -- 
diff --git a/gcc/match.pd b/gcc/match.pd
index 1d407414bee278c64c00d425d9f025c1c58d853d..b225d36dc758f1581502c8d03761544bfd499c01 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -7245,6 +7245,23 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
       && ANY_INTEGRAL_TYPE_P (type) && ANY_INTEGRAL_TYPE_P (TREE_TYPE(@0)))
   (IFN_REDUC_PLUS_WIDEN @0)))
 
+/* Canonicalize BIT_FIELD_REFS and shifts to BIT_FIELD_REFS.  */
+(for shift (rshift)
+     op (plus)
+ (simplify
+  (shift (BIT_FIELD_REF @0 @1 @2) integer_pow2p@3)
+  (if (INTEGRAL_TYPE_P (type))
+   (with { /* Can't use wide-int here as the precision differs between
+	      @1 and @3.  */
+	   unsigned HOST_WIDE_INT size = tree_to_uhwi (@1);
+	   unsigned HOST_WIDE_INT shiftc = tree_to_uhwi (@3);
+	   unsigned HOST_WIDE_INT newsize = size - shiftc;
+	   tree nsize = wide_int_to_tree (bitsizetype, newsize);
+	   tree ntype
+	     = build_nonstandard_integer_type (newsize, 1); }
+    (if (ntype)
+     (convert:type (BIT_FIELD_REF:ntype @0 { nsize; } (op @2 @3))))))))
+
 (simplify
  (BIT_FIELD_REF (BIT_FIELD_REF @0 @1 @2) @3 @4)
  (BIT_FIELD_REF @0 @3 { const_binop (PLUS_EXPR, bitsizetype, @2, @4); }))
  

Comments

Jeff Law Sept. 24, 2022, 6:38 p.m. UTC | #1
On 9/23/22 05:42, Tamar Christina wrote:
> Hi All,
>
> This adds a match.pd rule that can fold right shifts and bit_field_refs of
> integers into just a bit_field_ref by adjusting the offset and the size of the
> extract and adds an extend to the previous size.
>
> Concretely turns:
>
> #include <arm_neon.h>
>
> unsigned int foor (uint32x4_t x)
> {
>      return x[1] >> 16;
> }
>
> which used to generate:
>
>    _1 = BIT_FIELD_REF <x_2(D), 32, 32>;
>    _3 = _1 >> 16;
>
> into
>
>    _4 = BIT_FIELD_REF <x_1(D), 16, 48>;
>    _2 = (unsigned int) _4;
>
> I currently limit the rewrite to only doing it if the resulting extract is in
> a mode the target supports. i.e. it won't rewrite it to extract say 13-bits
> because I worry that for targets that won't have a bitfield extract instruction
> this may be a de-optimization.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
> and no issues.
>
> Testcase are added in patch 2/2.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
> 	* match.pd: Add bitfield and shift folding.

Were you planning to handle left shifts as well?  It looks like it since 
you've got iterations for the shift opcode and corresponding adjustment 
to the field, but they currently only handle rshift/plus.


Jeff
  
Andrew Pinski Sept. 24, 2022, 6:57 p.m. UTC | #2
On Fri, Sep 23, 2022 at 4:43 AM Tamar Christina via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Hi All,
>
> This adds a match.pd rule that can fold right shifts and bit_field_refs of
> integers into just a bit_field_ref by adjusting the offset and the size of the
> extract and adds an extend to the previous size.
>
> Concretely turns:
>
> #include <arm_neon.h>
>
> unsigned int foor (uint32x4_t x)
> {
>     return x[1] >> 16;
> }
>
> which used to generate:
>
>   _1 = BIT_FIELD_REF <x_2(D), 32, 32>;
>   _3 = _1 >> 16;
>
> into
>
>   _4 = BIT_FIELD_REF <x_1(D), 16, 48>;
>   _2 = (unsigned int) _4;
>
> I currently limit the rewrite to only doing it if the resulting extract is in
> a mode the target supports. i.e. it won't rewrite it to extract say 13-bits
> because I worry that for targets that won't have a bitfield extract instruction
> this may be a de-optimization.

It is only a de-optimization for the following case:
* vector extraction

All other cases should be handled correctly in the middle-end when
expanding to RTL because they need to be handled for bit-fields
anyways.
Plus SIGN_EXTRACT and ZERO_EXTRACT would be used in the integer case
for the RTL.
Getting SIGN_EXTRACT/ZERO_EXTRACT early on in the RTL is better than
waiting until combine really.


>
> Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
> and no issues.
>
> Testcase are added in patch 2/2.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>         * match.pd: Add bitfield and shift folding.
>
> --- inline copy of patch --
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 1d407414bee278c64c00d425d9f025c1c58d853d..b225d36dc758f1581502c8d03761544bfd499c01 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -7245,6 +7245,23 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>        && ANY_INTEGRAL_TYPE_P (type) && ANY_INTEGRAL_TYPE_P (TREE_TYPE(@0)))
>    (IFN_REDUC_PLUS_WIDEN @0)))
>
> +/* Canonicalize BIT_FIELD_REFS and shifts to BIT_FIELD_REFS.  */
> +(for shift (rshift)
> +     op (plus)
> + (simplify
> +  (shift (BIT_FIELD_REF @0 @1 @2) integer_pow2p@3)
> +  (if (INTEGRAL_TYPE_P (type))
> +   (with { /* Can't use wide-int here as the precision differs between
> +             @1 and @3.  */
> +          unsigned HOST_WIDE_INT size = tree_to_uhwi (@1);
> +          unsigned HOST_WIDE_INT shiftc = tree_to_uhwi (@3);
> +          unsigned HOST_WIDE_INT newsize = size - shiftc;
> +          tree nsize = wide_int_to_tree (bitsizetype, newsize);
> +          tree ntype
> +            = build_nonstandard_integer_type (newsize, 1); }

Maybe use `build_nonstandard_integer_type (newsize, /* unsignedp = */ true);`
or better yet `build_nonstandard_integer_type (newsize, UNSIGNED);`

I had started to convert some of the unsignedp into enum signop but I
never finished or submitted the patch.

Thanks,
Andrew Pinski


> +    (if (ntype)
> +     (convert:type (BIT_FIELD_REF:ntype @0 { nsize; } (op @2 @3))))))))
> +
>  (simplify
>   (BIT_FIELD_REF (BIT_FIELD_REF @0 @1 @2) @3 @4)
>   (BIT_FIELD_REF @0 @3 { const_binop (PLUS_EXPR, bitsizetype, @2, @4); }))
>
>
>
>
> --
  
Tamar Christina Sept. 26, 2022, 4:55 a.m. UTC | #3
> -----Original Message-----
> From: Andrew Pinski <pinskia@gmail.com>
> Sent: Saturday, September 24, 2022 8:57 PM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; rguenther@suse.de
> Subject: Re: [PATCH 1/2]middle-end Fold BIT_FIELD_REF and Shifts into
> BIT_FIELD_REFs alone
> 
> On Fri, Sep 23, 2022 at 4:43 AM Tamar Christina via Gcc-patches <gcc-
> patches@gcc.gnu.org> wrote:
> >
> > Hi All,
> >
> > This adds a match.pd rule that can fold right shifts and
> > bit_field_refs of integers into just a bit_field_ref by adjusting the
> > offset and the size of the extract and adds an extend to the previous size.
> >
> > Concretely turns:
> >
> > #include <arm_neon.h>
> >
> > unsigned int foor (uint32x4_t x)
> > {
> >     return x[1] >> 16;
> > }
> >
> > which used to generate:
> >
> >   _1 = BIT_FIELD_REF <x_2(D), 32, 32>;
> >   _3 = _1 >> 16;
> >
> > into
> >
> >   _4 = BIT_FIELD_REF <x_1(D), 16, 48>;
> >   _2 = (unsigned int) _4;
> >
> > I currently limit the rewrite to only doing it if the resulting
> > extract is in a mode the target supports. i.e. it won't rewrite it to
> > extract say 13-bits because I worry that for targets that won't have a
> > bitfield extract instruction this may be a de-optimization.
> 
> It is only a de-optimization for the following case:
> * vector extraction
> 
> All other cases should be handled correctly in the middle-end when
> expanding to RTL because they need to be handled for bit-fields anyways.
> Plus SIGN_EXTRACT and ZERO_EXTRACT would be used in the integer case
> for the RTL.
> Getting SIGN_EXTRACT/ZERO_EXTRACT early on in the RTL is better than
> waiting until combine really.
> 

Fair enough, I've dropped the constraint.

> 
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
> > and no issues.
> >
> > Testcase are added in patch 2/2.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> >         * match.pd: Add bitfield and shift folding.
> >
> > --- inline copy of patch --
> > diff --git a/gcc/match.pd b/gcc/match.pd index
> >
> 1d407414bee278c64c00d425d9f025c1c58d853d..b225d36dc758f1581502c8d03
> 761
> > 544bfd499c01 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -7245,6 +7245,23 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >        && ANY_INTEGRAL_TYPE_P (type) && ANY_INTEGRAL_TYPE_P
> (TREE_TYPE(@0)))
> >    (IFN_REDUC_PLUS_WIDEN @0)))
> >
> > +/* Canonicalize BIT_FIELD_REFS and shifts to BIT_FIELD_REFS.  */ (for
> > +shift (rshift)
> > +     op (plus)
> > + (simplify
> > +  (shift (BIT_FIELD_REF @0 @1 @2) integer_pow2p@3)
> > +  (if (INTEGRAL_TYPE_P (type))
> > +   (with { /* Can't use wide-int here as the precision differs between
> > +             @1 and @3.  */
> > +          unsigned HOST_WIDE_INT size = tree_to_uhwi (@1);
> > +          unsigned HOST_WIDE_INT shiftc = tree_to_uhwi (@3);
> > +          unsigned HOST_WIDE_INT newsize = size - shiftc;
> > +          tree nsize = wide_int_to_tree (bitsizetype, newsize);
> > +          tree ntype
> > +            = build_nonstandard_integer_type (newsize, 1); }
> 
> Maybe use `build_nonstandard_integer_type (newsize, /* unsignedp = */
> true);` or better yet `build_nonstandard_integer_type (newsize,
> UNSIGNED);`

Ah, will do,
Tamar.

> 
> I had started to convert some of the unsignedp into enum signop but I never
> finished or submitted the patch.
> 
> Thanks,
> Andrew Pinski
> 
> 
> > +    (if (ntype)
> > +     (convert:type (BIT_FIELD_REF:ntype @0 { nsize; } (op @2
> > + @3))))))))
> > +
> >  (simplify
> >   (BIT_FIELD_REF (BIT_FIELD_REF @0 @1 @2) @3 @4)
> >   (BIT_FIELD_REF @0 @3 { const_binop (PLUS_EXPR, bitsizetype, @2, @4);
> > }))
> >
> >
> >
> >
> > --
  
Richard Biener Sept. 26, 2022, 8:05 a.m. UTC | #4
On Mon, 26 Sep 2022, Tamar Christina wrote:

> > -----Original Message-----
> > From: Andrew Pinski <pinskia@gmail.com>
> > Sent: Saturday, September 24, 2022 8:57 PM
> > To: Tamar Christina <Tamar.Christina@arm.com>
> > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; rguenther@suse.de
> > Subject: Re: [PATCH 1/2]middle-end Fold BIT_FIELD_REF and Shifts into
> > BIT_FIELD_REFs alone
> > 
> > On Fri, Sep 23, 2022 at 4:43 AM Tamar Christina via Gcc-patches <gcc-
> > patches@gcc.gnu.org> wrote:
> > >
> > > Hi All,
> > >
> > > This adds a match.pd rule that can fold right shifts and
> > > bit_field_refs of integers into just a bit_field_ref by adjusting the
> > > offset and the size of the extract and adds an extend to the previous size.
> > >
> > > Concretely turns:
> > >
> > > #include <arm_neon.h>
> > >
> > > unsigned int foor (uint32x4_t x)
> > > {
> > >     return x[1] >> 16;
> > > }
> > >
> > > which used to generate:
> > >
> > >   _1 = BIT_FIELD_REF <x_2(D), 32, 32>;
> > >   _3 = _1 >> 16;
> > >
> > > into
> > >
> > >   _4 = BIT_FIELD_REF <x_1(D), 16, 48>;
> > >   _2 = (unsigned int) _4;
> > >
> > > I currently limit the rewrite to only doing it if the resulting
> > > extract is in a mode the target supports. i.e. it won't rewrite it to
> > > extract say 13-bits because I worry that for targets that won't have a
> > > bitfield extract instruction this may be a de-optimization.
> > 
> > It is only a de-optimization for the following case:
> > * vector extraction
> > 
> > All other cases should be handled correctly in the middle-end when
> > expanding to RTL because they need to be handled for bit-fields anyways.
> > Plus SIGN_EXTRACT and ZERO_EXTRACT would be used in the integer case
> > for the RTL.
> > Getting SIGN_EXTRACT/ZERO_EXTRACT early on in the RTL is better than
> > waiting until combine really.
> > 
> 
> Fair enough, I've dropped the constraint.
> 
> > 
> > >
> > > Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
> > > and no issues.
> > >
> > > Testcase are added in patch 2/2.
> > >
> > > Ok for master?
> > >
> > > Thanks,
> > > Tamar
> > >
> > > gcc/ChangeLog:
> > >
> > >         * match.pd: Add bitfield and shift folding.
> > >
> > > --- inline copy of patch --
> > > diff --git a/gcc/match.pd b/gcc/match.pd index
> > >
> > 1d407414bee278c64c00d425d9f025c1c58d853d..b225d36dc758f1581502c8d03
> > 761
> > > 544bfd499c01 100644
> > > --- a/gcc/match.pd
> > > +++ b/gcc/match.pd
> > > @@ -7245,6 +7245,23 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > >        && ANY_INTEGRAL_TYPE_P (type) && ANY_INTEGRAL_TYPE_P
> > (TREE_TYPE(@0)))
> > >    (IFN_REDUC_PLUS_WIDEN @0)))
> > >
> > > +/* Canonicalize BIT_FIELD_REFS and shifts to BIT_FIELD_REFS.  */
> > > (for
> > > +shift (rshift)
> > > +     op (plus)

why have a for when you only iterate over a single operation?!  And 'op'
seems unused?

> > > + (simplify
> > > +  (shift (BIT_FIELD_REF @0 @1 @2) integer_pow2p@3)
> > > +  (if (INTEGRAL_TYPE_P (type))
> > > +   (with { /* Can't use wide-int here as the precision differs between
> > > +             @1 and @3.  */
> > > +          unsigned HOST_WIDE_INT size = tree_to_uhwi (@1);
> > > +          unsigned HOST_WIDE_INT shiftc = tree_to_uhwi (@3);

But you should then test tree_fits_uhwi_p.

> > > +          unsigned HOST_WIDE_INT newsize = size - shiftc;
> > > +          tree nsize = wide_int_to_tree (bitsizetype, newsize);
> > > +          tree ntype
> > > +            = build_nonstandard_integer_type (newsize, 1); }

build_nonstandard_integer_type never fails so I don't see how
you "limit" this to extractions fitting a mode.

I'm quite sure this breaks with BYTES_BIG_ENDIAN.  Please try
BIT_FIELD_REF _offsets_ that make the extraction cross byte
boundaries.
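
For concreteness, a sketch of the kind of case being asked for (the lane
type and the 11-bit, offset-37 numbers are assumptions for a little-endian
layout):

#include <arm_neon.h>

/* On little-endian this would ask for 16 - 5 = 11 bits at bit offset
   32 + 5 = 37, i.e. an extraction that is not byte-aligned and crosses
   a byte boundary.  */
unsigned short foo_cross (uint16x8_t x)
{
    return x[2] >> 5;
}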

Also I'm missing a testcase?

Thanks,
Richard.

> > Maybe use `build_nonstandard_integer_type (newsize, /* unsignedp = */
> > true);` or better yet `build_nonstandard_integer_type (newsize,
> > UNSIGNED);`
> 
> Ah, will do,
> Tamar.
> 
> > 
> > I had started to convert some of the unsignedp into enum signop but I never
> > finished or submitted the patch.
> > 
> > Thanks,
> > Andrew Pinski
> > 
> > 
> > > +    (if (ntype)
> > > +     (convert:type (BIT_FIELD_REF:ntype @0 { nsize; } (op @2
> > > + @3))))))))
> > > +
> > >  (simplify
> > >   (BIT_FIELD_REF (BIT_FIELD_REF @0 @1 @2) @3 @4)
> > >   (BIT_FIELD_REF @0 @3 { const_binop (PLUS_EXPR, bitsizetype, @2, @4);
> > > }))
> > >
> > >
> > >
> > >
> > > --
>
  
Andrew Pinski Sept. 26, 2022, 3:24 p.m. UTC | #5
On Sun, Sep 25, 2022 at 9:56 PM Tamar Christina <Tamar.Christina@arm.com> wrote:
>
> > -----Original Message-----
> > From: Andrew Pinski <pinskia@gmail.com>
> > Sent: Saturday, September 24, 2022 8:57 PM
> > To: Tamar Christina <Tamar.Christina@arm.com>
> > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; rguenther@suse.de
> > Subject: Re: [PATCH 1/2]middle-end Fold BIT_FIELD_REF and Shifts into
> > BIT_FIELD_REFs alone
> >
> > On Fri, Sep 23, 2022 at 4:43 AM Tamar Christina via Gcc-patches <gcc-
> > patches@gcc.gnu.org> wrote:
> > >
> > > Hi All,
> > >
> > > This adds a match.pd rule that can fold right shifts and
> > > bit_field_refs of integers into just a bit_field_ref by adjusting the
> > > offset and the size of the extract and adds an extend to the previous size.
> > >
> > > Concretely turns:
> > >
> > > #include <arm_neon.h>
> > >
> > > unsigned int foor (uint32x4_t x)
> > > {
> > >     return x[1] >> 16;
> > > }
> > >
> > > which used to generate:
> > >
> > >   _1 = BIT_FIELD_REF <x_2(D), 32, 32>;
> > >   _3 = _1 >> 16;
> > >
> > > into
> > >
> > >   _4 = BIT_FIELD_REF <x_1(D), 16, 48>;
> > >   _2 = (unsigned int) _4;
> > >
> > > I currently limit the rewrite to only doing it if the resulting
> > > extract is in a mode the target supports. i.e. it won't rewrite it to
> > > extract say 13-bits because I worry that for targets that won't have a
> > > bitfield extract instruction this may be a de-optimization.
> >
> > It is only a de-optimization for the following case:
> > * vector extraction
> >
> > All other cases should be handled correctly in the middle-end when
> > expanding to RTL because they need to be handled for bit-fields anyways.
> > Plus SIGN_EXTRACT and ZERO_EXTRACT would be used in the integer case
> > for the RTL.
> > Getting SIGN_EXTRACT/ZERO_EXTRACT early on in the RTL is better than
> > waiting until combine really.
> >
>
> Fair enough, I've dropped the constraint.

Well, the constraint should still be applied for VECTOR_TYPE I think.
Attached is what I had done for left shift for integer types.
Note the BYTES_BIG_ENDIAN part which you missed for the right shift case.

Thanks,
Andrew Pinski

>
> >
> > >
> > > Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
> > > and no issues.
> > >
> > > Testcase are added in patch 2/2.
> > >
> > > Ok for master?
> > >
> > > Thanks,
> > > Tamar
> > >
> > > gcc/ChangeLog:
> > >
> > >         * match.pd: Add bitfield and shift folding.
> > >
> > > --- inline copy of patch --
> > > diff --git a/gcc/match.pd b/gcc/match.pd index
> > >
> > 1d407414bee278c64c00d425d9f025c1c58d853d..b225d36dc758f1581502c8d03
> > 761
> > > 544bfd499c01 100644
> > > --- a/gcc/match.pd
> > > +++ b/gcc/match.pd
> > > @@ -7245,6 +7245,23 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > >        && ANY_INTEGRAL_TYPE_P (type) && ANY_INTEGRAL_TYPE_P
> > (TREE_TYPE(@0)))
> > >    (IFN_REDUC_PLUS_WIDEN @0)))
> > >
> > > +/* Canonicalize BIT_FIELD_REFS and shifts to BIT_FIELD_REFS.  */ (for
> > > +shift (rshift)
> > > +     op (plus)
> > > + (simplify
> > > +  (shift (BIT_FIELD_REF @0 @1 @2) integer_pow2p@3)
> > > +  (if (INTEGRAL_TYPE_P (type))
> > > +   (with { /* Can't use wide-int here as the precision differs between
> > > +             @1 and @3.  */
> > > +          unsigned HOST_WIDE_INT size = tree_to_uhwi (@1);
> > > +          unsigned HOST_WIDE_INT shiftc = tree_to_uhwi (@3);
> > > +          unsigned HOST_WIDE_INT newsize = size - shiftc;
> > > +          tree nsize = wide_int_to_tree (bitsizetype, newsize);
> > > +          tree ntype
> > > +            = build_nonstandard_integer_type (newsize, 1); }
> >
> > Maybe use `build_nonstandard_integer_type (newsize, /* unsignedp = */
> > true);` or better yet `build_nonstandard_integer_type (newsize,
> > UNSIGNED);`
>
> Ah, will do,
> Tamar.
>
> >
> > I had started to convert some of the unsignedp into enum signop but I never
> > finished or submitted the patch.
> >
> > Thanks,
> > Andrew Pinski
> >
> >
> > > +    (if (ntype)
> > > +     (convert:type (BIT_FIELD_REF:ntype @0 { nsize; } (op @2
> > > + @3))))))))
> > > +
> > >  (simplify
> > >   (BIT_FIELD_REF (BIT_FIELD_REF @0 @1 @2) @3 @4)
> > >   (BIT_FIELD_REF @0 @3 { const_binop (PLUS_EXPR, bitsizetype, @2, @4);
> > > }))
> > >
> > >
> > >
> > >
> > > --
From ed7c08c4d565bd4418cf2dce3bbfecc18fdd42a2 Mon Sep 17 00:00:00 2001
From: Andrew Pinski <apinski@marvell.com>
Date: Wed, 25 Dec 2019 01:20:13 +0000
Subject: [PATCH] Add simplification of shift of a bit_field.

We can simplify a shift of a bit_field_ref to
a shift of an and (note sometimes the shift can
be removed).

Change-Id: I1a9f3fc87889ecd7cf569272405b6ee7dd5f8d7b
Signed-off-by: Andrew Pinski <apinski@marvell.com>
---

diff --git a/gcc/match.pd b/gcc/match.pd
index cb981ec..e4f6d47 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -6071,6 +6071,34 @@
     (cmp (bit_and @0 { wide_int_to_tree (type1, mask); })
          { wide_int_to_tree (type1, cst); })))))
 
+/* lshift<bitfield<>> -> shift(bit_and(@0, mask)) */
+(simplify
+ (lshift (convert (BIT_FIELD_REF@bit @0 @bitsize @bitpos)) INTEGER_CST@1)
+ (if (INTEGRAL_TYPE_P (type)
+      && INTEGRAL_TYPE_P (TREE_TYPE (@0))
+      && tree_fits_uhwi_p (@1)
+      && (tree_nop_conversion_p (type, TREE_TYPE (@0))
+	  || (TYPE_UNSIGNED (TREE_TYPE (@0))
+	      && TYPE_UNSIGNED (TREE_TYPE (@bit))
+	      && TYPE_UNSIGNED (type)
+	      && TYPE_PRECISION (type) > tree_to_uhwi (@bitsize))))
+  (with
+   {
+     unsigned HOST_WIDE_INT bitpos = tree_to_uhwi (@bitpos);
+     unsigned HOST_WIDE_INT bitsize = tree_to_uhwi (@bitsize);
+     if (BYTES_BIG_ENDIAN)
+       bitpos = TYPE_PRECISION (TREE_TYPE (@0)) - bitpos - bitsize;
+     wide_int wmask = wi::shifted_mask (bitpos, bitsize, false, TYPE_PRECISION (type));
+   }
+   (switch
+    (if (tree_to_uhwi (@1) == bitpos)
+     (bit_and (convert @0) { wide_int_to_tree (type, wmask); }))
+    (if (tree_to_uhwi (@1) > bitpos)
+     (lshift (bit_and (convert @0) { wide_int_to_tree (type, wmask); })
+	     { wide_int_to_tree (integer_type_node, tree_to_uhwi (@1) - bitpos); } ))
+    (if (tree_to_uhwi (@1) < bitpos)
+     (rshift (bit_and (convert @0) { wide_int_to_tree (type, wmask); })
+	     { wide_int_to_tree (integer_type_node, bitpos - tree_to_uhwi (@1)); } ))))))
 
 (if (canonicalize_math_after_vectorization_p ())
  (for fmas (FMA)
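
As a self-contained illustration of the arithmetic identity behind the
bit_and arm of the attached pattern (the helper is made up; nothing here is
claimed about the exact GIMPLE a front end produces):

#include <assert.h>

/* Shifting an extracted field back to its original position is just a mask
   of the containing word: ((x >> pos) & mask) << pos == x & (mask << pos).  */
static unsigned int
extract_then_shift (unsigned int x, unsigned int pos, unsigned int bits)
{
  unsigned int mask = (1u << bits) - 1;
  return ((x >> pos) & mask) << pos;
}

int
main (void)
{
  unsigned int x = 0xdeadbeefu;
  assert (extract_then_shift (x, 5, 11) == (x & (((1u << 11) - 1) << 5)));
  return 0;
}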
  
Richard Biener Sept. 27, 2022, 12:40 p.m. UTC | #6
On Mon, Sep 26, 2022 at 5:25 PM Andrew Pinski via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> On Sun, Sep 25, 2022 at 9:56 PM Tamar Christina <Tamar.Christina@arm.com> wrote:
> >
> > > -----Original Message-----
> > > From: Andrew Pinski <pinskia@gmail.com>
> > > Sent: Saturday, September 24, 2022 8:57 PM
> > > To: Tamar Christina <Tamar.Christina@arm.com>
> > > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; rguenther@suse.de
> > > Subject: Re: [PATCH 1/2]middle-end Fold BIT_FIELD_REF and Shifts into
> > > BIT_FIELD_REFs alone
> > >
> > > On Fri, Sep 23, 2022 at 4:43 AM Tamar Christina via Gcc-patches <gcc-
> > > patches@gcc.gnu.org> wrote:
> > > >
> > > > Hi All,
> > > >
> > > > This adds a match.pd rule that can fold right shifts and
> > > > bit_field_refs of integers into just a bit_field_ref by adjusting the
> > > > offset and the size of the extract and adds an extend to the previous size.
> > > >
> > > > Concretely turns:
> > > >
> > > > #include <arm_neon.h>
> > > >
> > > > unsigned int foor (uint32x4_t x)
> > > > {
> > > >     return x[1] >> 16;
> > > > }
> > > >
> > > > which used to generate:
> > > >
> > > >   _1 = BIT_FIELD_REF <x_2(D), 32, 32>;
> > > >   _3 = _1 >> 16;
> > > >
> > > > into
> > > >
> > > >   _4 = BIT_FIELD_REF <x_1(D), 16, 48>;
> > > >   _2 = (unsigned int) _4;
> > > >
> > > > I currently limit the rewrite to only doing it if the resulting
> > > > extract is in a mode the target supports. i.e. it won't rewrite it to
> > > > extract say 13-bits because I worry that for targets that won't have a
> > > > bitfield extract instruction this may be a de-optimization.
> > >
> > > It is only a de-optimization for the following case:
> > > * vector extraction
> > >
> > > All other cases should be handled correctly in the middle-end when
> > > expanding to RTL because they need to be handled for bit-fields anyways.
> > > Plus SIGN_EXTRACT and ZERO_EXTRACT would be used in the integer case
> > > for the RTL.
> > > Getting SIGN_EXTRACT/ZERO_EXTRACT early on in the RTL is better than
> > > waiting until combine really.
> > >
> >
> > Fair enough, I've dropped the constraint.
>
> Well the constraint should be done still for VECTOR_TYPE I think.
> Attached is what I had done for left shift for integer types.
> Note the BYTES_BIG_ENDIAN part which you missed for the right shift case.

Note we formerly had BIT_FIELD_REF_UNSIGNED and allowed the precision
of the TREE_TYPE of the BIT_FIELD_REF to not match the extracted size.  That
might have mapped directly to zero/sign_extract.

Now that this is no more we should think of a canonical way to express this
and make sure we can synthesize those early.

Richard.

> Thanks,
> Andrew Pinski
>
> >
> > >
> > > >
> > > > Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
> > > > and no issues.
> > > >
> > > > Testcase are added in patch 2/2.
> > > >
> > > > Ok for master?
> > > >
> > > > Thanks,
> > > > Tamar
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > >         * match.pd: Add bitfield and shift folding.
> > > >
> > > > --- inline copy of patch --
> > > > diff --git a/gcc/match.pd b/gcc/match.pd index
> > > >
> > > 1d407414bee278c64c00d425d9f025c1c58d853d..b225d36dc758f1581502c8d03
> > > 761
> > > > 544bfd499c01 100644
> > > > --- a/gcc/match.pd
> > > > +++ b/gcc/match.pd
> > > > @@ -7245,6 +7245,23 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > > >        && ANY_INTEGRAL_TYPE_P (type) && ANY_INTEGRAL_TYPE_P
> > > (TREE_TYPE(@0)))
> > > >    (IFN_REDUC_PLUS_WIDEN @0)))
> > > >
> > > > +/* Canonicalize BIT_FIELD_REFS and shifts to BIT_FIELD_REFS.  */ (for
> > > > +shift (rshift)
> > > > +     op (plus)
> > > > + (simplify
> > > > +  (shift (BIT_FIELD_REF @0 @1 @2) integer_pow2p@3)
> > > > +  (if (INTEGRAL_TYPE_P (type))
> > > > +   (with { /* Can't use wide-int here as the precision differs between
> > > > +             @1 and @3.  */
> > > > +          unsigned HOST_WIDE_INT size = tree_to_uhwi (@1);
> > > > +          unsigned HOST_WIDE_INT shiftc = tree_to_uhwi (@3);
> > > > +          unsigned HOST_WIDE_INT newsize = size - shiftc;
> > > > +          tree nsize = wide_int_to_tree (bitsizetype, newsize);
> > > > +          tree ntype
> > > > +            = build_nonstandard_integer_type (newsize, 1); }
> > >
> > > Maybe use `build_nonstandard_integer_type (newsize, /* unsignedp = */
> > > true);` or better yet `build_nonstandard_integer_type (newsize,
> > > UNSIGNED);`
> >
> > Ah, will do,
> > Tamar.
> >
> > >
> > > I had started to convert some of the unsignedp into enum signop but I never
> > > finished or submitted the patch.
> > >
> > > Thanks,
> > > Andrew Pinski
> > >
> > >
> > > > +    (if (ntype)
> > > > +     (convert:type (BIT_FIELD_REF:ntype @0 { nsize; } (op @2
> > > > + @3))))))))
> > > > +
> > > >  (simplify
> > > >   (BIT_FIELD_REF (BIT_FIELD_REF @0 @1 @2) @3 @4)
> > > >   (BIT_FIELD_REF @0 @3 { const_binop (PLUS_EXPR, bitsizetype, @2, @4);
> > > > }))
> > > >
> > > >
> > > >
> > > >
> > > > --
  
Tamar Christina Sept. 28, 2022, 1:19 p.m. UTC | #7
> -----Original Message-----
> From: Jeff Law <jeffreyalaw@gmail.com>
> Sent: Saturday, September 24, 2022 8:38 PM
> To: Tamar Christina <Tamar.Christina@arm.com>; gcc-patches@gcc.gnu.org
> Cc: nd <nd@arm.com>; rguenther@suse.de
> Subject: Re: [PATCH 1/2]middle-end Fold BIT_FIELD_REF and Shifts into
> BIT_FIELD_REFs alone
> 
> 
> On 9/23/22 05:42, Tamar Christina wrote:
> > Hi All,
> >
> > This adds a match.pd rule that can fold right shifts and
> > bit_field_refs of integers into just a bit_field_ref by adjusting the
> > offset and the size of the extract and adds an extend to the previous size.
> >
> > Concretely turns:
> >
> > #include <arm_neon.h>
> >
> > unsigned int foor (uint32x4_t x)
> > {
> >      return x[1] >> 16;
> > }
> >
> > which used to generate:
> >
> >    _1 = BIT_FIELD_REF <x_2(D), 32, 32>;
> >    _3 = _1 >> 16;
> >
> > into
> >
> >    _4 = BIT_FIELD_REF <x_1(D), 16, 48>;
> >    _2 = (unsigned int) _4;
> >
> > I currently limit the rewrite to only doing it if the resulting
> > extract is in a mode the target supports. i.e. it won't rewrite it to
> > extract say 13-bits because I worry that for targets that won't have a
> > bitfield extract instruction this may be a de-optimization.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
> > and no issues.
> >
> > Testcase are added in patch 2/2.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > 	* match.pd: Add bitfield and shift folding.
> 
> Were you planning to handle left shifts as well?  It looks like it since you've
> got iterations for the shift opcode and corresponding adjustment to the field,
> but they currently only handle rshift/plus.
> 

Hmm, do left shifts work here? Since a left shift would increase the size of the
resulting value by adding zeros to the end of the number, you can't increase the
size of the bitfield to do the same.

I did, however, realize that truncating casts have the same effect as a right
shift, so I have added that now.
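
For example (the function name is made up and the 16-bit-at-offset-32
numbers assume a little-endian lane layout), a truncating cast keeps only
the low bits of the lane, so the extract can be narrowed in place:

#include <arm_neon.h>

/* (unsigned short) x[1] keeps the low 16 bits of the lane, so on
   little-endian it can become BIT_FIELD_REF <x, 16, 32> directly, much
   like a right shift narrows the extract from the other end.  */
unsigned short foo_trunc (uint32x4_t x)
{
    return (unsigned short) x[1];
}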

Tamar

> 
> Jeff
>
  
Jeff Law Sept. 28, 2022, 5:25 p.m. UTC | #8
On 9/28/22 07:19, Tamar Christina wrote:
>> -----Original Message-----
>> From: Jeff Law <jeffreyalaw@gmail.com>
>> Sent: Saturday, September 24, 2022 8:38 PM
>> To: Tamar Christina <Tamar.Christina@arm.com>; gcc-patches@gcc.gnu.org
>> Cc: nd <nd@arm.com>; rguenther@suse.de
>> Subject: Re: [PATCH 1/2]middle-end Fold BIT_FIELD_REF and Shifts into
>> BIT_FIELD_REFs alone
>>
>>
>> On 9/23/22 05:42, Tamar Christina wrote:
>>> Hi All,
>>>
>>> This adds a match.pd rule that can fold right shifts and
>>> bit_field_refs of integers into just a bit_field_ref by adjusting the
>>> offset and the size of the extract and adds an extend to the previous size.
>>>
>>> Concretely turns:
>>>
>>> #include <arm_neon.h>
>>>
>>> unsigned int foor (uint32x4_t x)
>>> {
>>>       return x[1] >> 16;
>>> }
>>>
>>> which used to generate:
>>>
>>>     _1 = BIT_FIELD_REF <x_2(D), 32, 32>;
>>>     _3 = _1 >> 16;
>>>
>>> into
>>>
>>>     _4 = BIT_FIELD_REF <x_1(D), 16, 48>;
>>>     _2 = (unsigned int) _4;
>>>
>>> I currently limit the rewrite to only doing it if the resulting
>>> extract is in a mode the target supports. i.e. it won't rewrite it to
>>> extract say 13-bits because I worry that for targets that won't have a
>>> bitfield extract instruction this may be a de-optimization.
>>>
>>> Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
>>> and no issues.
>>>
>>> Testcase are added in patch 2/2.
>>>
>>> Ok for master?
>>>
>>> Thanks,
>>> Tamar
>>>
>>> gcc/ChangeLog:
>>>
>>> 	* match.pd: Add bitfield and shift folding.
>> Were you planning to handle left shifts as well?  It looks like it since you've
>> got iterations for the shift opcode and corresponding adjustment to the field,
>> but they currently only handle rshift/plus.
>>
> Hmm do left shifts work here? Since a left shift would increase the size of the
> resulting value by adding zeros to the end of the number, so you can't increase
> the size of the bitfield to do the same.

Dunno, I hadn't really thought about it.  It just looked like you were 
prepared to handle more cases with those iterators.


>
> I did however realize that truncating casts have the same effect as a right shift,
> so I have added that now.

ACK.

jeff
  
Tamar Christina Oct. 31, 2022, 11:51 a.m. UTC | #9
Hi All,

Here's a respin addressing review comments.

Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* match.pd: Add bitfield and shift folding.

gcc/testsuite/ChangeLog:

	* gcc.dg/bitshift_1.c: New.
	* gcc.dg/bitshift_2.c: New.

--- inline copy of patch ---

diff --git a/gcc/match.pd b/gcc/match.pd
index 70e90cdbfa902830e6b58be84e114e86ff7b4dff..a4ad465b2b074b21835be74732dce295f8db03bc 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -7245,6 +7245,45 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
       && ANY_INTEGRAL_TYPE_P (type) && ANY_INTEGRAL_TYPE_P (TREE_TYPE(@0)))
   (IFN_REDUC_PLUS_WIDEN @0)))
 
+/* Canonicalize BIT_FIELD_REFS and right shift to BIT_FIELD_REFS.  */
+(simplify
+ (rshift (BIT_FIELD_REF @0 @1 @2) INTEGER_CST@3)
+ (if (INTEGRAL_TYPE_P (type)
+      && tree_fits_uhwi_p (@1)
+      && tree_fits_uhwi_p (@3))
+  (with { /* Can't use wide-int here as the precision differs between
+	     @1 and @3.  */
+	  unsigned HOST_WIDE_INT size = tree_to_uhwi (@1);
+	  unsigned HOST_WIDE_INT shiftc = tree_to_uhwi (@3);
+	  unsigned HOST_WIDE_INT newsize = size - shiftc;
+	  tree nsize = wide_int_to_tree (bitsizetype, newsize);
+	  tree ntype
+	    = build_nonstandard_integer_type (newsize, TYPE_UNSIGNED (type)); }
+   (switch
+    (if (INTEGRAL_TYPE_P (ntype) && !BYTES_BIG_ENDIAN)
+     (convert:type (BIT_FIELD_REF:ntype @0 { nsize; } (plus @2 @3))))
+    (if (INTEGRAL_TYPE_P (ntype) && BYTES_BIG_ENDIAN)
+     (convert:type (BIT_FIELD_REF:ntype @0 { nsize; } (minus @2 @3))))))))
+
+/* Canonicalize BIT_FIELD_REFS and converts to BIT_FIELD_REFS.  */
+(simplify
+ (convert (BIT_FIELD_REF@3 @0 @1 @2))
+ (if (INTEGRAL_TYPE_P (type)
+      && INTEGRAL_TYPE_P (TREE_TYPE (@3)))
+  (with { unsigned int size_inner = element_precision (TREE_TYPE (@3));
+	  unsigned int size_outer  = element_precision (type); }
+   (if (size_inner > size_outer)
+    /* Truncating convert, we can shrink the bit field similar to the
+        shift case.  */
+    (with {
+	    tree nsize = wide_int_to_tree (bitsizetype, size_outer);
+	    auto sign = TYPE_UNSIGNED (type);
+	    tree ntype
+	      = build_nonstandard_integer_type (size_outer, sign);
+	    gcc_assert (useless_type_conversion_p (type, ntype)); }
+     (if (INTEGRAL_TYPE_P (ntype))
+      (BIT_FIELD_REF:ntype @0 { nsize; } @2)))))))
+
 (simplify
  (BIT_FIELD_REF (BIT_FIELD_REF @0 @1 @2) @3 @4)
  (BIT_FIELD_REF @0 @3 { const_binop (PLUS_EXPR, bitsizetype, @2, @4); }))
diff --git a/gcc/testsuite/gcc.dg/bitshift_1.c b/gcc/testsuite/gcc.dg/bitshift_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..5995d0746d2301eb48304629cb4b779b079f1270
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/bitshift_1.c
@@ -0,0 +1,50 @@
+/* { dg-do compile { target le } } */
+/* { dg-additional-options "-O2 -save-temps -fdump-tree-optimized" } */
+
+typedef int v4si __attribute__ ((vector_size (16)));
+typedef unsigned int v4usi __attribute__ ((vector_size (16)));
+typedef unsigned short v8uhi __attribute__ ((vector_size (16)));
+
+unsigned int foor (v4usi x)
+{
+    return x[1] >> 16;
+}
+/* { dg-final { scan-tree-dump {BIT_FIELD_REF <x_[^,]+, 16, 48>;} "optimized" } } */
+
+unsigned int fool (v4usi x)
+{
+    return x[1] << 16;
+}
+/* { dg-final { scan-tree-dump {BIT_FIELD_REF <x_[^,]+, 32, 32>;} "optimized" } } */
+
+unsigned short foor2 (v4usi x)
+{
+    return x[3] >> 16;
+}
+/* { dg-final { scan-tree-dump {BIT_FIELD_REF <x_[^,]+, 16, 112>;} "optimized" } } */
+
+unsigned int fool2 (v4usi x)
+{
+    return x[0] << 16;
+}
+/* { dg-final { scan-tree-dump {BIT_FIELD_REF <x_[^,]+, 32, 0>;} "optimized" } } */
+
+unsigned char foor3 (v8uhi x)
+{
+    return x[3] >> 9;
+}
+/* { dg-final { scan-tree-dump {BIT_FIELD_REF <x_[^,]+, 7, 57>;} "optimized" } } */
+
+unsigned short fool3 (v8uhi x)
+{
+    return x[0] << 9;
+}
+/* { dg-final { scan-tree-dump {BIT_FIELD_REF <x_[^,]+, 16, 0>;} "optimized" } } */
+
+unsigned short foo2 (v4si x)
+{
+  int y = x[0] + x[1];
+  return y >> 16;
+}
+/* { dg-final { scan-tree-dump {BIT_FIELD_REF <x_[^,]+, 64, 0>;} "optimized" } } */
+
diff --git a/gcc/testsuite/gcc.dg/bitshift_2.c b/gcc/testsuite/gcc.dg/bitshift_2.c
new file mode 100644
index 0000000000000000000000000000000000000000..406b4def9d4aebbc83bd5bef92dab825b85f2aa4
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/bitshift_2.c
@@ -0,0 +1,49 @@
+/* { dg-do compile { target be } } */
+/* { dg-additional-options "-O2 -save-temps -fdump-tree-optimized" } */
+
+typedef int v4si __attribute__ ((vector_size (16)));
+typedef unsigned int v4usi __attribute__ ((vector_size (16)));
+typedef unsigned short v8uhi __attribute__ ((vector_size (16)));
+
+unsigned int foor (v4usi x)
+{
+    return x[1] >> 16;
+}
+/* { dg-final { scan-tree-dump {BIT_FIELD_REF <x_[^,]+, 16, 16>;} "optimized" } } */
+
+unsigned int fool (v4usi x)
+{
+    return x[1] << 16;
+}
+/* { dg-final { scan-tree-dump {BIT_FIELD_REF <x_[^,]+, 32, 32>;} "optimized" } } */
+
+unsigned short foor2 (v4usi x)
+{
+    return x[3] >> 16;
+}
+/* { dg-final { scan-tree-dump {BIT_FIELD_REF <x_[^,]+, 16, 80>;} "optimized" } } */
+
+unsigned int fool2 (v4usi x)
+{
+    return x[0] << 16;
+}
+/* { dg-final { scan-tree-dump {BIT_FIELD_REF <x_[^,]+, 32, 0>;} "optimized" } } */
+
+unsigned char foor3 (v8uhi x)
+{
+    return x[3] >> 9;
+}
+/* { dg-final { scan-tree-dump {BIT_FIELD_REF <x_[^,]+, 7, 39>;} "optimized" } } */
+
+unsigned short fool3 (v8uhi x)
+{
+    return x[0] << 9;
+}
+/* { dg-final { scan-tree-dump {BIT_FIELD_REF <x_[^,]+, 16, 0>;} "optimized" } } */
+
+unsigned short foo2 (v4si x)
+{
+  int y = x[0] + x[1];
+  return y >> 16;
+}
+/* { dg-final { scan-tree-dump {BIT_FIELD_REF <x_[^,]+, 64, 0>;} "optimized" } } */
  
Jeff Law Oct. 31, 2022, 4:24 p.m. UTC | #10
On 10/31/22 05:51, Tamar Christina via Gcc-patches wrote:
> Hi All,
>
> Here's a respin addressing review comments.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
> and no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
> 	* match.pd: Add bitfield and shift folding.
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.dg/bitshift_1.c: New.
> 	* gcc.dg/bitshift_2.c: New.

OK

jeff
  
Richard Biener Nov. 7, 2022, 1:29 p.m. UTC | #11
On Mon, 31 Oct 2022, Tamar Christina wrote:

> Hi All,
> 
> Here's a respin addressing review comments.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
> and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	* match.pd: Add bitfield and shift folding.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.dg/bitshift_1.c: New.
> 	* gcc.dg/bitshift_2.c: New.
> 
> --- inline copy of patch ---
> 
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 70e90cdbfa902830e6b58be84e114e86ff7b4dff..a4ad465b2b074b21835be74732dce295f8db03bc 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -7245,6 +7245,45 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>        && ANY_INTEGRAL_TYPE_P (type) && ANY_INTEGRAL_TYPE_P (TREE_TYPE(@0)))
>    (IFN_REDUC_PLUS_WIDEN @0)))
>  
> +/* Canonicalize BIT_FIELD_REFS and right shift to BIT_FIELD_REFS.  */
> +(simplify
> + (rshift (BIT_FIELD_REF @0 @1 @2) INTEGER_CST@3)
> + (if (INTEGRAL_TYPE_P (type)
> +      && tree_fits_uhwi_p (@1)
> +      && tree_fits_uhwi_p (@3))
> +  (with { /* Can't use wide-int here as the precision differs between
> +	     @1 and @3.  */
> +	  unsigned HOST_WIDE_INT size = tree_to_uhwi (@1);
> +	  unsigned HOST_WIDE_INT shiftc = tree_to_uhwi (@3);
> +	  unsigned HOST_WIDE_INT newsize = size - shiftc;
> +	  tree nsize = wide_int_to_tree (bitsizetype, newsize);
> +	  tree ntype
> +	    = build_nonstandard_integer_type (newsize, TYPE_UNSIGNED (type)); }
> +   (switch
> +    (if (INTEGRAL_TYPE_P (ntype) && !BYTES_BIG_ENDIAN)
> +     (convert:type (BIT_FIELD_REF:ntype @0 { nsize; } (plus @2 @3))))

the :type is not necessary.  Don't you need to verify that (plus/minus @2 
@3) is in bounds?

> +    (if (INTEGRAL_TYPE_P (ntype) && BYTES_BIG_ENDIAN)
> +     (convert:type (BIT_FIELD_REF:ntype @0 { nsize; } (minus @2 @3))))))))
> +
> +/* Canonicalize BIT_FIELD_REFS and converts to BIT_FIELD_REFS.  */
> +(simplify
> + (convert (BIT_FIELD_REF@3 @0 @1 @2))
> + (if (INTEGRAL_TYPE_P (type)
> +      && INTEGRAL_TYPE_P (TREE_TYPE (@3)))
> +  (with { unsigned int size_inner = element_precision (TREE_TYPE (@3));
> +	  unsigned int size_outer  = element_precision (type); }

since you check for INTEGRAL_TYPE_P, using element_precision is odd;
just use TYPE_PRECISION here.

> +   (if (size_inner > size_outer)
> +    /* Truncating convert, we can shrink the bit field similar to the
> +        shift case.  */
> +    (with {
> +	    tree nsize = wide_int_to_tree (bitsizetype, size_outer);

bitsize_int

> +	    auto sign = TYPE_UNSIGNED (type);
> +	    tree ntype
> +	      = build_nonstandard_integer_type (size_outer, sign);
> +	    gcc_assert (useless_type_conversion_p (type, ntype)); }

if it's the same type why re-build it?

> +     (if (INTEGRAL_TYPE_P (ntype))

since you build a nonstandard integer type that's always going
to be INTEGRAL_TYPE_P.

> +      (BIT_FIELD_REF:ntype @0 { nsize; } @2)))))))

so why not simply

  (if (size_inner > size_outer)
   (BIT_FIELD_REF @0 { bitsize_int (size_outer); } @2))

?

> +
>  (simplify
>   (BIT_FIELD_REF (BIT_FIELD_REF @0 @1 @2) @3 @4)
>   (BIT_FIELD_REF @0 @3 { const_binop (PLUS_EXPR, bitsizetype, @2, @4); }))
> diff --git a/gcc/testsuite/gcc.dg/bitshift_1.c b/gcc/testsuite/gcc.dg/bitshift_1.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..5995d0746d2301eb48304629cb4b779b079f1270
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/bitshift_1.c
> @@ -0,0 +1,50 @@
> +/* { dg-do compile { target le } } */
> +/* { dg-additional-options "-O2 -save-temps -fdump-tree-optimized" } */
> +
> +typedef int v4si __attribute__ ((vector_size (16)));
> +typedef unsigned int v4usi __attribute__ ((vector_size (16)));
> +typedef unsigned short v8uhi __attribute__ ((vector_size (16)));
> +
> +unsigned int foor (v4usi x)
> +{
> +    return x[1] >> 16;
> +}
> +/* { dg-final { scan-tree-dump {BIT_FIELD_REF <x_[^,]+, 16, 48>;} "optimized" } } */
> +
> +unsigned int fool (v4usi x)
> +{
> +    return x[1] << 16;
> +}
> +/* { dg-final { scan-tree-dump {BIT_FIELD_REF <x_[^,]+, 32, 32>;} "optimized" } } */
> +
> +unsigned short foor2 (v4usi x)
> +{
> +    return x[3] >> 16;
> +}
> +/* { dg-final { scan-tree-dump {BIT_FIELD_REF <x_[^,]+, 16, 112>;} "optimized" } } */
> +
> +unsigned int fool2 (v4usi x)
> +{
> +    return x[0] << 16;
> +}
> +/* { dg-final { scan-tree-dump {BIT_FIELD_REF <x_[^,]+, 32, 0>;} "optimized" } } */
> +
> +unsigned char foor3 (v8uhi x)
> +{
> +    return x[3] >> 9;
> +}
> +/* { dg-final { scan-tree-dump {BIT_FIELD_REF <x_[^,]+, 7, 57>;} "optimized" } } */
> +
> +unsigned short fool3 (v8uhi x)
> +{
> +    return x[0] << 9;
> +}
> +/* { dg-final { scan-tree-dump {BIT_FIELD_REF <x_[^,]+, 16, 0>;} "optimized" } } */
> +
> +unsigned short foo2 (v4si x)
> +{
> +  int y = x[0] + x[1];
> +  return y >> 16;
> +}
> +/* { dg-final { scan-tree-dump {BIT_FIELD_REF <x_[^,]+, 64, 0>;} "optimized" } } */
> +
> diff --git a/gcc/testsuite/gcc.dg/bitshift_2.c b/gcc/testsuite/gcc.dg/bitshift_2.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..406b4def9d4aebbc83bd5bef92dab825b85f2aa4
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/bitshift_2.c
> @@ -0,0 +1,49 @@
> +/* { dg-do compile { target be } } */
> +/* { dg-additional-options "-O2 -save-temps -fdump-tree-optimized" } */
> +
> +typedef int v4si __attribute__ ((vector_size (16)));
> +typedef unsigned int v4usi __attribute__ ((vector_size (16)));
> +typedef unsigned short v8uhi __attribute__ ((vector_size (16)));
> +
> +unsigned int foor (v4usi x)
> +{
> +    return x[1] >> 16;
> +}
> +/* { dg-final { scan-tree-dump {BIT_FIELD_REF <x_[^,]+, 16, 16>;} "optimized" } } */
> +
> +unsigned int fool (v4usi x)
> +{
> +    return x[1] << 16;
> +}
> +/* { dg-final { scan-tree-dump {BIT_FIELD_REF <x_[^,]+, 32, 32>;} "optimized" } } */
> +
> +unsigned short foor2 (v4usi x)
> +{
> +    return x[3] >> 16;
> +}
> +/* { dg-final { scan-tree-dump {BIT_FIELD_REF <x_[^,]+, 16, 80>;} "optimized" } } */
> +
> +unsigned int fool2 (v4usi x)
> +{
> +    return x[0] << 16;
> +}
> +/* { dg-final { scan-tree-dump {BIT_FIELD_REF <x_[^,]+, 32, 0>;} "optimized" } } */
> +
> +unsigned char foor3 (v8uhi x)
> +{
> +    return x[3] >> 9;
> +}
> +/* { dg-final { scan-tree-dump {BIT_FIELD_REF <x_[^,]+, 7, 39>;} "optimized" } } */
> +
> +unsigned short fool3 (v8uhi x)
> +{
> +    return x[0] << 9;
> +}
> +/* { dg-final { scan-tree-dump {BIT_FIELD_REF <x_[^,]+, 16, 0>;} "optimized" } } */
> +
> +unsigned short foo2 (v4si x)
> +{
> +  int y = x[0] + x[1];
> +  return y >> 16;
> +}
> +/* { dg-final { scan-tree-dump {BIT_FIELD_REF <x_[^,]+, 64, 0>;} "optimized" } } */
>
  

Patch

--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -7245,6 +7245,23 @@  DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
       && ANY_INTEGRAL_TYPE_P (type) && ANY_INTEGRAL_TYPE_P (TREE_TYPE(@0)))
   (IFN_REDUC_PLUS_WIDEN @0)))
 
+/* Canonicalize BIT_FIELD_REFS and shifts to BIT_FIELD_REFS.  */
+(for shift (rshift)
+     op (plus)
+ (simplify
+  (shift (BIT_FIELD_REF @0 @1 @2) integer_pow2p@3)
+  (if (INTEGRAL_TYPE_P (type))
+   (with { /* Can't use wide-int here as the precision differs between
+	      @1 and @3.  */
+	   unsigned HOST_WIDE_INT size = tree_to_uhwi (@1);
+	   unsigned HOST_WIDE_INT shiftc = tree_to_uhwi (@3);
+	   unsigned HOST_WIDE_INT newsize = size - shiftc;
+	   tree nsize = wide_int_to_tree (bitsizetype, newsize);
+	   tree ntype
+	     = build_nonstandard_integer_type (newsize, 1); }
+    (if (ntype)
+     (convert:type (BIT_FIELD_REF:ntype @0 { nsize; } (op @2 @3))))))))
+
 (simplify
  (BIT_FIELD_REF (BIT_FIELD_REF @0 @1 @2) @3 @4)
  (BIT_FIELD_REF @0 @3 { const_binop (PLUS_EXPR, bitsizetype, @2, @4); }))