[RFA] Minor optimization of variable bit testing

Message ID 5bc6c39a-568c-e24b-3a0c-00885b0630f0@tachyum.com
State New

Commit Message

Jeff Law Nov. 2, 2021, 3:52 p.m. UTC
  I was wandering through SPEC, chasing down instances where we should be 
generating bit-test, bit-set and bit-clear types of instructions for our 
target, when I ran across a generic missed optimization in this space.


(((1 << N) & C) != 0)  -> (N == C')
(((1 << N) & C) == 0)  -> (N != C')

Where C is a constant power of 2 and C' is log2 (C).



That obviously avoids the shift by a variable amount and the bit 
masking, which is the primary effect.  I did see cases where we were 
able to constant propagate into uses of N, but those were only in PHI 
nodes and never triggered any real secondary effects in the cases I 
looked at.


Anyway, it's a fairly minor optimization, but with the analysis done and 
patch in hand, it's silly not to take the easy win.


Bootstrapped and regression tested on x86_64 and verified that the 
affected spec benchmark (gcc itself) still passes on our target.

OK for the trunk?  Note I added the patterns at the end of match.pd.  
Certainly open to moving them elsewhere.


Jeff
  

Comments

Richard Biener Nov. 3, 2021, 8:15 a.m. UTC | #1
On Tue, Nov 2, 2021 at 4:53 PM Jeff Law <jlaw@tachyum.com> wrote:
>
>
> I was wandering spec chasing down instances where we should be
> generating bit-test, bit-set and bit-clear types of instructions for our
> target when I ran across a generic missed optimization in this space.
>
>
> (((1 << N) & C) != 0)  -> (N == C')
> (((1 << N) & C) == 0)  -> (N != C')
>
> Where C is a constant power of 2 and C' is log2 (C).
>
>
>
> That obviously avoids the shift by a variable amount and the bit masking
> which is the primary effect.  I did see cases where we were able to
> constant propagate into uses of N, but those were only in PHI nodes and
> never triggered any real secondary effects in the cases I looked at.
>
>
> Anyway, it's a fairly minor optimization, but with the analysis done and
> patch in hand, it's silly not to take the easy win.
>
>
> Bootstrapped and regression tested on x86_64 and verified that the
> affected spec benchmark (gcc itself) still passes on our target.
>
> OK for the trunk?  Note I added the patterns at the end of match.pd.
> Certainly open to moving them elsewhere.

There are related patterns like

/* (CST1 << A) == CST2 -> A == ctz (CST2) - ctz (CST1)
   (CST1 << A) != CST2 -> A != ctz (CST2) - ctz (CST1)

please move the new patterns next to those.

+/* ((1 << n) & M) != 0  -> n == log2 (M) */
+(simplify
+ (ne
+  (bit_and
+   (nop_convert? (lshift integer_onep@0 @1)) integer_pow2p@2) integer_zerop@3)
+ (eq @1 { build_int_cst (integer_type_node,
+                         wi::exact_log2 (wi::to_wide (@2))); }))
+
+/* ((1 << n) & M) == 0  -> n != log2 (M) */
+(simplify
+ (eq
+  (bit_and
+   (nop_convert? (lshift integer_onep@0 @1)) integer_pow2p@2) integer_zerop@3)
+ (ne @1 { build_int_cst (integer_type_node,
+                         wi::exact_log2 (wi::to_wide (@2))); }))

you don't need @3 or @0 so no need to specify them.  You can merge the
patterns with

(for cmp (ne eq)
       icmp (eq ne)
  (simplify
    (cmp
+  (bit_and
      (nop_convert? (lshift integer_onep @1)) integer_pow2p@2) integer_zerop)
    (icmp @1 { wide_int_to_tree (TREE_TYPE (@1),
+                         wi::exact_log2 (wi::to_wide (@2))); }))

I believe the integer constant you build should be of the type of @1 (I
fixed that above, also using wide_int_to_tree).  The pattern is written
in a way that _could_ match vector operations and a vector-by-vector
shift, in which case the wi::to_wide would ICE - integer_pow2p
currently does not match vector constants.  But maybe be defensive
and add

  (if (INTEGRAL_TYPE_P (TREE_TYPE (@1)))

I think the patch is OK with those changes.

Thanks,
Richard.


>
> Jeff
  
Jeff Law Nov. 4, 2021, 3:09 p.m. UTC | #2
On 11/3/2021 2:15 AM, Richard Biener via Gcc-patches wrote:
> On Tue, Nov 2, 2021 at 4:53 PM Jeff Law <jlaw@tachyum.com> wrote:
>>
>> I was wandering spec chasing down instances where we should be
>> generating bit-test, bit-set and bit-clear types of instructions for our
>> target when I ran across a generic missed optimization in this space.
>>
>>
>> (((1 << N) & C) != 0)  -> (N == C')
>> (((1 << N) & C) == 0)  -> (N != C')
>>
>> Where C is a constant power of 2 and C' is log2 (C).
>>
>>
>>
>> That obviously avoids the shift by a variable amount and the bit masking
>> which is the primary effect.  I did see cases where we were able to
>> constant propagate into uses of N, but those were only in PHI nodes and
>> never triggered any real secondary effects in the cases I looked at.
>>
>>
>> Anyway, it's a fairly minor optimization, but with the analysis done and
>> patch in hand, it's silly not to take the easy win.
>>
>>
>> Bootstrapped and regression tested on x86_64 and verified that the
>> affected spec benchmark (gcc itself) still passes on our target.
>>
>> OK for the trunk?  Note I added the patterns at the end of match.pd.
>> Certainly open to moving them elsewhere.
> There are related patterns like
>
> /* (CST1 << A) == CST2 -> A == ctz (CST2) - ctz (CST1)
>     (CST1 << A) != CST2 -> A != ctz (CST2) - ctz (CST1)
>
> please move the new patterns next to those.
Will do.   FWIW, it feels like match.pd is getting a bit unwieldy in 
terms of being able to find things.  I wonder if we should be looking to 
break it up into multiple files.  Not critical of course, but it's grown 
to ~6k lines at this point.


>
> +/* ((1 << n) & M) != 0  -> n == log2 (M) */
> +(simplify
> + (ne
> +  (bit_and
> +   (nop_convert? (lshift integer_onep@0 @1)) integer_pow2p@2) integer_zerop@3)
> + (eq @1 { build_int_cst (integer_type_node,
> +                         wi::exact_log2 (wi::to_wide (@2))); }))
> +
> +/* ((1 << n) & M) == 0  -> n != log2 (M) */
> +(simplify
> + (eq
> +  (bit_and
> +   (nop_convert? (lshift integer_onep@0 @1)) integer_pow2p@2) integer_zerop@3)
> + (ne @1 { build_int_cst (integer_type_node,
> +                         wi::exact_log2 (wi::to_wide (@2))); }))
>
> you don't need @3 or @0 so no need to specify them.
Ah, I didn't know the language allowed us to do that.  Will do and 
adjust operand #s.



>   You can merge the
> patterns with
>
> (for cmp (ne eq)
>         icmp (eq ne)
Thanks.  I was pretty sure we had this kind of mapping capability; 
now that I know what to look for, it's easy to find.


>    (simplify
>      (cmp
> +  (bit_and
>        (nop_convert? (lshift integer_onep @1)) integer_pow2p@2) integer_zerop)
>      (icmp @1 { wide_int_to_tree (TREE_TYPE (@1),
> +                         wi::exact_log2 (wi::to_wide (@2))); }))
>
> I believe the integer constant you build should be of the type of @1 (I
> fixed that above,
> also using wide_int_to_tree.  The pattern is written in a way that _could_ match
> vector operations and a vector by vector shift in which case the
> wi::to_wide would
> ICE - integer_pow2p currently does not match vector constants.  But maybe be
> defensive and add
>
>    (if (INTEGRAL_TYPE_P (TREE_TYPE (@1)))
>
> I think the patch is OK with those changes.
I'll add that test as well and retest.

Thanks,
jeff
  
Richard Biener Nov. 5, 2021, 10:22 a.m. UTC | #3
On Thu, Nov 4, 2021 at 4:09 PM Jeff Law <jeffreyalaw@gmail.com> wrote:
>
>
>
> On 11/3/2021 2:15 AM, Richard Biener via Gcc-patches wrote:
> > On Tue, Nov 2, 2021 at 4:53 PM Jeff Law <jlaw@tachyum.com> wrote:
> >>
> >> I was wandering spec chasing down instances where we should be
> >> generating bit-test, bit-set and bit-clear types of instructions for our
> >> target when I ran across a generic missed optimization in this space.
> >>
> >>
> >> (((1 << N) & C) != 0)  -> (N == C')
> >> (((1 << N) & C) == 0)  -> (N != C')
> >>
> >> Where C is a constant power of 2 and C' is log2 (C).
> >>
> >>
> >>
> >> That obviously avoids the shift by a variable amount and the bit masking
> >> which is the primary effect.  I did see cases where we were able to
> >> constant propagate into uses of N, but those were only in PHI nodes and
> >> never triggered any real secondary effects in the cases I looked at.
> >>
> >>
> >> Anyway, it's a fairly minor optimization, but with the analysis done and
> >> patch in hand, it's silly not to take the easy win.
> >>
> >>
> >> Bootstrapped and regression tested on x86_64 and verified that the
> >> affected spec benchmark (gcc itself) still passes on our target.
> >>
> >> OK for the trunk?  Note I added the patterns at the end of match.pd.
> >> Certainly open to moving them elsewhere.
> > There are related patterns like
> >
> > /* (CST1 << A) == CST2 -> A == ctz (CST2) - ctz (CST1)
> >     (CST1 << A) != CST2 -> A != ctz (CST2) - ctz (CST1)
> >
> > please move the new patterns next to those.
> Will do.   FWIW, it feels like match.pd is getting a bit unwieldy in
> terms of being able to find things.  I wonder if we should be looking to
> break it up into multiple files.  Not critical of course, but it's grown
> to ~6k lines at this point.

Originally I had multiple .pd files and match.pd #including them.  But at
some point it became quite difficult to decide where to put a pattern,
which resulted in a similarly messy state.

Btw, dwarf2out.c is still the largest file at 33k lines and match.pd isn't
amongst the 10 largest.

But yes, some visual separation of things might help.  I'll also note
that pattern order affects the generated matching code since we
try to preserve the invariant that earlier matching patterns need
to match first.

>
> >
> > +/* ((1 << n) & M) != 0  -> n == log2 (M) */
> > +(simplify
> > + (ne
> > +  (bit_and
> > +   (nop_convert? (lshift integer_onep@0 @1)) integer_pow2p@2) integer_zerop@3)
> > + (eq @1 { build_int_cst (integer_type_node,
> > +                         wi::exact_log2 (wi::to_wide (@2))); }))
> > +
> > +/* ((1 << n) & M) == 0  -> n != log2 (M) */
> > +(simplify
> > + (eq
> > +  (bit_and
> > +   (nop_convert? (lshift integer_onep@0 @1)) integer_pow2p@2) integer_zerop@3)
> > + (ne @1 { build_int_cst (integer_type_node,
> > +                         wi::exact_log2 (wi::to_wide (@2))); }))
> >
> > you don't need @3 or @0 so no need to specify them.
> Ah, I didn't know the language allowed us to do that.  Will do and
> adjust operand #s.
>
>
>
> >   You can merge the
> > patterns with
> >
> > (for cmp (ne eq)
> >         icmp (eq ne)
> Thanks.  I was pretty sure we had this kind of mapping capability,
> now that I know what to look for, it's easy to find.
>
>
> >    (simplify
> >      (cmp
> > +  (bit_and
> >        (nop_convert? (lshift integer_onep @1)) integer_pow2p@2) integer_zerop)
> >      (icmp @1 { wide_int_to_tree (TREE_TYPE (@1),
> > +                         wi::exact_log2 (wi::to_wide (@2))); }))
> >
> > I believe the integer constant you build should be of the type of @1 (I
> > fixed that above,
> > also using wide_int_to_tree.  The pattern is written in a way that _could_ match
> > vector operations and a vector by vector shift in which case the
> > wi::to_wide would
> > ICE - integer_pow2p currently does not match vector constants.  But maybe be
> > defensive and add
> >
> >    (if (INTEGRAL_TYPE_P (TREE_TYPE (@1)))
> >
> > I think the patch is OK with those changes.
> I'll add that test as well and retest.
>
> Thanks,
> jeff
>
  
Jeff Law Nov. 9, 2021, 4:27 a.m. UTC | #4
On 11/3/2021 2:15 AM, Richard Biener via Gcc-patches wrote:
> On Tue, Nov 2, 2021 at 4:53 PM Jeff Law <jlaw@tachyum.com> wrote:
>>
>> I was wandering spec chasing down instances where we should be
>> generating bit-test, bit-set and bit-clear types of instructions for our
>> target when I ran across a generic missed optimization in this space.
>>
>>
>> (((1 << N) & C) != 0)  -> (N == C')
>> (((1 << N) & C) == 0)  -> (N != C')
>>
>> Where C is a constant power of 2 and C' is log2 (C).
>>
>>
>>
>> That obviously avoids the shift by a variable amount and the bit masking
>> which is the primary effect.  I did see cases where we were able to
>> constant propagate into uses of N, but those were only in PHI nodes and
>> never triggered any real secondary effects in the cases I looked at.
>>
>>
>> Anyway, it's a fairly minor optimization, but with the analysis done and
>> patch in hand, it's silly not to take the easy win.
>>
>>
>> Bootstrapped and regression tested on x86_64 and verified that the
>> affected spec benchmark (gcc itself) still passes on our target.
>>
>> OK for the trunk?  Note I added the patterns at the end of match.pd.
>> Certainly open to moving them elsewhere.
> There are related patterns like
>
> /* (CST1 << A) == CST2 -> A == ctz (CST2) - ctz (CST1)
>     (CST1 << A) != CST2 -> A != ctz (CST2) - ctz (CST1)
>
> please move the new patterns next to those.
>
> +/* ((1 << n) & M) != 0  -> n == log2 (M) */
> +(simplify
> + (ne
> +  (bit_and
> +   (nop_convert? (lshift integer_onep@0 @1)) integer_pow2p@2) integer_zerop@3)
> + (eq @1 { build_int_cst (integer_type_node,
> +                         wi::exact_log2 (wi::to_wide (@2))); }))
> +
> +/* ((1 << n) & M) == 0  -> n != log2 (M) */
> +(simplify
> + (eq
> +  (bit_and
> +   (nop_convert? (lshift integer_onep@0 @1)) integer_pow2p@2) integer_zerop@3)
> + (ne @1 { build_int_cst (integer_type_node,
> +                         wi::exact_log2 (wi::to_wide (@2))); }))
>
> you don't need @3 or @0 so no need to specify them.  You can merge the
> patterns with
>
> (for cmp (ne eq)
>         icmp (eq ne)
>    (simplify
>      (cmp
> +  (bit_and
>        (nop_convert? (lshift integer_onep @1)) integer_pow2p@2) integer_zerop)
>      (icmp @1 { wide_int_to_tree (TREE_TYPE (@1),
> +                         wi::exact_log2 (wi::to_wide (@2))); }))
>
> I believe the integer constant you build should be of the type of @1 (I
> fixed that above,
> also using wide_int_to_tree.  The pattern is written in a way that _could_ match
> vector operations and a vector by vector shift in which case the
> wi::to_wide would
> ICE - integer_pow2p currently does not match vector constants.  But maybe be
> defensive and add
>
>    (if (INTEGRAL_TYPE_P (TREE_TYPE (@1)))
>
> I think the patch is OK with those changes.
FTR, here's what I committed.  Bootstrapped and regression tested on our 
internal target.

jeff
commit 2abd924f91e54a5229efb2c3bb5fb247059a5b37
Author: Jeff Law <jeffreyalaw@gmail.com>
Date:   Mon Nov 8 23:23:34 2021 -0500

    Minor optimization of variable bit testing
    
    gcc/
            * match.pd: New pattern to simplify (1 << n) & M ==/!= 0 for M
            being a power of 2.
    
    gcc/testsuite
            * gcc.dg/tree-ssa/bittest.c: New test

diff --git a/gcc/match.pd b/gcc/match.pd
index 71cf6f9df0a..cdab5e59f4e 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3441,6 +3441,17 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 	     (bit_and (convert (shift:shift_type (convert @3) @1)) { newmaskt; })
 	     (bit_and @4 { newmaskt; })))))))))))))
 
+/* ((1 << n) & M) != 0  -> n == log2 (M) */
+(for cmp (ne eq)
+       icmp (eq ne)
+ (simplify
+  (cmp
+   (bit_and
+    (nop_convert? (lshift integer_onep @0)) integer_pow2p@1) integer_zerop)
+  (if (INTEGRAL_TYPE_P (TREE_TYPE (@0)))
+   (icmp @0 { wide_int_to_tree (TREE_TYPE (@0),
+				wi::exact_log2 (wi::to_wide (@1))); }))))
+
 /* Fold (X {&,^,|} C2) << C1 into (X << C1) {&,^,|} (C2 << C1)
    (X {&,^,|} C2) >> C1 into (X >> C1) & (C2 >> C1).  */
 (for shift (lshift rshift)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/bittest.c b/gcc/testsuite/gcc.dg/tree-ssa/bittest.c
new file mode 100644
index 00000000000..7d712cad1ee
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/bittest.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+
+void bar (void);
+
+void
+foo(unsigned int abc123)
+{
+  unsigned int xyzpdq = (1 << abc123);
+  if ((xyzpdq & 0x800) != 0)
+    bar();
+}
+
+void
+baz(unsigned int abc123)
+{
+  unsigned int xyzpdq = (1 << abc123);
+  if ((xyzpdq & 0x800) == 0)
+    bar();
+}
+
+/* What we want to verify is that the bit test against xyzpdq is
+   replaced with a test against abc123 which avoids the shifting
+   and bit ops.  */
+/* { dg-final { scan-tree-dump-not "xyzpdq" "optimized"} } */
+/* { dg-final { scan-tree-dump-times "if .abc123" 2 "optimized"} } */
  

Patch

diff --git a/gcc/match.pd b/gcc/match.pd
index 4fbba3922e5..b275631555d 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -6835,3 +6835,20 @@  DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
    to the number of trailing zeroes.  */
 (match (ctz_table_index @1 @2 @3)
   (rshift (mult (bit_and:c (negate @1) @1) INTEGER_CST@2) INTEGER_CST@3))
+
+/* ((1 << n) & M) != 0  -> n == log2 (M) */
+(simplify
+ (ne
+  (bit_and
+   (nop_convert? (lshift integer_onep@0 @1)) integer_pow2p@2) integer_zerop@3)
+ (eq @1 { build_int_cst (integer_type_node,
+                         wi::exact_log2 (wi::to_wide (@2))); }))
+
+/* ((1 << n) & M) == 0  -> n != log2 (M) */
+(simplify
+ (eq
+  (bit_and
+   (nop_convert? (lshift integer_onep@0 @1)) integer_pow2p@2) integer_zerop@3)
+ (ne @1 { build_int_cst (integer_type_node,
+                         wi::exact_log2 (wi::to_wide (@2))); }))
+
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/bittest.c b/gcc/testsuite/gcc.dg/tree-ssa/bittest.c
new file mode 100644
index 00000000000..7d712cad1ee
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/bittest.c
@@ -0,0 +1,27 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+
+void bar (void);
+
+void
+foo(unsigned int abc123)
+{
+  unsigned int xyzpdq = (1 << abc123);
+  if ((xyzpdq & 0x800) != 0)
+    bar();
+}
+
+void
+baz(unsigned int abc123)
+{
+  unsigned int xyzpdq = (1 << abc123);
+  if ((xyzpdq & 0x800) == 0)
+    bar();
+}
+
+/* What we want to verify is that the bit test against xyzpdq is
+   replaced with a test against abc123 which avoids the shifting
+   and bit ops.  */
+/* { dg-final { scan-tree-dump-not "xyzpdq" "optimized"} } */
+/* { dg-final { scan-tree-dump-times "if .abc123" 2 "optimized"} } */