Message ID | 5bc6c39a-568c-e24b-3a0c-00885b0630f0@tachyum.com |
---|---|
State | New |
Series | [RFA] Minor optimization of variable bit testing |
Commit Message
Jeff Law
Nov. 2, 2021, 3:52 p.m. UTC
I was wandering through SPEC, chasing down instances where we should be
generating bit-test, bit-set and bit-clear types of instructions for our
target, when I ran across a generic missed optimization in this space:

  (((1 << N) & C) != 0)  ->  (N == C')
  (((1 << N) & C) == 0)  ->  (N != C')

where C is a constant power of 2 and C' is log2 (C).

That obviously avoids the shift by a variable amount and the bit
masking, which is the primary effect.  I did see cases where we were
able to constant propagate into uses of N, but those were only in PHI
nodes and never triggered any real secondary effects in the cases I
looked at.

Anyway, it's a fairly minor optimization, but with the analysis done
and the patch in hand, it's silly not to take the easy win.

Bootstrapped and regression tested on x86_64, and verified that the
affected SPEC benchmark (gcc itself) still passes on our target.

OK for the trunk?  Note that I added the patterns at the end of
match.pd; certainly open to moving them elsewhere.

Jeff
Comments
On Tue, Nov 2, 2021 at 4:53 PM Jeff Law <jlaw@tachyum.com> wrote:
>
> I was wandering spec chasing down instances where we should be
> generating bit-test, bit-set and bit-clear types of instructions for our
> target when I ran across a generic missed optimization in this space.
>
> (((1 << N) & C) != 0) -> (N == C')
> (((1 << N) & C) == 0) -> (N != C')
>
> Where C is a constant power of 2 and C' is log2 (C).
[...]
> OK for the trunk?  Note I added the patterns at the end of match.pd.
> Certainly open to moving them elsewhere.

There are related patterns like

  /* (CST1 << A) == CST2 -> A == ctz (CST2) - ctz (CST1)
     (CST1 << A) != CST2 -> A != ctz (CST2) - ctz (CST1)

please move the new patterns next to those.

+/* ((1 << n) & M) != 0 -> n == log2 (M) */
+(simplify
+ (ne
+  (bit_and
+   (nop_convert? (lshift integer_onep@0 @1)) integer_pow2p@2) integer_zerop@3)
+ (eq @1 { build_int_cst (integer_type_node,
+                         wi::exact_log2 (wi::to_wide (@2))); }))
+
+/* ((1 << n) & M) == 0 -> n != log2 (M) */
+(simplify
+ (eq
+  (bit_and
+   (nop_convert? (lshift integer_onep@0 @1)) integer_pow2p@2) integer_zerop@3)
+ (ne @1 { build_int_cst (integer_type_node,
+                         wi::exact_log2 (wi::to_wide (@2))); }))

you don't need @3 or @0 so no need to specify them.  You can merge the
patterns with

(for cmp (ne eq)
     icmp (eq ne)
 (simplify
  (cmp
   (bit_and
    (nop_convert? (lshift integer_onep @1)) integer_pow2p@2) integer_zerop)
  (icmp @1 { wide_int_to_tree (TREE_TYPE (@1),
             wi::exact_log2 (wi::to_wide (@2))); })))

I believe the integer constant you build should be of the type of @1
(I fixed that above, also using wide_int_to_tree).  The pattern is
written in a way that _could_ match vector operations and a vector by
vector shift, in which case the wi::to_wide would ICE - integer_pow2p
currently does not match vector constants.  But maybe be defensive and
add

  (if (INTEGRAL_TYPE_P (TREE_TYPE (@1)))

I think the patch is OK with those changes.

Thanks,
Richard.

>
> Jeff
On 11/3/2021 2:15 AM, Richard Biener via Gcc-patches wrote:
> On Tue, Nov 2, 2021 at 4:53 PM Jeff Law <jlaw@tachyum.com> wrote:
>>
>> I was wandering spec chasing down instances where we should be
>> generating bit-test, bit-set and bit-clear types of instructions for our
>> target when I ran across a generic missed optimization in this space.
[...]
>> OK for the trunk?  Note I added the patterns at the end of match.pd.
>> Certainly open to moving them elsewhere.
> There are related patterns like
>
> /* (CST1 << A) == CST2 -> A == ctz (CST2) - ctz (CST1)
>    (CST1 << A) != CST2 -> A != ctz (CST2) - ctz (CST1)
>
> please move the new patterns next to those.

Will do.  FWIW, it feels like match.pd is getting a bit unwieldy in
terms of being able to find things.  I wonder if we should be looking
to break it up into multiple files.  Not critical of course, but it's
grown to ~6k lines at this point.

> +/* ((1 << n) & M) != 0 -> n == log2 (M) */
> +(simplify
[...]
>
> you don't need @3 or @0 so no need to specify them.

Ah, I didn't know the language allowed us to do that.  Will do and
adjust operand #s.

> You can merge the patterns with
>
> (for cmp (ne eq)
>      icmp (eq ne)

Thanks.  I was pretty sure we had this kind of mapping capability;
now that I know what to look for, it's easy to find.

> (simplify
>  (cmp
>   (bit_and
>    (nop_convert? (lshift integer_onep @1)) integer_pow2p@2) integer_zerop)
>  (icmp @1 { wide_int_to_tree (TREE_TYPE (@1),
>             wi::exact_log2 (wi::to_wide (@2))); }))
>
> I believe the integer constant you build should be of the type of @1
> (I fixed that above, also using wide_int_to_tree).  The pattern is
> written in a way that _could_ match vector operations and a vector by
> vector shift, in which case the wi::to_wide would ICE - integer_pow2p
> currently does not match vector constants.  But maybe be defensive and
> add
>
>   (if (INTEGRAL_TYPE_P (TREE_TYPE (@1)))
>
> I think the patch is OK with those changes.

I'll add that test as well and retest.

Thanks,
jeff
On Thu, Nov 4, 2021 at 4:09 PM Jeff Law <jeffreyalaw@gmail.com> wrote:
>
> On 11/3/2021 2:15 AM, Richard Biener via Gcc-patches wrote:
> > On Tue, Nov 2, 2021 at 4:53 PM Jeff Law <jlaw@tachyum.com> wrote:
[...]
> > There are related patterns like
> >
> > /* (CST1 << A) == CST2 -> A == ctz (CST2) - ctz (CST1)
> >    (CST1 << A) != CST2 -> A != ctz (CST2) - ctz (CST1)
> >
> > please move the new patterns next to those.
> Will do.  FWIW, it feels like match.pd is getting a bit unwieldy in
> terms of being able to find things.  I wonder if we should be looking to
> break it up into multiple files.  Not critical of course, but it's grown
> to ~6k lines at this point.

Originally I had multiple .pd files and match.pd #including them.  But
at some point it was quite difficult to decide where to put a pattern,
which resulted in a similarly messy state.

Btw, dwarf2out.c is still the largest file at 33k lines and match.pd
isn't amongst the 10 largest.  But yes, some visual separation of
things might help.

I'll also note that pattern order affects the generated matching code,
since we try to preserve the invariant that earlier matching patterns
need to match first.

> > you don't need @3 or @0 so no need to specify them.
> Ah, I didn't know the language allowed us to do that.  Will do and
> adjust operand #s.
>
> > You can merge the patterns with
> >
> > (for cmp (ne eq)
> >      icmp (eq ne)
> Thanks.  I was pretty sure we had this kind of mapping capability,
> now that I know what to look for, it's easy to find.
>
[...]
> > I think the patch is OK with those changes.
> I'll add that test as well and retest.
>
> Thanks,
> jeff
>
On 11/3/2021 2:15 AM, Richard Biener via Gcc-patches wrote:
> On Tue, Nov 2, 2021 at 4:53 PM Jeff Law <jlaw@tachyum.com> wrote:
[...]
> There are related patterns like
>
> /* (CST1 << A) == CST2 -> A == ctz (CST2) - ctz (CST1)
>    (CST1 << A) != CST2 -> A != ctz (CST2) - ctz (CST1)
>
> please move the new patterns next to those.
[...]
> (if (INTEGRAL_TYPE_P (TREE_TYPE (@1)))
>
> I think the patch is OK with those changes.

FTR, here's what I committed.  Bootstrapped and regression tested on
our internal target.

jeff

commit 2abd924f91e54a5229efb2c3bb5fb247059a5b37
Author: Jeff Law <jeffreyalaw@gmail.com>
Date:   Mon Nov 8 23:23:34 2021 -0500

    Minor optimization of variable bit testing

    gcc/
        * match.pd: New pattern to simplify (1 << n) & M ==/!= 0 for M
        being a power of 2.

    gcc/testsuite
        * gcc.dg/tree-ssa/bittest.c: New test

diff --git a/gcc/match.pd b/gcc/match.pd
index 71cf6f9df0a..cdab5e59f4e 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3441,6 +3441,17 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
       (bit_and (convert (shift:shift_type (convert @3) @1)) { newmaskt; })
       (bit_and @4 { newmaskt; })))))))))))))

+/* ((1 << n) & M) != 0 -> n == log2 (M) */
+(for cmp (ne eq)
+     icmp (eq ne)
+ (simplify
+  (cmp
+   (bit_and
+    (nop_convert? (lshift integer_onep @0)) integer_pow2p@1) integer_zerop)
+  (if (INTEGRAL_TYPE_P (TREE_TYPE (@0)))
+   (icmp @0 { wide_int_to_tree (TREE_TYPE (@0),
+              wi::exact_log2 (wi::to_wide (@1))); }))))
+
 /* Fold (X {&,^,|} C2) << C1 into (X << C1) {&,^,|} (C2 << C1)
    (X {&,^,|} C2) >> C1 into (X >> C1) & (C2 >> C1).  */
 (for shift (lshift rshift)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/bittest.c b/gcc/testsuite/gcc.dg/tree-ssa/bittest.c
new file mode 100644
index 00000000000..7d712cad1ee
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/bittest.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+
+void bar (void);
+
+void
+foo(unsigned int abc123)
+{
+  unsigned int xyzpdq = (1 << abc123);
+  if ((xyzpdq & 0x800) != 0)
+    bar();
+}
+
+void
+baz(unsigned int abc123)
+{
+  unsigned int xyzpdq = (1 << abc123);
+  if ((xyzpdq & 0x800) == 0)
+    bar();
+}
+
+/* What we want to verify is that the bit test against xyzpdq is
+   replaced with a test against abc123 which avoids the shifting
+   and bit ops.  */
+/* { dg-final { scan-tree-dump-not "xyzpdq" "optimized"} } */
+/* { dg-final { scan-tree-dump-times "if .abc123" 2 "optimized"} } */
diff --git a/gcc/match.pd b/gcc/match.pd
index 4fbba3922e5..b275631555d 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -6835,3 +6835,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
    to the number of trailing zeroes.  */
 (match (ctz_table_index @1 @2 @3)
   (rshift (mult (bit_and:c (negate @1) @1) INTEGER_CST@2) INTEGER_CST@3))
+
+/* ((1 << n) & M) != 0 -> n == log2 (M) */
+(simplify
+ (ne
+  (bit_and
+   (nop_convert? (lshift integer_onep@0 @1)) integer_pow2p@2) integer_zerop@3)
+ (eq @1 { build_int_cst (integer_type_node,
+                         wi::exact_log2 (wi::to_wide (@2))); }))
+
+/* ((1 << n) & M) == 0 -> n != log2 (M) */
+(simplify
+ (eq
+  (bit_and
+   (nop_convert? (lshift integer_onep@0 @1)) integer_pow2p@2) integer_zerop@3)
+ (ne @1 { build_int_cst (integer_type_node,
+                         wi::exact_log2 (wi::to_wide (@2))); }))
+
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/bittest.c b/gcc/testsuite/gcc.dg/tree-ssa/bittest.c
new file mode 100644
index 00000000000..7d712cad1ee
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/bittest.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+
+void bar (void);
+
+void
+foo(unsigned int abc123)
+{
+  unsigned int xyzpdq = (1 << abc123);
+  if ((xyzpdq & 0x800) != 0)
+    bar();
+}
+
+void
+baz(unsigned int abc123)
+{
+  unsigned int xyzpdq = (1 << abc123);
+  if ((xyzpdq & 0x800) == 0)
+    bar();
+}
+
+/* What we want to verify is that the bit test against xyzpdq is
+   replaced with a test against abc123 which avoids the shifting
+   and bit ops.  */
+/* { dg-final { scan-tree-dump-not "xyzpdq" "optimized"} } */
+/* { dg-final { scan-tree-dump-times "if .abc123" 2 "optimized"} } */