[v2] match.pd: (A>>bool) == 0 -> (unsigned)A) <= bool [PR119420]

Message ID 20260407212033.2739886-1-daniel.barboza@oss.qualcomm.com
State New

Commit Message

Daniel Henrique Barboza April 7, 2026, 9:20 p.m. UTC
  From: Daniel Barboza <daniel.barboza@oss.qualcomm.com>

Note that we're not adding its counterpart:

"(A>>bool) != 0 -> (unsigned)A > bool"

because it would collide with patterns that handle builtin_clz (e.g. the
clz-complement-int.c tree-ssa test).

Bootstrapped and regression tested on x86.

        PR tree-optimization/119420

gcc/ChangeLog:

	* match.pd(`(A>>bool) EQ 0 -> (unsigned)A) LE bool`): New
	pattern.

gcc/testsuite/ChangeLog:

        * gcc.dg/tree-ssa/pr119420.c: New test.
---

Changes from v1:
- fixed formatting of (with ...) block;
- added match.pd comment about the interaction with builtin_clz and why
  we're not adding the NE variation.
- v1 link: https://gcc.gnu.org/pipermail/gcc-patches/2026-March/709727.html

 gcc/match.pd                             | 11 +++++++++++
 gcc/testsuite/gcc.dg/tree-ssa/pr119420.c | 23 +++++++++++++++++++++++
 2 files changed, 34 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr119420.c
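
To make the equivalence behind the new pattern concrete, here is a minimal C sketch that compares the shift form against the unsigned-compare form for a 0/1 shift amount. The function names are hypothetical and chosen only for illustration; the check assumes an arithmetic right shift for negative values, which is what GCC implements (ISO C leaves it implementation-defined).

```c
#include <limits.h>
#include <stdbool.h>

/* Original form: shift right by a 0/1 value, then compare with zero.
   Assumes >> on a negative int is an arithmetic shift (true for GCC).  */
static bool shift_form (int a, bool b)
{
  return (a >> b) == 0;
}

/* Rewritten form: compare in the unsigned type, as the pattern does.  */
static bool compare_form (int a, bool b)
{
  return (unsigned) a <= (unsigned) b;
}

/* Count disagreements between the two forms over some edge values and a
   small range around zero, for both possible shift amounts.  */
int count_mismatches (void)
{
  static const int edge[] = { INT_MIN, INT_MIN + 1, -2, -1, 0, 1, 2, 3,
                              INT_MAX - 1, INT_MAX };
  int mismatches = 0;

  for (int b = 0; b <= 1; b++)
    {
      for (unsigned i = 0; i < sizeof edge / sizeof edge[0]; i++)
        if (shift_form (edge[i], b) != compare_form (edge[i], b))
          mismatches++;

      for (int a = -1000; a <= 1000; a++)
        if (shift_form (a, b) != compare_form (a, b))
          mismatches++;
    }
  return mismatches;
}
```

The unsigned conversion is what keeps negative inputs correct: a negative A never shifts down to zero, and as an unsigned value it is always greater than 1.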
  

Comments

Andrew Pinski April 8, 2026, 5:26 a.m. UTC | #1
On Tue, Apr 7, 2026 at 2:20 PM Daniel Henrique Barboza
<daniel.barboza@oss.qualcomm.com> wrote:
>
> From: Daniel Barboza <daniel.barboza@oss.qualcomm.com>
>
> Note that we're not adding its counterpart:
>
> "(A>>bool) != 0 -> (unsigned)A) > bool"
>
> Because it will collide with patterns that handles builtin_clz (e.g.
> clz-complement-int.c tree-ssa test).

I think this can be done for any constant shift (and not just for bool value).
From combine.cc:
  /* If we have (compare (xshiftrt FOO N) (const_int C)) and
     the low order N bits of FOO are known to be zero, we can do this
     by comparing FOO with C shifted left N bits so long as no
     overflow occurs.  Even if the low order N bits of FOO aren't known
     to be zero, if the comparison is >= or < we can use the same
     optimization and for > or <= by setting all the low
     order N bits in the comparison constant.  */

Handling more than just bool would also fix PR 124808. But maybe that
should be handled separately since this case is for the non-constant
bool case.
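
As a rough sketch of the rule in the combine.cc comment, restricted to unsigned (logical) shifts and to constants small enough that `C << N` does not overflow, the following C code checks each rewritten comparison against the shifted one. The helper names are made up for this illustration and are not taken from GCC.

```c
#include <stdint.h>

/* Mask with the low n bits set (n must be less than 32).  */
static uint32_t low_mask (unsigned n)
{
  return (UINT32_C (1) << n) - 1;
}

/* Count cases where a comparison on (x >> n) disagrees with the
   corresponding comparison on x against an adjusted constant.  */
int count_shift_cmp_mismatches (void)
{
  int mismatches = 0;

  for (unsigned n = 1; n <= 4; n++)
    for (uint32_t c = 0; c < 16; c++)
      {
        uint32_t shifted = c << n;                /* C shifted left N bits.  */
        uint32_t filled = shifted | low_mask (n); /* Low N bits set as well.  */

        for (uint32_t x = 0; x < 1024; x++)
          {
            uint32_t q = x >> n;

            /* >= and < use the plain shifted constant.  */
            if ((q < c) != (x < shifted))
              mismatches++;
            if ((q >= c) != (x >= shifted))
              mismatches++;

            /* > and <= need the low N bits of the constant set.  */
            if ((q > c) != (x > filled))
              mismatches++;
            if ((q <= c) != (x <= filled))
              mismatches++;

            /* == is only safe when the low N bits of x are zero.  */
            if ((x & low_mask (n)) == 0 && (q == c) != (x == shifted))
              mismatches++;
          }
      }
  return mismatches;
}
```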

Thanks,
Andrew

  
Daniel Henrique Barboza April 8, 2026, 9:39 a.m. UTC | #2
On 4/8/2026 2:26 AM, Andrew Pinski wrote:
> On Tue, Apr 7, 2026 at 2:20 PM Daniel Henrique Barboza
> <daniel.barboza@oss.qualcomm.com> wrote:
>>
>> From: Daniel Barboza <daniel.barboza@oss.qualcomm.com>
>>
>> Note that we're not adding its counterpart:
>>
>> "(A>>bool) != 0 -> (unsigned)A) > bool"
>>
>> Because it will collide with patterns that handles builtin_clz (e.g.
>> clz-complement-int.c tree-ssa test).
> 
> I think this can be done for any constant shift (and not just for bool value).
>  From combine.cc:
>    /* If we have (compare (xshiftrt FOO N) (const_int C)) and
>       the low order N bits of FOO are known to be zero, we can do this
>       by comparing FOO with C shifted left N bits so long as no
>       overflow occurs.  Even if the low order N bits of FOO aren't known
>       to be zero, if the comparison is >= or < we can use the same
>       optimization and for > or <= by setting all the low
>       order N bits in the comparison constant.  */
> 
> Handling more than just bool would also fix PR 124808. But maybe that
> should be handled separately since this case is for the non-constant
> bool case.

I'll make changes to handle constant shifts, and maybe that will help with
124808.  That PR might require additional work anyway, but at least it would
be progress.


Thanks,
Daniel

  
Daniel Henrique Barboza April 8, 2026, 8:49 p.m. UTC | #3
On 4/8/2026 2:26 AM, Andrew Pinski wrote:
> On Tue, Apr 7, 2026 at 2:20 PM Daniel Henrique Barboza
> <daniel.barboza@oss.qualcomm.com> wrote:
>>
>> From: Daniel Barboza <daniel.barboza@oss.qualcomm.com>
>>
>> Note that we're not adding its counterpart:
>>
>> "(A>>bool) != 0 -> (unsigned)A) > bool"
>>
>> Because it will collide with patterns that handles builtin_clz (e.g.
>> clz-complement-int.c tree-ssa test).
> 
> I think this can be done for any constant shift (and not just for bool value).
>  From combine.cc:
>    /* If we have (compare (xshiftrt FOO N) (const_int C)) and
>       the low order N bits of FOO are known to be zero, we can do this
>       by comparing FOO with C shifted left N bits so long as no
>       overflow occurs.  Even if the low order N bits of FOO aren't known
>       to be zero, if the comparison is >= or < we can use the same
>       optimization and for > or <= by setting all the low
>       order N bits in the comparison constant.  */
> 
> Handling more than just bool would also fix PR 124808. But maybe that
> should be handled separately since this case is for the non-constant
> bool case.

After trying to fit that logic into this same patch, I figured you're right:
PR124808 is too complex to be added as a side case of this pattern.

I'll create a separate patch for it.  Let's keep this one as is.


Thanks,
Daniel

  
Andrew Pinski April 9, 2026, 6:10 p.m. UTC | #4
On Tue, Apr 7, 2026 at 2:20 PM Daniel Henrique Barboza
<daniel.barboza@oss.qualcomm.com> wrote:
>
> From: Daniel Barboza <daniel.barboza@oss.qualcomm.com>
>
> Note that we're not adding its counterpart:
>
> "(A>>bool) != 0 -> (unsigned)A) > bool"
>
> Because it will collide with patterns that handles builtin_clz (e.g.
> clz-complement-int.c tree-ssa test).
>
> Bootstrapped and regression tested in x86.
>
>         PR tree-optimization/119420
>
> gcc/ChangeLog:
>
>         * match.pd(`(A>>bool) EQ 0 -> (unsigned)A) LE bool`): New
>         pattern.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.dg/tree-ssa/pr119420.c: New test.
> ---
>
> Changes from v1:
> - fixed formatting of (with ...) block;
> - added match.pd comment about the interaction with builtin_clz and why
>   we're not adding the NE variation.
> - v1 link: https://gcc.gnu.org/pipermail/gcc-patches/2026-March/709727.html
>
>  gcc/match.pd                             | 11 +++++++++++
>  gcc/testsuite/gcc.dg/tree-ssa/pr119420.c | 23 +++++++++++++++++++++++
>  2 files changed, 34 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr119420.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 7f16fd4e081..b7a8ce420b4 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -5367,6 +5367,17 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>        @0)))))
>  #endif
>
> +/* PR119420: (A >> bool) == 0 -> (unsigned)a <= (unsigned)bool
> +
> +   We're not adding its NE counterpart:
> +   "(A>>bool) != 0 -> (unsigned)A) > bool"
> +   Because it collides with patterns that handles builtin_clz, as
> +   shown in clz-complement-int.c tree-ssa test.  */
> +(simplify
> + (eq (rshift @0 zero_one_valued_p@1) integer_zerop)
> + (with { tree utype = unsigned_type_for (TREE_TYPE (@0)); }
> +  (le (convert:utype @0) (convert:utype @1))))

Maybe for the ne case you could add a :s on the rshift for now. I
filed https://gcc.gnu.org/bugzilla/show_bug.cgi?id=124833 for the non
single use case.
That seems like what LLVM does. I think the above also would need the
:s on the rshift for the same reason.
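
As a C-level illustration of the single-use concern (the function names are hypothetical, and this says nothing about what GCC actually emits): when the shift result feeds only the comparison, rewriting the comparison lets the shift go away; when the shift result has another use, the shift has to stay, so the rewrite adds a comparison on A without removing anything. My understanding is that `:s` on the rshift is meant to restrict the pattern to the first situation.

```c
#include <stdbool.h>

/* Single use: the shift feeds only the comparison, so replacing the
   comparison with an unsigned compare on a makes the shift dead.  */
int single_use (int a, bool b)
{
  return (a >> b) == 0;
}

/* Multiple uses: d is also stored through out, so the shift must be
   computed regardless of how the comparison is written.  */
int multi_use (int a, bool b, int *out)
{
  int d = a >> b;
  *out = d;
  return d == 0;
}
```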


Thanks,
Andrew Pinski

  
Daniel Henrique Barboza April 9, 2026, 6:50 p.m. UTC | #5
On 4/9/2026 3:10 PM, Andrew Pinski wrote:
> On Tue, Apr 7, 2026 at 2:20 PM Daniel Henrique Barboza
> <daniel.barboza@oss.qualcomm.com> wrote:
>>
>> From: Daniel Barboza <daniel.barboza@oss.qualcomm.com>
>>
>> Note that we're not adding its counterpart:
>>
>> "(A>>bool) != 0 -> (unsigned)A) > bool"
>>
>> Because it will collide with patterns that handles builtin_clz (e.g.
>> clz-complement-int.c tree-ssa test).
>>
>> Bootstrapped and regression tested in x86.
>>
>>          PR tree-optimization/119420
>>
>> gcc/ChangeLog:
>>
>>          * match.pd(`(A>>bool) EQ 0 -> (unsigned)A) LE bool`): New
>>          pattern.
>>
>> gcc/testsuite/ChangeLog:
>>
>>          * gcc.dg/tree-ssa/pr119420.c: New test.
>> ---
>>
>> Changes from v1:
>> - fixed formatting of (with ...) block;
>> - added match.pd comment about the interaction with builtin_clz and why
>>    we're not adding the NE variation.
>> - v1 link: https://gcc.gnu.org/pipermail/gcc-patches/2026-March/709727.html
>>
>>   gcc/match.pd                             | 11 +++++++++++
>>   gcc/testsuite/gcc.dg/tree-ssa/pr119420.c | 23 +++++++++++++++++++++++
>>   2 files changed, 34 insertions(+)
>>   create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr119420.c
>>
>> diff --git a/gcc/match.pd b/gcc/match.pd
>> index 7f16fd4e081..b7a8ce420b4 100644
>> --- a/gcc/match.pd
>> +++ b/gcc/match.pd
>> @@ -5367,6 +5367,17 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>>         @0)))))
>>   #endif
>>
>> +/* PR119420: (A >> bool) == 0 -> (unsigned)a <= (unsigned)bool
>> +
>> +   We're not adding its NE counterpart:
>> +   "(A>>bool) != 0 -> (unsigned)A) > bool"
>> +   Because it collides with patterns that handles builtin_clz, as
>> +   shown in clz-complement-int.c tree-ssa test.  */
>> +(simplify
>> + (eq (rshift @0 zero_one_valued_p@1) integer_zerop)
>> + (with { tree utype = unsigned_type_for (TREE_TYPE (@0)); }
>> +  (le (convert:utype @0) (convert:utype @1))))
> 
> Maybe for the ne case you could add a :s on the rshift for now. I
> filed https://gcc.gnu.org/bugzilla/show_bug.cgi?id=124833 for the non
> single use case.
> That seems like what LLVM does. I think the above also would need the
> :s on the rshift for the same reason.

That brings me back to a patch that is pending a review/redo:

"Re: [PATCH v2] match.pd: simplify lshift const cmp using bit_and [PR124019]"

Jeff mentioned that this patch breaks a CRIS test and that adding ":s" would
be a solution.  The pattern handled there,

"A << CST1 EQ|NE CST2 -> (A & CSTmask) EQ|NE (CST2 >> CST1)"

is also a pattern that happens to collide with the builtin_ctz call with
CST1 = 1 (which is not handled in that patch).

And the pattern you suggested for PR124808 is also similar to these 2 (by the
way your idea improves that loop):

(A >> CST1) cmp CST2  ->  A cmp (CST2 << CST1)


My point here is that maybe there's an overall design/pattern in all these
optimizations with this overall format "A lshift|rshift CST1 cmp CST2" due
to how they match these CTZ/CLZ builtins, and maybe we should default to
use ":s" in all related patterns.  Does that make sense?
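
For reference, the left-shift rewrite quoted above can be sanity-checked exhaustively on an 8-bit type. This is only a sketch under my own assumptions: it takes CSTmask to be the all-ones value shifted right by CST1, and it treats a CST2 with non-zero low CST1 bits as never matching. The PR124019 patch itself may define these details differently.

```c
#include <stdint.h>

/* Count cases where (a << c1) == c2, computed in 8 bits, disagrees with
   (a & mask) == (c2 >> c1).  The mask value and the handling of c2 with
   non-zero low bits are assumptions made for this illustration.  */
int count_lshift_mismatches (void)
{
  int mismatches = 0;

  for (unsigned c1 = 1; c1 < 8; c1++)
    {
      uint8_t mask = (uint8_t) (0xFFu >> c1);   /* Bits of a that survive.  */

      for (unsigned c2 = 0; c2 < 256; c2++)
        for (unsigned a = 0; a < 256; a++)
          {
            int lhs = (uint8_t) (a << c1) == c2;
            int rhs;

            if (c2 & ((1u << c1) - 1))
              rhs = 0;   /* Low bits of a shifted value are always zero.  */
            else
              rhs = (a & mask) == (c2 >> c1);

            if (lhs != rhs)
              mismatches++;
          }
    }
  return mismatches;
}
```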


Daniel

  
Andrew Pinski April 9, 2026, 7:02 p.m. UTC | #6
On Thu, Apr 9, 2026 at 11:50 AM Daniel Henrique Barboza
<daniel.barboza@oss.qualcomm.com> wrote:
>
>
>
> On 4/9/2026 3:10 PM, Andrew Pinski wrote:
> > On Tue, Apr 7, 2026 at 2:20 PM Daniel Henrique Barboza
> > <daniel.barboza@oss.qualcomm.com> wrote:
> >>
> >> From: Daniel Barboza <daniel.barboza@oss.qualcomm.com>
> >>
> >> Note that we're not adding its counterpart:
> >>
> >> "(A>>bool) != 0 -> (unsigned)A) > bool"
> >>
> >> Because it will collide with patterns that handles builtin_clz (e.g.
> >> clz-complement-int.c tree-ssa test).
> >>
> >> Bootstrapped and regression tested in x86.
> >>
> >>          PR tree-optimization/119420
> >>
> >> gcc/ChangeLog:
> >>
> >>          * match.pd(`(A>>bool) EQ 0 -> (unsigned)A) LE bool`): New
> >>          pattern.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >>          * gcc.dg/tree-ssa/pr119420.c: New test.
> >> ---
> >>
> >> Changes from v1:
> >> - fixed formatting of (with ...) block;
> >> - added match.pd comment about the interaction with builtin_clz and why
> >>    we're not adding the NE variation.
> >> - v1 link: https://gcc.gnu.org/pipermail/gcc-patches/2026-March/709727.html
> >>
> >>   gcc/match.pd                             | 11 +++++++++++
> >>   gcc/testsuite/gcc.dg/tree-ssa/pr119420.c | 23 +++++++++++++++++++++++
> >>   2 files changed, 34 insertions(+)
> >>   create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr119420.c
> >>
> >> diff --git a/gcc/match.pd b/gcc/match.pd
> >> index 7f16fd4e081..b7a8ce420b4 100644
> >> --- a/gcc/match.pd
> >> +++ b/gcc/match.pd
> >> @@ -5367,6 +5367,17 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >>         @0)))))
> >>   #endif
> >>
> >> +/* PR119420: (A >> bool) == 0 -> (unsigned)a <= (unsigned)bool
> >> +
> >> +   We're not adding its NE counterpart:
> >> +   "(A>>bool) != 0 -> (unsigned)A) > bool"
> >> +   Because it collides with patterns that handles builtin_clz, as
> >> +   shown in clz-complement-int.c tree-ssa test.  */
> >> +(simplify
> >> + (eq (rshift @0 zero_one_valued_p@1) integer_zerop)
> >> + (with { tree utype = unsigned_type_for (TREE_TYPE (@0)); }
> >> +  (le (convert:utype @0) (convert:utype @1))))
> >
> > Maybe for the ne case you could add a :s on the rshift for now. I
> > filed https://gcc.gnu.org/bugzilla/show_bug.cgi?id=124833 for the non
> > single use case.
> > That seems like what LLVM does. I think the above also would need the
> > :s on the rshift for the same reason.
>
> That brings me back to a patch that is pending a review/redo:
>
> "Re: [PATCH v2] match.pd: simplify lshift const cmp using bit_and [PR124019]"
>
> Jeff mentioned that this patch breaks a CRIS test, and adding ":s" would be
> a solution, and the pattern handled there:
>
> "A << CST1 EQ|NE CST2 -> (A & CSTmask) EQ|NE (CST2 >> CST1); "
>
> it is also a pattern that happens to collide with the builtin_ctz call with
> CST1 = 1 (which is not handled in that patch).
>
> And the pattern you suggested for PR124808 is also similar to these 2 (by the
> way your idea improves that loop):
>
> (A >> CST1) cmp CST2  ->  A cmp (CST2 << CST1)
>
>
> My point here is that maybe there's an overall design/pattern in all these
> optimizations with this overall format "A lshift|rshift CST1 cmp CST2" due
> to how they match these CTZ/CLZ builtins, and maybe we should default to
> use ":s" in all related patterns.  Does that make sense?

Yes, I think so. Though in some cases having `:s` there is worse.
Maybe the issue is that forwprop needs to limit how it handles
conditionals for loop exits.
Currently forwprop calls fold_stmt without limiting the valueization
here, but maybe for loop exits we should.
That would allow us to remove the single_use/`:s` on some more
patterns (e.g. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122186).
Let me think about this some more.

Thanks,
Andrew


  
Richard Biener April 10, 2026, 6:25 a.m. UTC | #7
On Thu, Apr 9, 2026 at 9:04 PM Andrew Pinski
<andrew.pinski@oss.qualcomm.com> wrote:
>
> On Thu, Apr 9, 2026 at 11:50 AM Daniel Henrique Barboza
> <daniel.barboza@oss.qualcomm.com> wrote:
> >
> >
> >
> > On 4/9/2026 3:10 PM, Andrew Pinski wrote:
> > > On Tue, Apr 7, 2026 at 2:20 PM Daniel Henrique Barboza
> > > <daniel.barboza@oss.qualcomm.com> wrote:
> > >>
> > >> From: Daniel Barboza <daniel.barboza@oss.qualcomm.com>
> > >>
> > >> Note that we're not adding its counterpart:
> > >>
> > >> "(A>>bool) != 0 -> (unsigned)A) > bool"
> > >>
> > >> Because it will collide with patterns that handles builtin_clz (e.g.
> > >> clz-complement-int.c tree-ssa test).
> > >>
> > >> Bootstrapped and regression tested in x86.
> > >>
> > >>          PR tree-optimization/119420
> > >>
> > >> gcc/ChangeLog:
> > >>
> > >>          * match.pd(`(A>>bool) EQ 0 -> (unsigned)A) LE bool`): New
> > >>          pattern.
> > >>
> > >> gcc/testsuite/ChangeLog:
> > >>
> > >>          * gcc.dg/tree-ssa/pr119420.c: New test.
> > >> ---
> > >>
> > >> Changes from v1:
> > >> - fixed formatting of (with ...) block;
> > >> - added match.pd comment about the interaction with builtin_clz and why
> > >>    we're not adding the NE variation.
> > >> - v1 link: https://gcc.gnu.org/pipermail/gcc-patches/2026-March/709727.html
> > >>
> > >>   gcc/match.pd                             | 11 +++++++++++
> > >>   gcc/testsuite/gcc.dg/tree-ssa/pr119420.c | 23 +++++++++++++++++++++++
> > >>   2 files changed, 34 insertions(+)
> > >>   create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr119420.c
> > >>
> > >> diff --git a/gcc/match.pd b/gcc/match.pd
> > >> index 7f16fd4e081..b7a8ce420b4 100644
> > >> --- a/gcc/match.pd
> > >> +++ b/gcc/match.pd
> > >> @@ -5367,6 +5367,17 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > >>         @0)))))
> > >>   #endif
> > >>
> > >> +/* PR119420: (A >> bool) == 0 -> (unsigned)a <= (unsigned)bool
> > >> +
> > >> +   We're not adding its NE counterpart:
> > >> +   "(A>>bool) != 0 -> (unsigned)A) > bool"
> > >> +   Because it collides with patterns that handles builtin_clz, as
> > >> +   shown in clz-complement-int.c tree-ssa test.  */
> > >> +(simplify
> > >> + (eq (rshift @0 zero_one_valued_p@1) integer_zerop)
> > >> + (with { tree utype = unsigned_type_for (TREE_TYPE (@0)); }
> > >> +  (le (convert:utype @0) (convert:utype @1))))
> > >
> > > Maybe for the ne case you could add a :s on the rshift for now. I
> > > filed https://gcc.gnu.org/bugzilla/show_bug.cgi?id=124833 for the non
> > > single use case.
> > > That seems like what LLVM does. I think the above also would need the
> > > :s on the rshift for the same reason.
> >
> > That brings me back to a patch that is pending a review/redo:
> >
> > "Re: [PATCH v2] match.pd: simplify lshift const cmp using bit_and [PR124019]"
> >
> > Jeff mentioned that this patch breaks a CRIS test, and adding ":s" would be
> > a solution, and the pattern handled there:
> >
> > "A << CST1 EQ|NE CST2 -> (A & CSTmask) EQ|NE (CST2 >> CST1); "
> >
> > it is also a pattern that happens to collide with the builtin_ctz call with
> > CST1 = 1 (which is not handled in that patch).
> >
> > And the pattern you suggested for PR124808 is also similar to these 2 (by the
> > way your idea improves that loop):
> >
> > (A >> CST1) cmp CST2  ->  A cmp (CST2 << CST1)
> >
> >
> > My point here is that maybe there's an overall design/pattern in all these
> > optimizations with this overall format "A lshift|rshift CST1 cmp CST2" due
> > to how they match these CTZ/CLZ builtins, and maybe we should default to
> > use ":s" in all related patterns.  Does that make sense?
>
> Yes, I think so. Though in some cases having `:s` there is worse.
> Maybe the issue is forwprop needs to be limiting how it handles
> conditionals for loop exits.
> Currently forwprop calls fold_stmt and not limiting the valueization
> here. But maybe for loop exits we should.
> This would allow us to remove the single_use/`:s` on some more
> patterns (e.g. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122186).
> Let me think about this some more.

It also matters when you have two uses that can be folded in a similar
way, leading to CSE of the generated extra stmt that we cannot anticipate
(or even try).  At one point I thought that forwprop could do
:s ignoring folding, but with the ability to roll back, and do some
form of costing after a local CSE on the result, costing before/after
for "connected components".  But I have never sat down to think
about how to actually reasonably implement such a thing - it might
be interesting to get feedback from folding like "this would have
simplified, but did not because of multi-use of @1, @2 ...", and
then somehow gather possible multi-use folding roots, "unshare",
fold, try to CSE and cost the set of expressions.  But that does not
play well with the somewhat nice lattice setup we have in forwprop
right now.

Richard.

  

Patch

diff --git a/gcc/match.pd b/gcc/match.pd
index 7f16fd4e081..b7a8ce420b4 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -5367,6 +5367,17 @@  DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
       @0)))))
 #endif
 
+/* PR119420: (A >> bool) == 0 -> (unsigned)A <= (unsigned)bool
+
+   We're not adding its NE counterpart:
+   "(A >> bool) != 0 -> (unsigned)A > (unsigned)bool"
+   because it collides with patterns that handle builtin_clz, as
+   shown in the clz-complement-int.c tree-ssa test.  */
+(simplify
+ (eq (rshift @0 zero_one_valued_p@1) integer_zerop)
+ (with { tree utype = unsigned_type_for (TREE_TYPE (@0)); }
+  (le (convert:utype @0) (convert:utype @1))))
+
 /* Rewrite an LROTATE_EXPR by a constant into an
    RROTATE_EXPR by a new constant.  */
 (simplify
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr119420.c b/gcc/testsuite/gcc.dg/tree-ssa/pr119420.c
new file mode 100644
index 00000000000..3169650fe66
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr119420.c
@@ -0,0 +1,23 @@ 
+/* { dg-additional-options -O2 } */
+/* { dg-additional-options -fdump-tree-optimized } */
+
+int ll (signed a)
+{
+  int d = a >> 1;
+  return d == 0;
+}
+
+int ll1 (signed a)
+{
+  int d = a & ~1;
+  return d == 0;
+}
+
+int ll2 (signed a)
+{
+  unsigned aa = a;
+  return aa <= 1;
+}
+
+/* { dg-final { scan-tree-dump-times " >> " 0 optimized } } */
+/* { dg-final { scan-tree-dump-times " <= " 3 optimized } } */