Ignore (possible) signed zeros in operands of FP comparisons.

Message ID 001c01d837d9$4c8bf060$e5a3d120$@nextmovesoftware.com
State New
Headers
Series Ignore (possible) signed zeros in operands of FP comparisons. |

Commit Message

Roger Sayle March 14, 2022, 7:25 p.m. UTC
  I've been wondering about the possible performance/missed-optimization
impact of my patch for PR middle-end/98420 and similar IEEE correctness
fixes that disable constant folding optimizations when worrying about -0.0.
In the common situation where the floating point result is used by a
FP comparison, there's no distinction between +0.0 and -0.0, so some
HONOR_SIGNED_ZEROS optimizations that we'd usually disable, are safe.

Consider the following interesting example:

int foo(int x, double y) {
    return (x * 0.0) < y;
}

Although we know that x (when converted to double) can't be NaN or Inf,
we still worry that for negative values of x that (x * 0.0) may be -0.0
and so perform the multiplication at run-time.  But in this case, the
result of the comparison (-0.0 < y) will be exactly the same as (+0.0 < y)
for any y, hence the above may be safely constant folded to "0.0 < y"
avoiding the multiplication at run-time.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check with no new failures, and allows GCC to continue to
optimize cases that we optimized in GCC 11 (without regard to correctness).
Ok for mainline?


2022-03-14  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	* match.pd (X CMP (Y-Y) -> X CMP 0.0): New transformation.
	(X CMP (Y * 0.0) -> X CMP 0.0): Likewise.
	(X CMP X -> true): Test tree_expr_maybe_nan_p instead of HONOR_NANS.
	(X LTGT X -> false): Enable if X is not tree_expr_maybe_nan_p, as
	this can't trap/signal.

gcc/testsuite/ChangeLog
	* gcc.dg/fold-compare-9.c: New test case.


Thanks in advance,
Roger
--
  

Comments

Richard Biener March 15, 2022, 7:29 a.m. UTC | #1
On Mon, Mar 14, 2022 at 8:26 PM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
>
> I've been wondering about the possible performance/missed-optimization
> impact of my patch for PR middle-end/98420 and similar IEEE correctness
> fixes that disable constant folding optimizations when worrying about -0.0.
> In the common situation where the floating point result is used by a
> FP comparison, there's no distinction between +0.0 and -0.0, so some
> HONOR_SIGNED_ZEROS optimizations that we'd usually disable, are safe.
>
> Consider the following interesting example:
>
> int foo(int x, double y) {
>     return (x * 0.0) < y;
> }
>
> Although we know that x (when converted to double) can't be NaN or Inf,
> we still worry that for negative values of x that (x * 0.0) may be -0.0
> and so perform the multiplication at run-time.  But in this case, the
> result of the comparison (-0.0 < y) will be exactly the same as (+0.0 < y)
> for any y, hence the above may be safely constant folded to "0.0 < y"
> avoiding the multiplication at run-time.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check with no new failures, and allows GCC to continue to
> optimize cases that we optimized in GCC 11 (without regard to correctness).
> Ok for mainline?

Isn't that something that gimple-ssa-backprop.c is designed to handle?  I wonder
if you can see whether the signed zero speciality can be retrofitted there?
It currently tracks "sign does not matter", so possibly another state,
"sign of zero
does not matter" could be introduced there.

Thanks,
Richard.

>
> 2022-03-14  Roger Sayle  <roger@nextmovesoftware.com>
>
> gcc/ChangeLog
>         * match.pd (X CMP (Y-Y) -> X CMP 0.0): New transformation.
>         (X CMP (Y * 0.0) -> X CMP 0.0): Likewise.
>         (X CMP X -> true): Test tree_expr_maybe_nan_p instead of HONOR_NANS.
>         (X LTGT X -> false): Enable if X is not tree_expr_maybe_nan_p, as
>         this can't trap/signal.
>
> gcc/testsuite/ChangeLog
>         * gcc.dg/fold-compare-9.c: New test case.
>
>
> Thanks in advance,
> Roger
> --
>
  
Roger Sayle March 15, 2022, 8:03 a.m. UTC | #2
> -----Original Message-----
> From: Richard Biener <richard.guenther@gmail.com>
> Sent: 15 March 2022 07:29
> To: Roger Sayle <roger@nextmovesoftware.com>
> Cc: GCC Patches <gcc-patches@gcc.gnu.org>
> Subject: Re: [PATCH] Ignore (possible) signed zeros in operands of FP
> comparisons.
> 
> On Mon, Mar 14, 2022 at 8:26 PM Roger Sayle
> <roger@nextmovesoftware.com> wrote:
> >
> >
> > I've been wondering about the possible performance/missed-optimization
> > impact of my patch for PR middle-end/98420 and similar IEEE
> > correctness fixes that disable constant folding optimizations when worrying
> about -0.0.
> > In the common situation where the floating point result is used by a
> > FP comparison, there's no distinction between +0.0 and -0.0, so some
> > HONOR_SIGNED_ZEROS optimizations that we'd usually disable, are safe.
> >
> > Consider the following interesting example:
> >
> > int foo(int x, double y) {
> >     return (x * 0.0) < y;
> > }
> >
> > Although we know that x (when converted to double) can't be NaN or
> > Inf, we still worry that for negative values of x that (x * 0.0) may
> > be -0.0 and so perform the multiplication at run-time.  But in this
> > case, the result of the comparison (-0.0 < y) will be exactly the same
> > as (+0.0 < y) for any y, hence the above may be safely constant folded to "0.0 <
> y"
> > avoiding the multiplication at run-time.
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check with no new failures, and allows GCC to continue to
> > optimize cases that we optimized in GCC 11 (without regard to correctness).
> > Ok for mainline?
> 
> Isn't that something that gimple-ssa-backprop.c is designed to handle?  I wonder
> if you can see whether the signed zero speciality can be retrofitted there?
> It currently tracks "sign does not matter", so possibly another state, "sign of
> zero does not matter" could be introduced there.

Two questions. Would adding tracking of "sign of zero does not matter" to
gimple-ssa-backprop.c be suitable for stage4?  Secondly, even if gimple-ssa-backprop.c
performed this kind of optimization, would that be a reason not to support
these transformations in match.pd?  Perhaps someone could open a missed
optimization PR for backprop in Bugzilla, but the above patch still needs to be
reviewed on its own merits.

Speaking of tree-ssa passes that could be improved, I was wondering whether
you could review my EVRP patch to fix regression PR/102950.  Pretty please?
https://gcc.gnu.org/pipermail/gcc-patches/2022-February/589569.html

Thanks (as always),
Roger

> Thanks,
> Richard.
> 
> >
> > 2022-03-14  Roger Sayle  <roger@nextmovesoftware.com>
> >
> > gcc/ChangeLog
> >         * match.pd (X CMP (Y-Y) -> X CMP 0.0): New transformation.
> >         (X CMP (Y * 0.0) -> X CMP 0.0): Likewise.
> >         (X CMP X -> true): Test tree_expr_maybe_nan_p instead of
> HONOR_NANS.
> >         (X LTGT X -> false): Enable if X is not tree_expr_maybe_nan_p, as
> >         this can't trap/signal.
> >
> > gcc/testsuite/ChangeLog
> >         * gcc.dg/fold-compare-9.c: New test case.
> >
> >
> > Thanks in advance,
> > Roger
> > --
> >
  
Richard Biener March 15, 2022, 10:50 a.m. UTC | #3
On Tue, Mar 15, 2022 at 9:03 AM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
>
> > -----Original Message-----
> > From: Richard Biener <richard.guenther@gmail.com>
> > Sent: 15 March 2022 07:29
> > To: Roger Sayle <roger@nextmovesoftware.com>
> > Cc: GCC Patches <gcc-patches@gcc.gnu.org>
> > Subject: Re: [PATCH] Ignore (possible) signed zeros in operands of FP
> > comparisons.
> >
> > On Mon, Mar 14, 2022 at 8:26 PM Roger Sayle
> > <roger@nextmovesoftware.com> wrote:
> > >
> > >
> > > I've been wondering about the possible performance/missed-optimization
> > > impact of my patch for PR middle-end/98420 and similar IEEE
> > > correctness fixes that disable constant folding optimizations when worrying
> > about -0.0.
> > > In the common situation where the floating point result is used by a
> > > FP comparison, there's no distinction between +0.0 and -0.0, so some
> > > HONOR_SIGNED_ZEROS optimizations that we'd usually disable, are safe.
> > >
> > > Consider the following interesting example:
> > >
> > > int foo(int x, double y) {
> > >     return (x * 0.0) < y;
> > > }
> > >
> > > Although we know that x (when converted to double) can't be NaN or
> > > Inf, we still worry that for negative values of x that (x * 0.0) may
> > > be -0.0 and so perform the multiplication at run-time.  But in this
> > > case, the result of the comparison (-0.0 < y) will be exactly the same
> > > as (+0.0 < y) for any y, hence the above may be safely constant folded to "0.0 <
> > y"
> > > avoiding the multiplication at run-time.
> > >
> > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > > and make -k check with no new failures, and allows GCC to continue to
> > > optimize cases that we optimized in GCC 11 (without regard to correctness).
> > > Ok for mainline?
> >
> > Isn't that something that gimple-ssa-backprop.c is designed to handle?  I wonder
> > if you can see whether the signed zero speciality can be retrofitted there?
> > It currently tracks "sign does not matter", so possibly another state, "sign of
> > zero does not matter" could be introduced there.
>
> Two questions. Would adding tracking of "sign of zero does not matter" to
> gimple-ssa-backprop.c be suitable for stage4?

Probably not.

>  Secondly, even if gimple-ssa-backprop.c
> performed this kind of optimization, would that be a reason not to support
> these transformations in match.pd?

The only reason would be to avoid growing match.pd with lots of special
patterns for cases that should rarely matter in practice.  For example the
pattern at hand wouldn't trigger for (x * 0.0) * z < y which is why I thought
of backprop.  Yes, we do have match.pd patterns with similar issues already.

Basically when the pattern doesn't simplify the outermost expression it
is prone to such issues.

> Perhaps someone could open a missed
> optimization PR for backprop in Bugzilla, but the above patch still needs to be
> reviewed on its own merits.

There's a few other pieces in the patch (didn't look at it before), changing
HONOR_NANS and ltgt, those are OK independently.

One comment, instead of matching both

 (cmp (mult ...) @2)

and

  (cmp @2 (mult ..))

you can use :c on the 'cmp' - it will do the "right" thing (swap the
comparison code)
when matching the other way around.  That will reduce repetition.

>
> Speaking of tree-ssa passes that could be improved, I was wondering whether
> you could review my EVRP patch to fix regression PR/102950.  Pretty please?
> https://gcc.gnu.org/pipermail/gcc-patches/2022-February/589569.html

I've left this to the ranger folks - you may want to ping Andrew here.

Richard.

> Thanks (as always),
> Roger
>
> > Thanks,
> > Richard.
> >
> > >
> > > 2022-03-14  Roger Sayle  <roger@nextmovesoftware.com>
> > >
> > > gcc/ChangeLog
> > >         * match.pd (X CMP (Y-Y) -> X CMP 0.0): New transformation.
> > >         (X CMP (Y * 0.0) -> X CMP 0.0): Likewise.
> > >         (X CMP X -> true): Test tree_expr_maybe_nan_p instead of
> > HONOR_NANS.
> > >         (X LTGT X -> false): Enable if X is not tree_expr_maybe_nan_p, as
> > >         this can't trap/signal.
> > >
> > > gcc/testsuite/ChangeLog
> > >         * gcc.dg/fold-compare-9.c: New test case.
> > >
> > >
> > > Thanks in advance,
> > > Roger
> > > --
> > >
>
  
Richard Sandiford March 16, 2022, 9:44 a.m. UTC | #4
Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> On Mon, Mar 14, 2022 at 8:26 PM Roger Sayle <roger@nextmovesoftware.com> wrote:
>> I've been wondering about the possible performance/missed-optimization
>> impact of my patch for PR middle-end/98420 and similar IEEE correctness
>> fixes that disable constant folding optimizations when worrying about -0.0.
>> In the common situation where the floating point result is used by a
>> FP comparison, there's no distinction between +0.0 and -0.0, so some
>> HONOR_SIGNED_ZEROS optimizations that we'd usually disable, are safe.
>>
>> Consider the following interesting example:
>>
>> int foo(int x, double y) {
>>     return (x * 0.0) < y;
>> }
>>
>> Although we know that x (when converted to double) can't be NaN or Inf,
>> we still worry that for negative values of x that (x * 0.0) may be -0.0
>> and so perform the multiplication at run-time.  But in this case, the
>> result of the comparison (-0.0 < y) will be exactly the same as (+0.0 < y)
>> for any y, hence the above may be safely constant folded to "0.0 < y"
>> avoiding the multiplication at run-time.
>>
>> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
>> and make -k check with no new failures, and allows GCC to continue to
>> optimize cases that we optimized in GCC 11 (without regard to correctness).
>> Ok for mainline?
>
> Isn't that something that gimple-ssa-backprop.c is designed to handle?  I wonder
> if you can see whether the signed zero speciality can be retrofitted there?
> It currently tracks "sign does not matter", so possibly another state,
> "sign of zero
> does not matter" could be introduced there.

I agree that would be a nice thing to have FWIW.  gimple-ssa-backprop.c
was added to avoid regressions in one specific fold-const -> match.pd
move, but it was supposed to be suitable for other similar things
in future.

Thanks,
Richard
  
Jeff Law March 17, 2022, 11:27 p.m. UTC | #5
On 3/15/2022 2:03 AM, Roger Sayle wrote:
>> -----Original Message-----
>> From: Richard Biener <richard.guenther@gmail.com>
>> Sent: 15 March 2022 07:29
>> To: Roger Sayle <roger@nextmovesoftware.com>
>> Cc: GCC Patches <gcc-patches@gcc.gnu.org>
>> Subject: Re: [PATCH] Ignore (possible) signed zeros in operands of FP
>> comparisons.
>>
>> On Mon, Mar 14, 2022 at 8:26 PM Roger Sayle
>> <roger@nextmovesoftware.com> wrote:
>>>
>>> I've been wondering about the possible performance/missed-optimization
>>> impact of my patch for PR middle-end/98420 and similar IEEE
>>> correctness fixes that disable constant folding optimizations when worrying
>> about -0.0.
>>> In the common situation where the floating point result is used by a
>>> FP comparison, there's no distinction between +0.0 and -0.0, so some
>>> HONOR_SIGNED_ZEROS optimizations that we'd usually disable, are safe.
>>>
>>> Consider the following interesting example:
>>>
>>> int foo(int x, double y) {
>>>      return (x * 0.0) < y;
>>> }
>>>
>>> Although we know that x (when converted to double) can't be NaN or
>>> Inf, we still worry that for negative values of x that (x * 0.0) may
>>> be -0.0 and so perform the multiplication at run-time.  But in this
>>> case, the result of the comparison (-0.0 < y) will be exactly the same
>>> as (+0.0 < y) for any y, hence the above may be safely constant folded to "0.0 <
>> y"
>>> avoiding the multiplication at run-time.
>>>
>>> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
>>> and make -k check with no new failures, and allows GCC to continue to
>>> optimize cases that we optimized in GCC 11 (without regard to correctness).
>>> Ok for mainline?
>> Isn't that something that gimple-ssa-backprop.c is designed to handle?  I wonder
>> if you can see whether the signed zero speciality can be retrofitted there?
>> It currently tracks "sign does not matter", so possibly another state, "sign of
>> zero does not matter" could be introduced there.
> Two questions. Would adding tracking of "sign of zero does not matter" to
> gimple-ssa-backprop.c be suitable for stage4?  Secondly, even if gimple-ssa-backprop.c
> performed this kind of optimization, would that be a reason not to support
> these transformations in match.pd?  Perhaps someone could open a missed
> optimization PR for backprop in Bugzilla, but the above patch still needs to be
> reviewed on its own merits.

Can't see how it's appropriate for stage4, but definitely interesting 
for gcc-13.

It'd fit well into some of the Ranger plans too -- Aldy and Andrew have 
been talking about tracking the special FP values in Ranger.   This is 
related, though not exactly the same since rather than tracking the 
special value, you're tracking if those special values actually matter.  
If you're going to do more work in this space, you might want to reach 
out to Aldy and Andrew to see if there's space for collaboration.


>
> Speaking of tree-ssa passes that could be improved, I was wondering whether
> you could review my EVRP patch to fix regression PR/102950.  Pretty please?
> https://gcc.gnu.org/pipermail/gcc-patches/2022-February/589569.html

I forwarded this to Aldy & Andrew.  I suspect they missed it.


>
> Thanks (as always),

No, thank you.  I'm so happy to see you contributing to GCC regularly again!


Jeff

>
  
Andrew MacLeod March 18, 2022, 2:12 a.m. UTC | #6
On 3/17/22 19:27, Jeff Law via Gcc-patches wrote:
>
> On 3/15/2022 2:03 AM, Roger Sayle wrote:
>
>
>>
>> Speaking of tree-ssa passes that could be improved, I was wondering 
>> whether
>> you could review my EVRP patch to fix regression PR/102950. Pretty 
>> please?
>> https://gcc.gnu.org/pipermail/gcc-patches/2022-February/589569.html
>
> I forwarded this to Aldy & Andrew.  I suspect they missed it.
>
>
I saw it originally, and was happy to not have to figure out the bits 
myself :-)  The more people pitching in to range-ops the better!!

I'm certainly fine with it, but the gory details of the bit twiddling it 
beyond my scope of expertise. Figured Jakub or richi were better shots 
at that part.

Andrew
  
Roger Sayle March 18, 2022, 7:43 a.m. UTC | #7
Hi Jeff/Andrew,
> If you're going to do more work in this space, you might want to reach out to
> Aldy and Andrew to see if there's space for collaboration.

One (clever?) suggestion that I do have for ranger would be to add support for
an additional value_range_kind, VR_NONZEROBITS, which would be a variant of
VR_RANGE (for unsigned types?) and require very few changes to the existing
code.  Just like VR_RANGE all values would lie in [MIN, MAX], so by default
treating this value_range_kind identically to VR_RANGE there should be no
visible changes, but the change in semantics is that MIN has the minimum bits
set, and MAX, the maximum bits set [equivalent to the RVAL and RMASK pairs
from CCP's bit_value_{bin,un}op].  Hence, the VR_NONZEROBITS range [2,7]
would represent the possible values {2, 3, 6, 7} rather than {2, 3, 4, 5, 6, 7}. 
For a small number of bits, int_range can already handle this with multiple
irange spans, but adding this representation would allow the unification of the
bit-based propagation performed in tree-ssa-ccp with the range-value based
propagation performed in EVRP/ranger, allowing the clever forwards/backwards
functionality.

As Andrew's recent (partial) review points out, tracking the effect of operations
like BIT_XOR_EXPR on VR_RANGE is much more complicated than on the
proposed VR_NONZEROBITS.

Alas, I'm not the sort of contributor to make large infrastructure changes
myself, but if the above functionality were in place, I/the compiler would
be able to make use of it.

Cheers,
Roger
--

> -----Original Message-----
> From: Jeff Law <jeffreyalaw@gmail.com>
> Sent: 17 March 2022 23:28
> To: Roger Sayle <roger@nextmovesoftware.com>; 'Richard Biener'
> <richard.guenther@gmail.com>
> Cc: 'GCC Patches' <gcc-patches@gcc.gnu.org>
> Subject: Re: [PATCH] Ignore (possible) signed zeros in operands of FP
> comparisons.
> 
> 
> On 3/15/2022 2:03 AM, Roger Sayle wrote:
> >> -----Original Message-----
> >> From: Richard Biener <richard.guenther@gmail.com>
> >> Sent: 15 March 2022 07:29
> >> To: Roger Sayle <roger@nextmovesoftware.com>
> >> Cc: GCC Patches <gcc-patches@gcc.gnu.org>
> >> Subject: Re: [PATCH] Ignore (possible) signed zeros in operands of FP
> >> comparisons.
> >>
> >> On Mon, Mar 14, 2022 at 8:26 PM Roger Sayle
> >> <roger@nextmovesoftware.com> wrote:
> >>>
> >>> I've been wondering about the possible
> >>> performance/missed-optimization impact of my patch for PR
> >>> middle-end/98420 and similar IEEE correctness fixes that disable
> >>> constant folding optimizations when worrying
> >> about -0.0.
> >>> In the common situation where the floating point result is used by a
> >>> FP comparison, there's no distinction between +0.0 and -0.0, so some
> >>> HONOR_SIGNED_ZEROS optimizations that we'd usually disable, are safe.
> >>>
> >>> Consider the following interesting example:
> >>>
> >>> int foo(int x, double y) {
> >>>      return (x * 0.0) < y;
> >>> }
> >>>
> >>> Although we know that x (when converted to double) can't be NaN or
> >>> Inf, we still worry that for negative values of x that (x * 0.0) may
> >>> be -0.0 and so perform the multiplication at run-time.  But in this
> >>> case, the result of the comparison (-0.0 < y) will be exactly the
> >>> same as (+0.0 < y) for any y, hence the above may be safely constant
> >>> folded to "0.0 <
> >> y"
> >>> avoiding the multiplication at run-time.
> >>>
> >>> This patch has been tested on x86_64-pc-linux-gnu with make
> >>> bootstrap and make -k check with no new failures, and allows GCC to
> >>> continue to optimize cases that we optimized in GCC 11 (without regard to
> correctness).
> >>> Ok for mainline?
> >> Isn't that something that gimple-ssa-backprop.c is designed to
> >> handle?  I wonder if you can see whether the signed zero speciality can be
> retrofitted there?
> >> It currently tracks "sign does not matter", so possibly another
> >> state, "sign of zero does not matter" could be introduced there.
> > Two questions. Would adding tracking of "sign of zero does not matter"
> > to gimple-ssa-backprop.c be suitable for stage4?  Secondly, even if
> > gimple-ssa-backprop.c performed this kind of optimization, would that
> > be a reason not to support these transformations in match.pd?  Perhaps
> > someone could open a missed optimization PR for backprop in Bugzilla,
> > but the above patch still needs to be reviewed on its own merits.
> 
> Can't see how it's appropriate for stage4, but definitely interesting for gcc-13.
> 
> It'd fit well into some of the Ranger plans too -- Aldy and Andrew have been
> talking about tracking the special FP values in Ranger.   This is related, though
> not exactly the same since rather than tracking the special value, you're tracking
> if those special values actually matter. If you're going to do more work in this
> space, you might want to reach out to Aldy and Andrew to see if there's space
> for collaboration.
> 
> 
> >
> > Speaking of tree-ssa passes that could be improved, I was wondering
> > whether you could review my EVRP patch to fix regression PR/102950.  Pretty
> please?
> > https://gcc.gnu.org/pipermail/gcc-patches/2022-February/589569.html
> 
> I forwarded this to Aldy & Andrew.  I suspect they missed it.
> 
> 
> >
> > Thanks (as always),
> 
> No, thank you.  I'm so happy to see you contributing to GCC regularly again!
> 
> 
> Jeff
> 
> >
  
Andrew MacLeod March 18, 2022, 1:07 p.m. UTC | #8
On 3/18/22 03:43, Roger Sayle wrote:
> Hi Jeff/Andrew,
>> If you're going to do more work in this space, you might want to reach out to
>> Aldy and Andrew to see if there's space for collaboration.
> One (clever?) suggestion that I do have for ranger would be to add support for
> an additional value_range_kind, VR_NONZEROBITS, which would be a variant of
> VR_RANGE (for unsigned types?) and require very few changes to the existing


I think were ahead of you here.. Tracking known zero and one bits within 
irange as an adjunct has been in plan for awhile, just priorities 
haven't allowed us to get to it until recently...

Theres a bunch of stuff already in the hopper for the next stage1 that 
Aldy has been working with... he can expound upon it, but we plan to use 
both masks and ranges together  as appropriate.


> code.  Just like VR_RANGE all values would lie in [MIN, MAX], so by default
> treating this value_range_kind identically to VR_RANGE there should be no
> visible changes, but the change in semantics is that MIN has the minimum bits
> set, and MAX, the maximum bits set [equivalent to the RVAL and RMASK pairs
> from CCP's bit_value_{bin,un}op].  Hence, the VR_NONZEROBITS range [2,7]
> would represent the possible values {2, 3, 6, 7} rather than {2, 3, 4, 5, 6, 7}.
> For a small number of bits, int_range can already handle this with multiple
> irange spans, but adding this representation would allow the unification of the
> bit-based propagation performed in tree-ssa-ccp with the range-value based
> propagation performed in EVRP/ranger, allowing the clever forwards/backwards
> functionality.
>
> As Andrew's recent (partial) review points out, tracking the effect of operations
> like BIT_XOR_EXPR on VR_RANGE is much more complicated than on the
> proposed VR_NONZEROBITS.
>
> Alas, I'm not the sort of contributor to make large infrastructure changes
> myself, but if the above functionality were in place, I/the compiler would
> be able to make use of it.
>

And this is exactly what we are hoping.. we provide the structure and 
someone who is better at the underlying detail interaction can flesh out 
whatever specifics they find interesting.


Andrew
  
Andrew MacLeod March 18, 2022, 1:16 p.m. UTC | #9
On 3/17/22 19:27, Jeff Law via Gcc-patches wrote:
>
> On 3/15/2022 2:03 AM, Roger Sayle wrote:
>>> -----Original Message-----
>>> From: Richard Biener <richard.guenther@gmail.com>
>>> Sent: 15 March 2022 07:29
>>> To: Roger Sayle <roger@nextmovesoftware.com>
>>> Cc: GCC Patches <gcc-patches@gcc.gnu.org>
>>> Subject: Re: [PATCH] Ignore (possible) signed zeros in operands of FP
>>> comparisons.
>>>
>>> On Mon, Mar 14, 2022 at 8:26 PM Roger Sayle
>>> <roger@nextmovesoftware.com> wrote:
>>>>
>>>> I've been wondering about the possible performance/missed-optimization
>>>> impact of my patch for PR middle-end/98420 and similar IEEE
>>>> correctness fixes that disable constant folding optimizations when 
>>>> worrying
>>> about -0.0.
>>>> In the common situation where the floating point result is used by a
>>>> FP comparison, there's no distinction between +0.0 and -0.0, so some
>>>> HONOR_SIGNED_ZEROS optimizations that we'd usually disable, are safe.
>>>>
>>>> Consider the following interesting example:
>>>>
>>>> int foo(int x, double y) {
>>>>      return (x * 0.0) < y;
>>>> }
>>>>
>>>> Although we know that x (when converted to double) can't be NaN or
>>>> Inf, we still worry that for negative values of x that (x * 0.0) may
>>>> be -0.0 and so perform the multiplication at run-time.  But in this
>>>> case, the result of the comparison (-0.0 < y) will be exactly the same
>>>> as (+0.0 < y) for any y, hence the above may be safely constant 
>>>> folded to "0.0 <
>>> y"
>>>> avoiding the multiplication at run-time.

I'm going to hazard a guess that this can be handled in the upcoming 
floating point range support?  there was a start of a conversation in 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=24021 about a month ago.

I know *zero* about floating point, but It seems like when we track 
floating point ranges, we will naturally be able to integrate

   _2 = _1 * 0.0;
   _3 = _2 < y_5(D);

that _2 evaluates to +/- 0.0  and when we do look at  (_2 < y_5)   that 
VRP  can simplify that to 0.0 < y?  (or any patch which uses 
simplification and ranges).    It seems like it should be 
straightforward anyway.

Andrew
  
Jeff Law March 18, 2022, 4:01 p.m. UTC | #10
On 3/18/2022 7:16 AM, Andrew MacLeod wrote:
> On 3/17/22 19:27, Jeff Law via Gcc-patches wrote:
>>
>> On 3/15/2022 2:03 AM, Roger Sayle wrote:
>>>> -----Original Message-----
>>>> From: Richard Biener <richard.guenther@gmail.com>
>>>> Sent: 15 March 2022 07:29
>>>> To: Roger Sayle <roger@nextmovesoftware.com>
>>>> Cc: GCC Patches <gcc-patches@gcc.gnu.org>
>>>> Subject: Re: [PATCH] Ignore (possible) signed zeros in operands of FP
>>>> comparisons.
>>>>
>>>> On Mon, Mar 14, 2022 at 8:26 PM Roger Sayle
>>>> <roger@nextmovesoftware.com> wrote:
>>>>>
>>>>> I've been wondering about the possible 
>>>>> performance/missed-optimization
>>>>> impact of my patch for PR middle-end/98420 and similar IEEE
>>>>> correctness fixes that disable constant folding optimizations when 
>>>>> worrying
>>>> about -0.0.
>>>>> In the common situation where the floating point result is used by a
>>>>> FP comparison, there's no distinction between +0.0 and -0.0, so some
>>>>> HONOR_SIGNED_ZEROS optimizations that we'd usually disable, are safe.
>>>>>
>>>>> Consider the following interesting example:
>>>>>
>>>>> int foo(int x, double y) {
>>>>>      return (x * 0.0) < y;
>>>>> }
>>>>>
>>>>> Although we know that x (when converted to double) can't be NaN or
>>>>> Inf, we still worry that for negative values of x that (x * 0.0) may
>>>>> be -0.0 and so perform the multiplication at run-time. But in this
>>>>> case, the result of the comparison (-0.0 < y) will be exactly the 
>>>>> same
>>>>> as (+0.0 < y) for any y, hence the above may be safely constant 
>>>>> folded to "0.0 <
>>>> y"
>>>>> avoiding the multiplication at run-time.
>
> I'm going to hazard a guess that this can be handled in the upcoming 
> floating point range support?  there was a start of a conversation in 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=24021 about a month ago.
>
> I know *zero* about floating point, but It seems like when we track 
> floating point ranges, we will naturally be able to integrate
>
>   _2 = _1 * 0.0;
>   _3 = _2 < y_5(D);
>
> that _2 evaluates to +/- 0.0  and when we do look at  (_2 < y_5)   
> that VRP  can simplify that to 0.0 < y?  (or any patch which uses 
> simplification and ranges).    It seems like it should be 
> straightforward anyway.
Yea, I guess we'd pick it up that way and that's probably cleaner than 
what I was originally thinking in this space.

We realize that 2 is +-0.0.  Then we realize that for the comparison, we 
can constant propagate +0.0 for _2 since +0.0 and -0.0 behave the same 
way.  Ideally that removes the last reference to _2 and we DCE away the 
multiplication.


jeff
  
Aldy Hernandez March 18, 2022, 6:07 p.m. UTC | #11
On Fri, Mar 18, 2022 at 2:07 PM Andrew MacLeod <amacleod@redhat.com> wrote:
>
> On 3/18/22 03:43, Roger Sayle wrote:
> > Hi Jeff/Andrew,
> >> If you're going to do more work in this space, you might want to reach out to
> >> Aldy and Andrew to see if there's space for collaboration.
> > One (clever?) suggestion that I do have for ranger would be to add support for
> > an additional value_range_kind, VR_NONZEROBITS, which would be a variant of
> > VR_RANGE (for unsigned types?) and require very few changes to the existing
>
>
> I think were ahead of you here.. Tracking known zero and one bits within
> irange as an adjunct has been in plan for awhile, just priorities
> haven't allowed us to get to it until recently...
>
> Theres a bunch of stuff already in the hopper for the next stage1 that
> Aldy has been working with... he can expound upon it, but we plan to use
> both masks and ranges together  as appropriate.

Yes, for the next stage1 I have patches queued up to provide nonzero
bits within the irange object.

In working on providing a global range storage (an irange replacement
for SSA_NAME_RANGE_INFO) for the next release, it became clear that
nonzero and ranges are probably best suited to live in the same space.
The current global range mechanism already does this-- by refining
nonzero bits with each change to the range for an SSA name.  However,
we sometimes pessimize global ranges, presumably because we couldn´t
properly represent an intersection with our value_range anti-range
business.  Also, this nonzero+range symbiosis only currently works for
global ranges (not iranges or value_ranges).  In the upcoming work,
the nonzero bits will live in irange, and work in tandem-- for
example, sometimes refining the range when a new nonzero mask is
available (0xff is really [0,255]) etc.  I have the core
infrastructure done.  I could probably use range-op help from the
relevant experts when the time comes.

And yes, as Roger points out, the mask tracking bits in CCP could play
very well with the ranger world.  For that matter, while I was doing
the above work I noticed that many of the bitmasks that CCP could
determine, would eventually be figured out by evrp, just in the form
of ranges.  Having nonzero bits in irange opens up all sorts of
possibilities.

>
>
> > code.  Just like VR_RANGE all values would lie in [MIN, MAX], so by default
> > treating this value_range_kind identically to VR_RANGE there should be no
> > visible changes, but the change in semantics is that MIN has the minimum bits
> > set, and MAX, the maximum bits set [equivalent to the RVAL and RMASK pairs
> > from CCP's bit_value_{bin,un}op].  Hence, the VR_NONZEROBITS range [2,7]
> > would represent the possible values {2, 3, 6, 7} rather than {2, 3, 4, 5, 6, 7}.
> > For a small number of bits, int_range can already handle this with multiple
> > irange spans, but adding this representation would allow the unification of the
> > bit-based propagation performed in tree-ssa-ccp with the range-value based
> > propagation performed in EVRP/ranger, allowing the clever forwards/backwards
> > functionality.
> >
> > As Andrew's recent (partial) review points out, tracking the effect of operations
> > like BIT_XOR_EXPR on VR_RANGE is much more complicated than on the
> > proposed VR_NONZEROBITS.
> >
> > Alas, I'm not the sort of contributor to make large infrastructure changes
> > myself, but if the above functionality were in place, I/the compiler would
> > be able to make use of it.
> >
>
> And this is exactly what we are hoping.. we provide the structure and
> someone who is better at the underlying detail interaction can flesh out
> whatever specifics they find interesting.

Yup.  We´re on the other side of the spectrum, the ability to provide
infrastructure with little expertise in the fine details (range ops,
floats, etc). :-)

Aldy

>
>
> Andrew
>
  
Aldy Hernandez March 18, 2022, 6:33 p.m. UTC | #12
On Fri, Mar 18, 2022 at 5:01 PM Jeff Law <jeffreyalaw@gmail.com> wrote:
>
>
>
> On 3/18/2022 7:16 AM, Andrew MacLeod wrote:
> > On 3/17/22 19:27, Jeff Law via Gcc-patches wrote:
> >>
> >> On 3/15/2022 2:03 AM, Roger Sayle wrote:
> >>>> -----Original Message-----
> >>>> From: Richard Biener <richard.guenther@gmail.com>
> >>>> Sent: 15 March 2022 07:29
> >>>> To: Roger Sayle <roger@nextmovesoftware.com>
> >>>> Cc: GCC Patches <gcc-patches@gcc.gnu.org>
> >>>> Subject: Re: [PATCH] Ignore (possible) signed zeros in operands of FP
> >>>> comparisons.
> >>>>
> >>>> On Mon, Mar 14, 2022 at 8:26 PM Roger Sayle
> >>>> <roger@nextmovesoftware.com> wrote:
> >>>>>
> >>>>> I've been wondering about the possible
> >>>>> performance/missed-optimization
> >>>>> impact of my patch for PR middle-end/98420 and similar IEEE
> >>>>> correctness fixes that disable constant folding optimizations when
> >>>>> worrying
> >>>> about -0.0.
> >>>>> In the common situation where the floating point result is used by a
> >>>>> FP comparison, there's no distinction between +0.0 and -0.0, so some
> >>>>> HONOR_SIGNED_ZEROS optimizations that we'd usually disable, are safe.
> >>>>>
> >>>>> Consider the following interesting example:
> >>>>>
> >>>>> int foo(int x, double y) {
> >>>>>      return (x * 0.0) < y;
> >>>>> }
> >>>>>
> >>>>> Although we know that x (when converted to double) can't be NaN or
> >>>>> Inf, we still worry that for negative values of x that (x * 0.0) may
> >>>>> be -0.0 and so perform the multiplication at run-time. But in this
> >>>>> case, the result of the comparison (-0.0 < y) will be exactly the
> >>>>> same
> >>>>> as (+0.0 < y) for any y, hence the above may be safely constant
> >>>>> folded to "0.0 <
> >>>> y"
> >>>>> avoiding the multiplication at run-time.
> >
> > I'm going to hazard a guess that this can be handled in the upcoming
> > floating point range support?  there was a start of a conversation in
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=24021 about a month ago.
> >
> > I know *zero* about floating point, but It seems like when we track
> > floating point ranges, we will naturally be able to integrate
> >
> >   _2 = _1 * 0.0;
> >   _3 = _2 < y_5(D);
> >
> > that _2 evaluates to +/- 0.0  and when we do look at  (_2 < y_5)
> > that VRP  can simplify that to 0.0 < y?  (or any patch which uses
> > simplification and ranges).    It seems like it should be
> > straightforward anyway.

Yes, definitely.  Ranger is really a propagation engine.  The plan all
along was to make it type agnostic with a generic vrange class that
provides core functionality that each implementation overrides (union,
intersect, varying, undefined, etc).  Right now we have plans for
float ranges, pointer tracking, and eventually string tracking.   The
generic vrange abstraction is done and queued up for the next release.

I have a core float implementation that handles relationals and end
points, but the plan is to add bits to track nonzero, infs, nans, etc.
Since ranger will just be a propagation engine, range-ops, union,
intersection, etc, could propagate not NaN and non Inf for the cast
operator above, plus +-0.0 for _2.  The float range-op entries would
be able to do all sorts of magic by querying the range for _2.  Also
the simplify_using_ranges class vrp uses could simplify the _2 < y_5
with appropriate smarts.

It really does seem pretty straightforward once the vrange and frange
classes are in place.  All this should be doable for the next stage1,
but we could use help with the nuances of frange when the time comes.

> Yea, I guess we'd pick it up that way and that's probably cleaner than
> what I was originally thinking in this space.
>
> We realize that 2 is +-0.0.  Then we realize that for the comparison, we
> can constant propagate +0.0 for _2 since +0.0 and -0.0 behave the same
> way.  Ideally that removes the last reference to _2 and we DCE away the
> multiplication.

Yup yup.

Aldy
>
>
> jeff
>
  
Aldy Hernandez March 21, 2022, 3:56 p.m. UTC | #13
On Fri, Mar 18, 2022 at 7:33 PM Aldy Hernandez <aldyh@redhat.com> wrote:

> > >>>>> Consider the following interesting example:
> > >>>>>
> > >>>>> int foo(int x, double y) {
> > >>>>>      return (x * 0.0) < y;
> > >>>>> }
> > >>>>>
> > >>>>> Although we know that x (when converted to double) can't be NaN or
> > >>>>> Inf, we still worry that for negative values of x that (x * 0.0) may
> > >>>>> be -0.0 and so perform the multiplication at run-time. But in this
> > >>>>> case, the result of the comparison (-0.0 < y) will be exactly the
> > >>>>> same
> > >>>>> as (+0.0 < y) for any y, hence the above may be safely constant
> > >>>>> folded to "0.0 <
> > >>>> y"
> > >>>>> avoiding the multiplication at run-time.

Ok, once the "frange" infrastructure is in place, it's actually
trivial.  See attached patch and tests.  We can do everything with
small range-op entries and evrp / ranger will handle everything else.

Roger, I believe this is what you described:

+// { dg-do compile }
+// { dg-options "-O2 -fno-tree-forwprop -fno-thread-jumps
-fdump-tree-evrp -fdump-tree-optimized" }
+
+extern void link_error ();
+
+int foo(int x, double y)
+{
+      return (x * 0.0) < y;
+}
+
+// The multiply should be gone by *vrp time.
+// { dg-final { scan-tree-dump-not " \\* " "evrp" } }
+
+// Ultimately, there sound be no references to "x".
+// { dg-final { scan-tree-dump-not "x_" "optimized" } }

With the attached patch (and pending patches), we keep track that a
cast from int to float is guaranteed to not be NaN, which allows us to
fold x*0.0 into 0.0, and DCE to remove x altogether.  Also, as the
other tests show, we are able to resolve __builtin_isnan's for the
operations implemented.  It should be straightforward for someone with
floating point foo to extend this to all operations.

Aldy
  
Roger Sayle March 26, 2022, 6:52 p.m. UTC | #14
Hi Aldy,
The proposed frange implementation looks cool.  The one technical tweak is
that if x is not NaN and not +Inf/-Inf, then x*0.0 is [-0.0,0.0].  It's because this
result is a range and not a constant that it can’t normally be constant folded,
unless it appears in a context where the sign of zero isn't important (such as
a comparison).

I suspect that frange needs pos_zero_p, neg_zero_p and any_zero_p instead
of just zero_p (and variants r.set_zero) to capture the subtleties.

Cheers,
Roger
--

> -----Original Message-----
> From: Aldy Hernandez <aldyh@redhat.com>
> Sent: 21 March 2022 05:56
> To: Jeff Law <jeffreyalaw@gmail.com>
> Cc: Andrew MacLeod <amacleod@redhat.com>; Roger Sayle
> <roger@nextmovesoftware.com>; Richard Biener
> <richard.guenther@gmail.com>; GCC Patches <gcc-patches@gcc.gnu.org>
> Subject: Re: [PATCH] Ignore (possible) signed zeros in operands of FP
> comparisons.
> 
> On Fri, Mar 18, 2022 at 7:33 PM Aldy Hernandez <aldyh@redhat.com> wrote:
> 
> > > >>>>> Consider the following interesting example:
> > > >>>>>
> > > >>>>> int foo(int x, double y) {
> > > >>>>>      return (x * 0.0) < y;
> > > >>>>> }
> > > >>>>>
> > > >>>>> Although we know that x (when converted to double) can't be
> > > >>>>> NaN or Inf, we still worry that for negative values of x that
> > > >>>>> (x * 0.0) may be -0.0 and so perform the multiplication at
> > > >>>>> run-time. But in this case, the result of the comparison (-0.0
> > > >>>>> < y) will be exactly the same as (+0.0 < y) for any y, hence
> > > >>>>> the above may be safely constant folded to "0.0 <
> > > >>>> y"
> > > >>>>> avoiding the multiplication at run-time.
> 
> Ok, once the "frange" infrastructure is in place, it's actually trivial.  See attached
> patch and tests.  We can do everything with small range-op entries and evrp /
> ranger will handle everything else.
> 
> Roger, I believe this is what you described:
> 
> +// { dg-do compile }
> +// { dg-options "-O2 -fno-tree-forwprop -fno-thread-jumps
> -fdump-tree-evrp -fdump-tree-optimized" }
> +
> +extern void link_error ();
> +
> +int foo(int x, double y)
> +{
> +      return (x * 0.0) < y;
> +}
> +
> +// The multiply should be gone by *vrp time.
> +// { dg-final { scan-tree-dump-not " \\* " "evrp" } }
> +
> +// Ultimately, there sound be no references to "x".
> +// { dg-final { scan-tree-dump-not "x_" "optimized" } }
> 
> With the attached patch (and pending patches), we keep track that a cast from
> int to float is guaranteed to not be NaN, which allows us to fold x*0.0 into 0.0,
> and DCE to remove x altogether.  Also, as the other tests show, we are able to
> resolve __builtin_isnan's for the operations implemented.  It should be
> straightforward for someone with floating point foo to extend this to all
> operations.
> 
> Aldy
  

Patch

diff --git a/gcc/match.pd b/gcc/match.pd
index 8b44b5c..f3f7865 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -4671,6 +4671,35 @@  DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
    (if (single_use (@2))
     (cmp @0 @1)))))
 
+/* Floating point comparisons can ignore signed zeros.  */
+(for cmp (tcc_comparison)
+ /* Simplify (X - X) CMP Y even with -frounding-math.  */
+ (simplify
+  (cmp (minus @0 @0) @1)
+  (if (FLOAT_TYPE_P (TREE_TYPE (@0))
+       && !tree_expr_maybe_nan_p (@0)
+       && !tree_expr_maybe_infinite_p (@0))
+   (cmp { build_zero_cst (TREE_TYPE (@0)); } @1)))
+ /* Simplify X CMP (Y - Y) even with -frounding-math.  */
+ (simplify
+  (cmp @0 (minus @1 @1))
+  (if (FLOAT_TYPE_P (TREE_TYPE (@1))
+       && !tree_expr_maybe_nan_p (@1)
+       && !tree_expr_maybe_infinite_p (@1))
+   (cmp @0 { build_zero_cst (TREE_TYPE (@1)); })))
+ /* Simplify (X * 0.0) CMP Y.  */
+ (simplify
+  (cmp (mult @0 real_zerop@1) @2)
+  (if (!tree_expr_maybe_nan_p (@0)
+       && !tree_expr_maybe_infinite_p (@0))
+   (cmp @1 @2)))
+ /* Simplify X CMP (Y * 0.0).  */
+ (simplify
+  (cmp @0 (mult @1 real_zerop@2))
+  (if (!tree_expr_maybe_nan_p (@1)
+       && !tree_expr_maybe_infinite_p (@0))
+   (cmp @0 @2))))
+
 /* Simplify (x < 0) ^ (y < 0) to (x ^ y) < 0 and
    (x >= 0) ^ (y >= 0) to (x ^ y) < 0.  */
 (for cmp (lt ge)
@@ -4743,7 +4772,7 @@  DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (simplify
   (cmp @0 @0)
   (if (! FLOAT_TYPE_P (TREE_TYPE (@0))
-       || ! HONOR_NANS (@0))
+       || ! tree_expr_maybe_nan_p (@0))
    { constant_boolean_node (true, type); }
    (if (cmp != EQ_EXPR
 	/* With -ftrapping-math conversion to EQ loses an exception.  */
@@ -4755,7 +4784,7 @@  DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (cmp @0 @0)
   (if (cmp != NE_EXPR
        || ! FLOAT_TYPE_P (TREE_TYPE (@0))
-       || ! HONOR_NANS (@0))
+       || ! tree_expr_maybe_nan_p (@0))
    { constant_boolean_node (false, type); })))
 (for cmp (unle unge uneq)
  (simplify
@@ -4767,7 +4796,7 @@  DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (unordered @0 @0)))
 (simplify
  (ltgt @0 @0)
- (if (!flag_trapping_math)
+ (if (!flag_trapping_math || !tree_expr_maybe_nan_p (@0))
   { constant_boolean_node (false, type); }))
 
 /* x == ~x -> false */
diff --git a/gcc/testsuite/gcc.dg/fold-compare-9.c b/gcc/testsuite/gcc.dg/fold-compare-9.c
new file mode 100644
index 0000000..e56f63d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/fold-compare-9.c
@@ -0,0 +1,8 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+int foo(int x, double y) {
+    return (x * 0.0) < y;
+}
+
+/* { dg-final { scan-tree-dump " y_\[0-9\]\\(D\\) > 0.0" "optimized" } } */