Message ID  20221013123649.4744971aldyh@redhat.com 

State  New 
Headers 
From: aldyh@redhat.com (Aldy Hernandez) Date: Thu, 13 Oct 2022 14:36:49 +0200 Subject: [PATCH] [PR24021] Implement PLUS_EXPR rangeop entry for floats. MessageID: <20221013123649.4744971aldyh@redhat.com> 
Series 
[PR24021] Implement PLUS_EXPR rangeop entry for floats.


Commit Message
Aldy Hernandez
Oct. 13, 2022, 12:36 p.m. UTC
[Jakub, this is a cleaned up version of what we iterated on earlier this summer. It contains additional smarts to propagate NAN signs on entry. I'd like a nod before committing.] This is the rangeop entry for floating point PLUS_EXPR. It's the most intricate range entry we have so far, because we need to keep track of rounding and target FP formats. This will be the last FP entry I commit, mostly to avoid disturbing the tree any further, and also because what we have so far is enough for a solid VRP. So far we track NANs and signs correctly. We also handle relationals (symbolics and numeric), both ordered and unordered, ABS_EXPR and NEGATE_EXPR which are used to fold __builtin_isinf, and __builtin_sign (__builtin_copysign is coming up). All in all, I think this provide more than enough for basic VRP on floats, as well as provide a basis to flesh out the rest if there's interest. My goal with this entry is to provide a template for additional binary operators, as they tend to follow a similar pattern: handle NANs, do the arithmetic while keeping track of rounding, and adjust for NAN. I may abstract the general parts as we do for irange's fold_range and wi_fold. Oh yeah... and I'd like to finally close this PR ;). How does this look? PR treeoptimization/24021 gcc/ChangeLog: * rangeopfloat.cc (update_nan_sign): New. (propagate_nans): New. (frange_nextafter): New. (frange_arithmetic): New. (class foperator_plus): New. (floating_op_table::floating_op_table): Add PLUS_EXPR entry. gcc/testsuite/ChangeLog: * gcc.dg/treessa/vrpfloatplus.c: New test.  gcc/rangeopfloat.cc  171 ++++++++++++++++++ .../gcc.dg/treessa/vrpfloatplus.c  21 +++ 2 files changed, 192 insertions(+) create mode 100644 gcc/testsuite/gcc.dg/treessa/vrpfloatplus.c
Comments
On 10/13/22 14:36, Aldy Hernandez via Gccpatches wrote:
> PR treeoptimization/24021
Ah  Verboten in Fortran:
$ cat d.f
DOUBLE PRECISION A, X
A = 0.0
DO X = 0.1, 1.0
A = A + X
ENDDO
END
$ gfortran d.f
d.f:3:9:
3  DO X = 0.1, 1.0
 1
Warning: Deleted feature: Loop variable at (1) must be integer
d.f:3:12:
3  DO X = 0.1, 1.0
 1
Warning: Deleted feature: Start expression in DO loop at (1) must be integer
d.f:3:17:
3  DO X = 0.1, 1.0
 1
Warning: Deleted feature: End expression in DO loop at (1) must be integer
I'm not following. My patch doesn't affect this behavior. What am I missing? Aldy On Thu, Oct 13, 2022 at 3:04 PM Toon Moene <toon@moene.org> wrote: > > On 10/13/22 14:36, Aldy Hernandez via Gccpatches wrote: > > > PR treeoptimization/24021 > > Ah  Verboten in Fortran: > > $ cat d.f > DOUBLE PRECISION A, X > A = 0.0 > DO X = 0.1, 1.0 > A = A + X > ENDDO > END > $ gfortran d.f > d.f:3:9: > > 3  DO X = 0.1, 1.0 >  1 > Warning: Deleted feature: Loop variable at (1) must be integer > d.f:3:12: > > 3  DO X = 0.1, 1.0 >  1 > Warning: Deleted feature: Start expression in DO loop at (1) must be integer > d.f:3:17: > > 3  DO X = 0.1, 1.0 >  1 > Warning: Deleted feature: End expression in DO loop at (1) must be integer > >  > Toon Moene  email: toon@moene.org  phone: +31 346 214290 > Saturnushof 14, 3738 XG Maartensdijk, The Netherlands >
It was just a comment on the code of the PR ... Toon. On 10/13/22 15:44, Aldy Hernandez wrote: > I'm not following. My patch doesn't affect this behavior. > > What am I missing? > > Aldy > > On Thu, Oct 13, 2022 at 3:04 PM Toon Moene <toon@moene.org> wrote: >> >> On 10/13/22 14:36, Aldy Hernandez via Gccpatches wrote: >> >>> PR treeoptimization/24021 >> >> Ah  Verboten in Fortran: >> >> $ cat d.f >> DOUBLE PRECISION A, X >> A = 0.0 >> DO X = 0.1, 1.0 >> A = A + X >> ENDDO >> END >> $ gfortran d.f >> d.f:3:9: >> >> 3  DO X = 0.1, 1.0 >>  1 >> Warning: Deleted feature: Loop variable at (1) must be integer >> d.f:3:12: >> >> 3  DO X = 0.1, 1.0 >>  1 >> Warning: Deleted feature: Start expression in DO loop at (1) must be integer >> d.f:3:17: >> >> 3  DO X = 0.1, 1.0 >>  1 >> Warning: Deleted feature: End expression in DO loop at (1) must be integer >> >>  >> Toon Moene  email: toon@moene.org  phone: +31 346 214290 >> Saturnushof 14, 3738 XG Maartensdijk, The Netherlands >> >
On Thu, Oct 13, 2022 at 02:36:49PM +0200, Aldy Hernandez wrote: > +// Like real_arithmetic, but round the result to INF if the operation > +// produced inexact results. > +// > +// ?? There is still one problematic case, i387. With > +// fexcessprecision=standard we perform most SF/DFmode arithmetic in > +// XFmode (long_double_type_node), so that case is OK. But without > +// mfpmath=sse, all the SF/DFmode computations are in XFmode > +// precision (64bit mantissa) and only occassionally rounded to > +// SF/DFmode (when storing into memory from the 387 stack). Maybe > +// this is ok as well though it is just occassionally more precise. ?? > + > +static void > +frange_arithmetic (enum tree_code code, tree type, > + REAL_VALUE_TYPE &result, > + const REAL_VALUE_TYPE &op1, > + const REAL_VALUE_TYPE &op2, > + const REAL_VALUE_TYPE &inf) > +{ > + REAL_VALUE_TYPE value; > + enum machine_mode mode = TYPE_MODE (type); > + bool mode_composite = MODE_COMPOSITE_P (mode); > + > + bool inexact = real_arithmetic (&value, code, &op1, &op2); > + real_convert (&result, mode, &value); > + > + // If real_convert above has rounded an inexact value to towards > + // inf, we can keep the result as is, otherwise we'll adjust by 1 ulp > + // later (real_nextafter). > + bool rounding = (flag_rounding_math > + && (real_isneg (&inf) > + ? real_less (&result, &value) > + : !real_less (&value, &result))); I thought the agreement during Cauldron was that we'd do this always, regardless of flag_rounding_math. Because excess precision (the fast one like on ia32 or mfpmath=387 on x86_64), or froundingmath, or FMA contraction can all increase precision and worst case it all behaves like froundingmath for the ranges. So, perhaps use: if ((mode_composite  (real_isneg (&inf) ? real_less (&result, &value) : !real_less (&value, &result)) && (inexact  !real_identical (&result, &value)))) ? No need to do the real_isneg/real_less stuff for mode_composite, then we do it always for inexacts, but otherwise we check if the rounding performed by real.cc has been in the conservative direction (for upper bound to +inf, for lower bound to inf), if yes, we don't need to do anything, if yes, we frange_nextafter. As discussed, for mode_composite, I think we want to do the extra stuff for inexact denormals and otherwise do the nextafter unconditionally, because our internal mode_composite representation isn't precise enough. > + // Be extra careful if there may be discrepancies between the > + // compile and runtime results. > + if ((rounding  mode_composite) > + && (inexact  !real_identical (&result, &value))) > + { > + if (mode_composite) > + { > + bool denormal = (result.sig[SIGSZ1] & SIG_MSB) == 0; Use real_isdenormal here? Though, real_iszero needs the same thing. > + if (denormal) > + { > + REAL_VALUE_TYPE tmp; And explain here why is this, that IBM extended denormals have just DFmode precision. Though, now that I think about it, while this is correct for denormals, > + real_convert (&tmp, DFmode, &value); > + frange_nextafter (DFmode, tmp, inf); > + real_convert (&result, mode, &tmp); > + } there are also the cases where the higher double exponent is in the [__DBL_MIN_EXP__, __LDBL_MIN_EXP__] aka [1021, 968] or so. https://en.wikipedia.org/wiki/Doubleprecision_floatingpoint_format If the upper double is denormal in the DFmode sense, so smaller absolute value than __DBL_MIN__, then doing nextafter in DFmode is the right thing to do, the lower double must be always +/ zero. Now, if the result is __DBL_MIN__, the upper double is already normalized but we can add __DBL_DENORM_MIN__ to it, which will make the number have 54bit precision. If the result is __DBL_MIN__ * 2, we can again add __DBL_DENORM_MIN__ and make it 55bit precision. Etc. until we reach __DBL_MIN__ * 2e53 where it acts like fully normalized 106bit precision number. I must say I'm not really sure what real_nextafter is doing in those cases, I'm afraid it doesn't handle it correctly but the only other use of real_nextafter is guarded with: /* Don't handle composite modes, nor decimal, nor modes without inf or denorm at least for now. */ if (format>pnan < format>p  format>b == 10  !format>has_inf  !format>has_denorm) return false; so it isn't that big deal except for ranges. Jakub
On Thu, Oct 13, 2022 at 7:57 PM Jakub Jelinek <jakub@redhat.com> wrote: > > On Thu, Oct 13, 2022 at 02:36:49PM +0200, Aldy Hernandez wrote: > > +// Like real_arithmetic, but round the result to INF if the operation > > +// produced inexact results. > > +// > > +// ?? There is still one problematic case, i387. With > > +// fexcessprecision=standard we perform most SF/DFmode arithmetic in > > +// XFmode (long_double_type_node), so that case is OK. But without > > +// mfpmath=sse, all the SF/DFmode computations are in XFmode > > +// precision (64bit mantissa) and only occassionally rounded to > > +// SF/DFmode (when storing into memory from the 387 stack). Maybe > > +// this is ok as well though it is just occassionally more precise. ?? > > + > > +static void > > +frange_arithmetic (enum tree_code code, tree type, > > + REAL_VALUE_TYPE &result, > > + const REAL_VALUE_TYPE &op1, > > + const REAL_VALUE_TYPE &op2, > > + const REAL_VALUE_TYPE &inf) > > +{ > > + REAL_VALUE_TYPE value; > > + enum machine_mode mode = TYPE_MODE (type); > > + bool mode_composite = MODE_COMPOSITE_P (mode); > > + > > + bool inexact = real_arithmetic (&value, code, &op1, &op2); > > + real_convert (&result, mode, &value); > > + > > + // If real_convert above has rounded an inexact value to towards > > + // inf, we can keep the result as is, otherwise we'll adjust by 1 ulp > > + // later (real_nextafter). > > + bool rounding = (flag_rounding_math > > + && (real_isneg (&inf) > > + ? real_less (&result, &value) > > + : !real_less (&value, &result))); > > I thought the agreement during Cauldron was that we'd do this always, > regardless of flag_rounding_math. > Because excess precision (the fast one like on ia32 or mfpmath=387 on > x86_64), or froundingmath, or FMA contraction can all increase precision > and worst case it all behaves like froundingmath for the ranges. > > So, perhaps use: > if ((mode_composite  (real_isneg (&inf) ? real_less (&result, &value) > : !real_less (&value, &result)) > && (inexact  !real_identical (&result, &value)))) Done. > ? > No need to do the real_isneg/real_less stuff for mode_composite, then > we do it always for inexacts, but otherwise we check if the rounding > performed by real.cc has been in the conservative direction (for upper > bound to +inf, for lower bound to inf), if yes, we don't need to do > anything, if yes, we frange_nextafter. > > As discussed, for mode_composite, I think we want to do the extra > stuff for inexact denormals and otherwise do the nextafter unconditionally, > because our internal mode_composite representation isn't precise enough. > > > + // Be extra careful if there may be discrepancies between the > > + // compile and runtime results. > > + if ((rounding  mode_composite) > > + && (inexact  !real_identical (&result, &value))) > > + { > > + if (mode_composite) > > + { > > + bool denormal = (result.sig[SIGSZ1] & SIG_MSB) == 0; > > Use real_isdenormal here? Done. > Though, real_iszero needs the same thing. So... real_isdenormal()  real_iszero() as in the attached patch? > > > + if (denormal) > > + { > > + REAL_VALUE_TYPE tmp; > > And explain here why is this, that IBM extended denormals have just > DFmode precision. Done. > Though, now that I think about it, while this is correct for denormals, > > > + real_convert (&tmp, DFmode, &value); > > + frange_nextafter (DFmode, tmp, inf); > > + real_convert (&result, mode, &tmp); > > + } > > there are also the cases where the higher double exponent is in the > [__DBL_MIN_EXP__, __LDBL_MIN_EXP__] aka [1021, 968] or so. > https://en.wikipedia.org/wiki/Doubleprecision_floatingpoint_format > If the upper double is denormal in the DFmode sense, so smaller absolute > value than __DBL_MIN__, then doing nextafter in DFmode is the right thing to > do, the lower double must be always +/ zero. > Now, if the result is __DBL_MIN__, the upper double is already normalized > but we can add __DBL_DENORM_MIN__ to it, which will make the number have > 54bit precision. > If the result is __DBL_MIN__ * 2, we can again add __DBL_DENORM_MIN__ > and make it 55bit precision. Etc. until we reach __DBL_MIN__ * 2e53 > where it acts like fully normalized 106bit precision number. > I must say I'm not really sure what real_nextafter is doing in those cases, > I'm afraid it doesn't handle it correctly but the only other use > of real_nextafter is guarded with: > /* Don't handle composite modes, nor decimal, nor modes without > inf or denorm at least for now. */ > if (format>pnan < format>p >  format>b == 10 >  !format>has_inf >  !format>has_denorm) > return false; Dunno. Is there a conservative thing we can do for mode_composites that aren't denormal or zero? How does this look? Aldy
PING On Mon, Oct 17, 2022 at 8:21 AM Aldy Hernandez <aldyh@redhat.com> wrote: > > On Thu, Oct 13, 2022 at 7:57 PM Jakub Jelinek <jakub@redhat.com> wrote: > > > > On Thu, Oct 13, 2022 at 02:36:49PM +0200, Aldy Hernandez wrote: > > > +// Like real_arithmetic, but round the result to INF if the operation > > > +// produced inexact results. > > > +// > > > +// ?? There is still one problematic case, i387. With > > > +// fexcessprecision=standard we perform most SF/DFmode arithmetic in > > > +// XFmode (long_double_type_node), so that case is OK. But without > > > +// mfpmath=sse, all the SF/DFmode computations are in XFmode > > > +// precision (64bit mantissa) and only occassionally rounded to > > > +// SF/DFmode (when storing into memory from the 387 stack). Maybe > > > +// this is ok as well though it is just occassionally more precise. ?? > > > + > > > +static void > > > +frange_arithmetic (enum tree_code code, tree type, > > > + REAL_VALUE_TYPE &result, > > > + const REAL_VALUE_TYPE &op1, > > > + const REAL_VALUE_TYPE &op2, > > > + const REAL_VALUE_TYPE &inf) > > > +{ > > > + REAL_VALUE_TYPE value; > > > + enum machine_mode mode = TYPE_MODE (type); > > > + bool mode_composite = MODE_COMPOSITE_P (mode); > > > + > > > + bool inexact = real_arithmetic (&value, code, &op1, &op2); > > > + real_convert (&result, mode, &value); > > > + > > > + // If real_convert above has rounded an inexact value to towards > > > + // inf, we can keep the result as is, otherwise we'll adjust by 1 ulp > > > + // later (real_nextafter). > > > + bool rounding = (flag_rounding_math > > > + && (real_isneg (&inf) > > > + ? real_less (&result, &value) > > > + : !real_less (&value, &result))); > > > > I thought the agreement during Cauldron was that we'd do this always, > > regardless of flag_rounding_math. > > Because excess precision (the fast one like on ia32 or mfpmath=387 on > > x86_64), or froundingmath, or FMA contraction can all increase precision > > and worst case it all behaves like froundingmath for the ranges. > > > > So, perhaps use: > > if ((mode_composite  (real_isneg (&inf) ? real_less (&result, &value) > > : !real_less (&value, &result)) > > && (inexact  !real_identical (&result, &value)))) > > Done. > > > ? > > No need to do the real_isneg/real_less stuff for mode_composite, then > > we do it always for inexacts, but otherwise we check if the rounding > > performed by real.cc has been in the conservative direction (for upper > > bound to +inf, for lower bound to inf), if yes, we don't need to do > > anything, if yes, we frange_nextafter. > > > > As discussed, for mode_composite, I think we want to do the extra > > stuff for inexact denormals and otherwise do the nextafter unconditionally, > > because our internal mode_composite representation isn't precise enough. > > > > > + // Be extra careful if there may be discrepancies between the > > > + // compile and runtime results. > > > + if ((rounding  mode_composite) > > > + && (inexact  !real_identical (&result, &value))) > > > + { > > > + if (mode_composite) > > > + { > > > + bool denormal = (result.sig[SIGSZ1] & SIG_MSB) == 0; > > > > Use real_isdenormal here? > > Done. > > > Though, real_iszero needs the same thing. > > So... real_isdenormal()  real_iszero() as in the attached patch? > > > > > > + if (denormal) > > > + { > > > + REAL_VALUE_TYPE tmp; > > > > And explain here why is this, that IBM extended denormals have just > > DFmode precision. > > Done. > > > Though, now that I think about it, while this is correct for denormals, > > > > > + real_convert (&tmp, DFmode, &value); > > > + frange_nextafter (DFmode, tmp, inf); > > > + real_convert (&result, mode, &tmp); > > > + } > > > > there are also the cases where the higher double exponent is in the > > [__DBL_MIN_EXP__, __LDBL_MIN_EXP__] aka [1021, 968] or so. > > https://en.wikipedia.org/wiki/Doubleprecision_floatingpoint_format > > If the upper double is denormal in the DFmode sense, so smaller absolute > > value than __DBL_MIN__, then doing nextafter in DFmode is the right thing to > > do, the lower double must be always +/ zero. > > Now, if the result is __DBL_MIN__, the upper double is already normalized > > but we can add __DBL_DENORM_MIN__ to it, which will make the number have > > 54bit precision. > > If the result is __DBL_MIN__ * 2, we can again add __DBL_DENORM_MIN__ > > and make it 55bit precision. Etc. until we reach __DBL_MIN__ * 2e53 > > where it acts like fully normalized 106bit precision number. > > I must say I'm not really sure what real_nextafter is doing in those cases, > > I'm afraid it doesn't handle it correctly but the only other use > > of real_nextafter is guarded with: > > /* Don't handle composite modes, nor decimal, nor modes without > > inf or denorm at least for now. */ > > if (format>pnan < format>p > >  format>b == 10 > >  !format>has_inf > >  !format>has_denorm) > > return false; > > Dunno. Is there a conservative thing we can do for mode_composites > that aren't denormal or zero? > > How does this look? > Aldy
On 10/24/22 00:04, Aldy Hernandez via Gccpatches wrote:
> PING
I'd be a lot more comfortable if Jakub would chime in here.
Jeff
Ping ping On Mon, Oct 24, 2022, 08:04 Aldy Hernandez <aldyh@redhat.com> wrote: > PING > > On Mon, Oct 17, 2022 at 8:21 AM Aldy Hernandez <aldyh@redhat.com> wrote: > > > > On Thu, Oct 13, 2022 at 7:57 PM Jakub Jelinek <jakub@redhat.com> wrote: > > > > > > On Thu, Oct 13, 2022 at 02:36:49PM +0200, Aldy Hernandez wrote: > > > > +// Like real_arithmetic, but round the result to INF if the > operation > > > > +// produced inexact results. > > > > +// > > > > +// ?? There is still one problematic case, i387. With > > > > +// fexcessprecision=standard we perform most SF/DFmode arithmetic > in > > > > +// XFmode (long_double_type_node), so that case is OK. But without > > > > +// mfpmath=sse, all the SF/DFmode computations are in XFmode > > > > +// precision (64bit mantissa) and only occassionally rounded to > > > > +// SF/DFmode (when storing into memory from the 387 stack). Maybe > > > > +// this is ok as well though it is just occassionally more precise. > ?? > > > > + > > > > +static void > > > > +frange_arithmetic (enum tree_code code, tree type, > > > > + REAL_VALUE_TYPE &result, > > > > + const REAL_VALUE_TYPE &op1, > > > > + const REAL_VALUE_TYPE &op2, > > > > + const REAL_VALUE_TYPE &inf) > > > > +{ > > > > + REAL_VALUE_TYPE value; > > > > + enum machine_mode mode = TYPE_MODE (type); > > > > + bool mode_composite = MODE_COMPOSITE_P (mode); > > > > + > > > > + bool inexact = real_arithmetic (&value, code, &op1, &op2); > > > > + real_convert (&result, mode, &value); > > > > + > > > > + // If real_convert above has rounded an inexact value to towards > > > > + // inf, we can keep the result as is, otherwise we'll adjust by 1 > ulp > > > > + // later (real_nextafter). > > > > + bool rounding = (flag_rounding_math > > > > + && (real_isneg (&inf) > > > > + ? real_less (&result, &value) > > > > + : !real_less (&value, &result))); > > > > > > I thought the agreement during Cauldron was that we'd do this always, > > > regardless of flag_rounding_math. > > > Because excess precision (the fast one like on ia32 or mfpmath=387 on > > > x86_64), or froundingmath, or FMA contraction can all increase > precision > > > and worst case it all behaves like froundingmath for the ranges. > > > > > > So, perhaps use: > > > if ((mode_composite  (real_isneg (&inf) ? real_less (&result, > &value) > > > : !real_less (&value, > &result)) > > > && (inexact  !real_identical (&result, &value)))) > > > > Done. > > > > > ? > > > No need to do the real_isneg/real_less stuff for mode_composite, then > > > we do it always for inexacts, but otherwise we check if the rounding > > > performed by real.cc has been in the conservative direction (for upper > > > bound to +inf, for lower bound to inf), if yes, we don't need to do > > > anything, if yes, we frange_nextafter. > > > > > > As discussed, for mode_composite, I think we want to do the extra > > > stuff for inexact denormals and otherwise do the nextafter > unconditionally, > > > because our internal mode_composite representation isn't precise > enough. > > > > > > > + // Be extra careful if there may be discrepancies between the > > > > + // compile and runtime results. > > > > + if ((rounding  mode_composite) > > > > + && (inexact  !real_identical (&result, &value))) > > > > + { > > > > + if (mode_composite) > > > > + { > > > > + bool denormal = (result.sig[SIGSZ1] & SIG_MSB) == 0; > > > > > > Use real_isdenormal here? > > > > Done. > > > > > Though, real_iszero needs the same thing. > > > > So... real_isdenormal()  real_iszero() as in the attached patch? > > > > > > > > > + if (denormal) > > > > + { > > > > + REAL_VALUE_TYPE tmp; > > > > > > And explain here why is this, that IBM extended denormals have just > > > DFmode precision. > > > > Done. > > > > > Though, now that I think about it, while this is correct for denormals, > > > > > > > + real_convert (&tmp, DFmode, &value); > > > > + frange_nextafter (DFmode, tmp, inf); > > > > + real_convert (&result, mode, &tmp); > > > > + } > > > > > > there are also the cases where the higher double exponent is in the > > > [__DBL_MIN_EXP__, __LDBL_MIN_EXP__] aka [1021, 968] or so. > > > https://en.wikipedia.org/wiki/Doubleprecision_floatingpoint_format > > > If the upper double is denormal in the DFmode sense, so smaller > absolute > > > value than __DBL_MIN__, then doing nextafter in DFmode is the right > thing to > > > do, the lower double must be always +/ zero. > > > Now, if the result is __DBL_MIN__, the upper double is already > normalized > > > but we can add __DBL_DENORM_MIN__ to it, which will make the number > have > > > 54bit precision. > > > If the result is __DBL_MIN__ * 2, we can again add __DBL_DENORM_MIN__ > > > and make it 55bit precision. Etc. until we reach __DBL_MIN__ * 2e53 > > > where it acts like fully normalized 106bit precision number. > > > I must say I'm not really sure what real_nextafter is doing in those > cases, > > > I'm afraid it doesn't handle it correctly but the only other use > > > of real_nextafter is guarded with: > > > /* Don't handle composite modes, nor decimal, nor modes without > > > inf or denorm at least for now. */ > > > if (format>pnan < format>p > > >  format>b == 10 > > >  !format>has_inf > > >  !format>has_denorm) > > > return false; > > > > Dunno. Is there a conservative thing we can do for mode_composites > > that aren't denormal or zero? > > > > How does this look? > > Aldy >
On Mon, Oct 17, 2022 at 08:21:01AM +0200, Aldy Hernandez wrote: >  a/gcc/rangeopfloat.cc > +++ b/gcc/rangeopfloat.cc > @@ 200,6 +200,116 @@ frelop_early_resolve (irange &r, tree type, > && relop_early_resolve (r, type, op1, op2, rel, my_rel)); > } > > +// If R contains a NAN of unknown sign, update the NAN's signbit > +// depending on two operands. > + > +inline void > +update_nan_sign (frange &r, const frange &op1, const frange &op2) > +{ > + if (!r.maybe_isnan ()) > + return; > + > + bool op1_nan = op1.maybe_isnan (); > + bool op2_nan = op2.maybe_isnan (); > + bool sign1, sign2; > + > + gcc_checking_assert (!r.nan_signbit_p (sign1)); > + if (op1_nan && op2_nan) > + { > + if (op1.nan_signbit_p (sign1) && op2.nan_signbit_p (sign2)) > + r.update_nan (sign1  sign2); > + } > + else if (op1_nan) > + { > + if (op1.nan_signbit_p (sign1)) > + r.update_nan (sign1); > + } > + else if (op2_nan) > + { > + if (op2.nan_signbit_p (sign2)) > + r.update_nan (sign2); > + } > +} IEEE 7542008 says: "When either an input or result is NaN, this standard does not interpret the sign of a NaN. Note, however, that operations on bit strings — copy, negate, abs, copySign — specify the sign bit of a NaN result, sometimes based upon the sign bit of a NaN operand. The logical predicate totalOrder is also affected by the sign bit of a NaN operand. For all other operations, this standard does not specify the sign bit of a NaN result, even when there is only one input NaN, or when the NaN is produced from an invalid operation." so, while one can e.g. see on x86_64 that in simple O0 int main () { volatile float f1 = __builtin_nansf (""); volatile float f2 = __builtin_copysignf (__builtin_nansf (""), 1.0f); volatile float f3 = __builtin_nanf (""); volatile float f4 = __builtin_copysignf (__builtin_nanf (""), 1.0f); volatile float fzero = 0.0f; __builtin_printf ("%08x %08x\n", *(unsigned *)&f1, *(unsigned *)&f2); __builtin_printf ("%08x %08x\n", *(unsigned *)&f3, *(unsigned *)&f4); f1 = f1; f2 = f2; f3 = f3; f4 = f4; __builtin_printf ("%08x %08x\n", *(unsigned *)&f1, *(unsigned *)&f2); __builtin_printf ("%08x %08x\n", *(unsigned *)&f3, *(unsigned *)&f4); volatile float f5 = f1 + fzero; volatile float f6 = fzero + f1; __builtin_printf ("%08x %08x\n", *(unsigned *)&f5, *(unsigned *)&f6); f5 = f2 + fzero; f6 = fzero + f2; __builtin_printf ("%08x %08x\n", *(unsigned *)&f5, *(unsigned *)&f6); f5 = f3 + fzero; f6 = fzero + f3; __builtin_printf ("%08x %08x\n", *(unsigned *)&f5, *(unsigned *)&f6); f5 = f4 + fzero; f6 = fzero + f4; __builtin_printf ("%08x %08x\n", *(unsigned *)&f5, *(unsigned *)&f6); f5 = f1 + f2; f6 = f2 + f1; __builtin_printf ("%08x %08x\n", *(unsigned *)&f5, *(unsigned *)&f6); f5 = f2 + f3; f6 = f3 + f2; __builtin_printf ("%08x %08x\n", *(unsigned *)&f5, *(unsigned *)&f6); f5 = f3 + f4; f6 = f4 + f3; __builtin_printf ("%08x %08x\n", *(unsigned *)&f5, *(unsigned *)&f6); f5 = f4 + f1; f6 = f1 + f4; __builtin_printf ("%08x %08x\n", *(unsigned *)&f5, *(unsigned *)&f6); return 0; } result of: 7fa00000 ffa00000 7fc00000 ffc00000 ffa00000 7fa00000 ffc00000 7fc00000 ffe00000 ffe00000 7fe00000 7fe00000 ffc00000 ffc00000 7fc00000 7fc00000 7fe00000 ffe00000 ffc00000 7fe00000 7fc00000 ffc00000 ffe00000 7fc00000 which basically shows that copysign copies sign bit, negation toggles it, binary operation with a single NaN (quiet or signaling) get that NaN with its sign and for binary operation on 2 NaNs (again quiet or signaling) one gets sign from the second? operand, I think the above IEEE text doesn't guarantee it except for a simple assignment (but with no mode conversion; that one copies the bit), NEGATE_EXPR (toggles it), ABS_EXPR (clears it), __builtin_copysign*/IFN_COPYSIGN (copies it from the second operand). Everything else, including invalid operation cases, should set the possible sign bit values of NANs to 0/1 rather than just one of them. Perhaps COND_EXPR 2nd/3rd operand is a move too. As you are adding a binary operation, that should be one of those cases where we drop the NAN sign to VARYING. > + > +// If either operand is a NAN, set R to the combination of both NANs > +// signwise and return TRUE. > + > +inline bool > +propagate_nans (frange &r, const frange &op1, const frange &op2) > +{ > + if (op1.known_isnan ()  op2.known_isnan ()) > + { > + r.set_nan (op1.type ()); > + update_nan_sign (r, op1, op2); > + return true; > + } > + return false; > +} > + > +// Set VALUE to its next real value, or INF if the operation overflows. > + > +inline void > +frange_nextafter (enum machine_mode mode, > + REAL_VALUE_TYPE &value, > + const REAL_VALUE_TYPE &inf) > +{ > + const real_format *fmt = REAL_MODE_FORMAT (mode); > + REAL_VALUE_TYPE tmp; > + bool overflow = real_nextafter (&tmp, fmt, &value, &inf); > + if (overflow) > + value = inf; > + else > + value = tmp; > +} > + > +// Like real_arithmetic, but round the result to INF if the operation > +// produced inexact results. > +// > +// ?? There is still one problematic case, i387. With > +// fexcessprecision=standard we perform most SF/DFmode arithmetic in > +// XFmode (long_double_type_node), so that case is OK. But without > +// mfpmath=sse, all the SF/DFmode computations are in XFmode > +// precision (64bit mantissa) and only occassionally rounded to > +// SF/DFmode (when storing into memory from the 387 stack). Maybe > +// this is ok as well though it is just occassionally more precise. ?? > + > +static void > +frange_arithmetic (enum tree_code code, tree type, > + REAL_VALUE_TYPE &result, > + const REAL_VALUE_TYPE &op1, > + const REAL_VALUE_TYPE &op2, > + const REAL_VALUE_TYPE &inf) > +{ > + REAL_VALUE_TYPE value; > + enum machine_mode mode = TYPE_MODE (type); > + bool mode_composite = MODE_COMPOSITE_P (mode); > + > + bool inexact = real_arithmetic (&value, code, &op1, &op2); > + real_convert (&result, mode, &value); > + > + // Be extra careful if there may be discrepancies between the > + // compile and runtime results. > + if ((mode_composite  (real_isneg (&inf) ? real_less (&result, &value) > + : !real_less (&value, &result))) > + && (inexact  !real_identical (&result, &value))) > + { > + if (mode_composite) > + { > + if (real_isdenormal (&result) > +  real_iszero (&result)) > + { > + // IBM extended denormals only have DFmode precision. > + REAL_VALUE_TYPE tmp; > + real_convert (&tmp, DFmode, &value); > + frange_nextafter (DFmode, tmp, inf); > + real_convert (&result, mode, &tmp); > + return; > + } As discussed before, this might not be correct for the larger double double denormals (is correct for real_iszero). I'll try to play with it in selftests and compare with what one gets from libm nextafterl: int main () { long double a = __builtin_nextafterl (0.0L, 1.0L); __builtin_printf ("%La\n", a); for (int i = 0; i < 108; i++, a *= 2.0L) __builtin_printf ("%d %La %La\n", i, a, __builtin_nextafterl (a, 1.0L)); } 0x0.0000000000001p1022 0 0x0.0000000000001p1022 0x0.0000000000002p1022 1 0x0.0000000000002p1022 0x0.0000000000003p1022 2 0x0.0000000000004p1022 0x0.0000000000005p1022 3 0x0.0000000000008p1022 0x0.0000000000009p1022 4 0x0.000000000001p1022 0x0.0000000000011p1022 5 0x0.000000000002p1022 0x0.0000000000021p1022 6 0x0.000000000004p1022 0x0.0000000000041p1022 7 0x0.000000000008p1022 0x0.0000000000081p1022 8 0x0.00000000001p1022 0x0.0000000000101p1022 9 0x0.00000000002p1022 0x0.0000000000201p1022 10 0x0.00000000004p1022 0x0.0000000000401p1022 11 0x0.00000000008p1022 0x0.0000000000801p1022 12 0x0.0000000001p1022 0x0.0000000001001p1022 13 0x0.0000000002p1022 0x0.0000000002001p1022 14 0x0.0000000004p1022 0x0.0000000004001p1022 15 0x0.0000000008p1022 0x0.0000000008001p1022 16 0x0.000000001p1022 0x0.0000000010001p1022 17 0x0.000000002p1022 0x0.0000000020001p1022 18 0x0.000000004p1022 0x0.0000000040001p1022 19 0x0.000000008p1022 0x0.0000000080001p1022 20 0x0.00000001p1022 0x0.0000000100001p1022 21 0x0.00000002p1022 0x0.0000000200001p1022 22 0x0.00000004p1022 0x0.0000000400001p1022 23 0x0.00000008p1022 0x0.0000000800001p1022 24 0x0.0000001p1022 0x0.0000001000001p1022 25 0x0.0000002p1022 0x0.0000002000001p1022 26 0x0.0000004p1022 0x0.0000004000001p1022 27 0x0.0000008p1022 0x0.0000008000001p1022 28 0x0.000001p1022 0x0.0000010000001p1022 29 0x0.000002p1022 0x0.0000020000001p1022 30 0x0.000004p1022 0x0.0000040000001p1022 31 0x0.000008p1022 0x0.0000080000001p1022 32 0x0.00001p1022 0x0.0000100000001p1022 33 0x0.00002p1022 0x0.0000200000001p1022 34 0x0.00004p1022 0x0.0000400000001p1022 35 0x0.00008p1022 0x0.0000800000001p1022 36 0x0.0001p1022 0x0.0001000000001p1022 37 0x0.0002p1022 0x0.0002000000001p1022 38 0x0.0004p1022 0x0.0004000000001p1022 39 0x0.0008p1022 0x0.0008000000001p1022 40 0x0.001p1022 0x0.0010000000001p1022 41 0x0.002p1022 0x0.0020000000001p1022 42 0x0.004p1022 0x0.0040000000001p1022 43 0x0.008p1022 0x0.0080000000001p1022 44 0x0.01p1022 0x0.0100000000001p1022 45 0x0.02p1022 0x0.0200000000001p1022 46 0x0.04p1022 0x0.0400000000001p1022 47 0x0.08p1022 0x0.0800000000001p1022 48 0x0.1p1022 0x0.1000000000001p1022 49 0x0.2p1022 0x0.2000000000001p1022 50 0x0.4p1022 0x0.4000000000001p1022 51 0x0.8p1022 0x0.8000000000001p1022 52 0x1p1022 0x1.0000000000001p1022 53 0x1p1021 0x1.00000000000008p1021 54 0x1p1020 0x1.00000000000004p1020 55 0x1p1019 0x1.00000000000002p1019 56 0x1p1018 0x1.00000000000001p1018 57 0x1p1017 0x1.000000000000008p1017 58 0x1p1016 0x1.000000000000004p1016 59 0x1p1015 0x1.000000000000002p1015 60 0x1p1014 0x1.000000000000001p1014 61 0x1p1013 0x1.0000000000000008p1013 62 0x1p1012 0x1.0000000000000004p1012 63 0x1p1011 0x1.0000000000000002p1011 64 0x1p1010 0x1.0000000000000001p1010 65 0x1p1009 0x1.00000000000000008p1009 66 0x1p1008 0x1.00000000000000004p1008 67 0x1p1007 0x1.00000000000000002p1007 68 0x1p1006 0x1.00000000000000001p1006 69 0x1p1005 0x1.000000000000000008p1005 70 0x1p1004 0x1.000000000000000004p1004 71 0x1p1003 0x1.000000000000000002p1003 72 0x1p1002 0x1.000000000000000001p1002 73 0x1p1001 0x1.0000000000000000008p1001 74 0x1p1000 0x1.0000000000000000004p1000 75 0x1p999 0x1.0000000000000000002p999 76 0x1p998 0x1.0000000000000000001p998 77 0x1p997 0x1.00000000000000000008p997 78 0x1p996 0x1.00000000000000000004p996 79 0x1p995 0x1.00000000000000000002p995 80 0x1p994 0x1.00000000000000000001p994 81 0x1p993 0x1.000000000000000000008p993 82 0x1p992 0x1.000000000000000000004p992 83 0x1p991 0x1.000000000000000000002p991 84 0x1p990 0x1.000000000000000000001p990 85 0x1p989 0x1.0000000000000000000008p989 86 0x1p988 0x1.0000000000000000000004p988 87 0x1p987 0x1.0000000000000000000002p987 88 0x1p986 0x1.0000000000000000000001p986 89 0x1p985 0x1.00000000000000000000008p985 90 0x1p984 0x1.00000000000000000000004p984 91 0x1p983 0x1.00000000000000000000002p983 92 0x1p982 0x1.00000000000000000000001p982 93 0x1p981 0x1.000000000000000000000008p981 94 0x1p980 0x1.000000000000000000000004p980 95 0x1p979 0x1.000000000000000000000002p979 96 0x1p978 0x1.000000000000000000000001p978 97 0x1p977 0x1.0000000000000000000000008p977 98 0x1p976 0x1.0000000000000000000000004p976 99 0x1p975 0x1.0000000000000000000000002p975 100 0x1p974 0x1.0000000000000000000000001p974 101 0x1p973 0x1.00000000000000000000000008p973 102 0x1p972 0x1.00000000000000000000000004p972 103 0x1p971 0x1.00000000000000000000000002p971 104 0x1p970 0x1.00000000000000000000000001p970 105 0x1p969 0x1.000000000000000000000000008p969 106 0x1p968 0x1.000000000000000000000000008p968 107 0x1p967 0x1.000000000000000000000000008p967 Will need to find out which of these numbers GCC real_isdenormal actually treats as denormals and if there are any which aren't. Powerpc64le mlongdouble128 mabi=ibmlongdouble gcc certain prints: #define __DBL_MIN__ ((double)2.22507385850720138309023271733240406e308L) #define __DBL_DENORM_MIN__ ((double)4.94065645841246544176568792868221372e324L) #define __LDBL_MIN__ 2.00416836000897277799610805135016205e292L #define __LDBL_DENORM_MIN__ 4.94065645841246544176568792868221372e324L and so the LDBL denorm min is correctly the same as double denorm min, while minimum normal is actually quite larger for long double than double (also ok, but unusual, as e.g. __DBL_MIN__ is much smaller than __FLT_MIN__ and __FLT128_MIN__ much smaller than __DBL_MIN__). Note, seems glibc nextafterl for IBM double double is actually adding __DBL_DENORM_MIN__ to the value for the small ones, so perhaps we should do that too and not convert to DFmode and back. Jakub
On Mon, Oct 17, 2022 at 08:21:01AM +0200, Aldy Hernandez wrote: > +// Set VALUE to its next real value, or INF if the operation overflows. > + > +inline void > +frange_nextafter (enum machine_mode mode, > + REAL_VALUE_TYPE &value, > + const REAL_VALUE_TYPE &inf) > +{ > + const real_format *fmt = REAL_MODE_FORMAT (mode); > + REAL_VALUE_TYPE tmp; > + bool overflow = real_nextafter (&tmp, fmt, &value, &inf); > + if (overflow) > + value = inf; > + else > + value = tmp; Please change the above 5 lines to just real_nextafter (&tmp, fmt, &value, &inf); value = tmp; What real_nextafter returns isn't bool whether it overflowed, but a bool whether some exception should be raised for the nextafter, which can be either an overflow (in that case tmp will be already set to infinity of the right sign: if (REAL_EXP (r) > fmt>emax) { get_inf (r, x>sign); return true; } or underflow to zero: if (REAL_EXP (r) <= fmt>emin  fmt>p) { get_zero (r, x>sign); return true; } or when the result is subnormal: return r>cl == rvc_zero  REAL_EXP (r) < fmt>emin; But we shouldn't care about exceptions, we aren't emitting code, just estimating floating point range, and +inf, or +0, or subnormal are the right answers. Just for playing with this, I've tried:  rangeopfloat.cc.jj 20221102 10:06:02.620960861 +0100 +++ rangeopfloat.cc 20221104 19:49:14.188075070 +0100 @@ 1815,6 +1815,20 @@ frange_float (const char *lb, const char return frange (type, min, max); } +inline void +frange_nextafter (enum machine_mode mode, + REAL_VALUE_TYPE &value, + const REAL_VALUE_TYPE &inf) +{ + const real_format *fmt = REAL_MODE_FORMAT (mode); + REAL_VALUE_TYPE tmp; + bool overflow = real_nextafter (&tmp, fmt, &value, &inf); + if (overflow) + value = inf; + else + value = tmp; +} + void range_op_float_tests () { @@ 1833,6 +1847,30 @@ range_op_float_tests () r1 = frange_float ("1", "0"); r1.update_nan (false); ASSERT_EQ (r, r1); + + if (MODE_COMPOSITE_P (TYPE_MODE (long_double_type_node))) + { + enum machine_mode mode = TYPE_MODE (long_double_type_node); + REAL_VALUE_TYPE result = dconst0; + { + REAL_VALUE_TYPE tmp; + real_convert (&tmp, DFmode, &result); + frange_nextafter (DFmode, tmp, dconstinf); + real_convert (&result, mode, &tmp); + } + for (int i = 0; i < 108; ++i) + { + char buf[1024], buf2[1024]; + REAL_VALUE_TYPE tmp, tmp2; + real_convert (&tmp, DFmode, &result); + frange_nextafter (DFmode, tmp, dconstinf); + real_convert (&tmp2, mode, &tmp); + real_to_hexadecimal (buf, &result, 1024, 0, 1); + real_to_hexadecimal (buf2, &tmp2, 1024, 0, 1); + fprintf (stderr, "%d %d %s %s\n", i, real_isdenormal (&result), buf, buf2); + real_arithmetic (&result, MULT_EXPR, &result, &dconst2); + } + } } } // namespace selftest in cross to powerpc64lelinux and I get: 0 0 0x0.8p1073 0x0.8p1072 1 0 0x0.8p1072 0x0.cp1072 2 0 0x0.8p1071 0x0.ap1071 3 0 0x0.8p1070 0x0.9p1070 4 0 0x0.8p1069 0x0.88p1069 5 0 0x0.8p1068 0x0.84p1068 6 0 0x0.8p1067 0x0.82p1067 7 0 0x0.8p1066 0x0.81p1066 8 0 0x0.8p1065 0x0.808p1065 9 0 0x0.8p1064 0x0.804p1064 10 0 0x0.8p1063 0x0.802p1063 11 0 0x0.8p1062 0x0.801p1062 12 0 0x0.8p1061 0x0.8008p1061 13 0 0x0.8p1060 0x0.8004p1060 14 0 0x0.8p1059 0x0.8002p1059 15 0 0x0.8p1058 0x0.8001p1058 16 0 0x0.8p1057 0x0.80008p1057 17 0 0x0.8p1056 0x0.80004p1056 18 0 0x0.8p1055 0x0.80002p1055 19 0 0x0.8p1054 0x0.80001p1054 20 0 0x0.8p1053 0x0.800008p1053 21 0 0x0.8p1052 0x0.800004p1052 22 0 0x0.8p1051 0x0.800002p1051 23 0 0x0.8p1050 0x0.800001p1050 24 0 0x0.8p1049 0x0.8000008p1049 25 0 0x0.8p1048 0x0.8000004p1048 26 0 0x0.8p1047 0x0.8000002p1047 27 0 0x0.8p1046 0x0.8000001p1046 28 0 0x0.8p1045 0x0.80000008p1045 29 0 0x0.8p1044 0x0.80000004p1044 30 0 0x0.8p1043 0x0.80000002p1043 31 0 0x0.8p1042 0x0.80000001p1042 32 0 0x0.8p1041 0x0.800000008p1041 33 0 0x0.8p1040 0x0.800000004p1040 34 0 0x0.8p1039 0x0.800000002p1039 35 0 0x0.8p1038 0x0.800000001p1038 36 0 0x0.8p1037 0x0.8000000008p1037 37 0 0x0.8p1036 0x0.8000000004p1036 38 0 0x0.8p1035 0x0.8000000002p1035 39 0 0x0.8p1034 0x0.8000000001p1034 40 0 0x0.8p1033 0x0.80000000008p1033 41 0 0x0.8p1032 0x0.80000000004p1032 42 0 0x0.8p1031 0x0.80000000002p1031 43 0 0x0.8p1030 0x0.80000000001p1030 44 0 0x0.8p1029 0x0.800000000008p1029 45 0 0x0.8p1028 0x0.800000000004p1028 46 0 0x0.8p1027 0x0.800000000002p1027 47 0 0x0.8p1026 0x0.800000000001p1026 48 0 0x0.8p1025 0x0.8000000000008p1025 49 0 0x0.8p1024 0x0.8000000000004p1024 50 0 0x0.8p1023 0x0.8000000000002p1023 51 0 0x0.8p1022 0x0.8000000000001p1022 52 0 0x0.8p1021 0x0.80000000000008p1021 53 0 0x0.8p1020 0x0.80000000000008p1020 54 0 0x0.8p1019 0x0.80000000000008p1019 55 0 0x0.8p1018 0x0.80000000000008p1018 56 0 0x0.8p1017 0x0.80000000000008p1017 57 0 0x0.8p1016 0x0.80000000000008p1016 58 0 0x0.8p1015 0x0.80000000000008p1015 59 0 0x0.8p1014 0x0.80000000000008p1014 60 0 0x0.8p1013 0x0.80000000000008p1013 61 0 0x0.8p1012 0x0.80000000000008p1012 62 0 0x0.8p1011 0x0.80000000000008p1011 63 0 0x0.8p1010 0x0.80000000000008p1010 64 0 0x0.8p1009 0x0.80000000000008p1009 65 0 0x0.8p1008 0x0.80000000000008p1008 66 0 0x0.8p1007 0x0.80000000000008p1007 67 0 0x0.8p1006 0x0.80000000000008p1006 68 0 0x0.8p1005 0x0.80000000000008p1005 69 0 0x0.8p1004 0x0.80000000000008p1004 70 0 0x0.8p1003 0x0.80000000000008p1003 71 0 0x0.8p1002 0x0.80000000000008p1002 72 0 0x0.8p1001 0x0.80000000000008p1001 73 0 0x0.8p1000 0x0.80000000000008p1000 74 0 0x0.8p999 0x0.80000000000008p999 75 0 0x0.8p998 0x0.80000000000008p998 76 0 0x0.8p997 0x0.80000000000008p997 77 0 0x0.8p996 0x0.80000000000008p996 78 0 0x0.8p995 0x0.80000000000008p995 79 0 0x0.8p994 0x0.80000000000008p994 80 0 0x0.8p993 0x0.80000000000008p993 81 0 0x0.8p992 0x0.80000000000008p992 82 0 0x0.8p991 0x0.80000000000008p991 83 0 0x0.8p990 0x0.80000000000008p990 84 0 0x0.8p989 0x0.80000000000008p989 85 0 0x0.8p988 0x0.80000000000008p988 86 0 0x0.8p987 0x0.80000000000008p987 87 0 0x0.8p986 0x0.80000000000008p986 88 0 0x0.8p985 0x0.80000000000008p985 89 0 0x0.8p984 0x0.80000000000008p984 90 0 0x0.8p983 0x0.80000000000008p983 91 0 0x0.8p982 0x0.80000000000008p982 92 0 0x0.8p981 0x0.80000000000008p981 93 0 0x0.8p980 0x0.80000000000008p980 94 0 0x0.8p979 0x0.80000000000008p979 95 0 0x0.8p978 0x0.80000000000008p978 96 0 0x0.8p977 0x0.80000000000008p977 97 0 0x0.8p976 0x0.80000000000008p976 98 0 0x0.8p975 0x0.80000000000008p975 99 0 0x0.8p974 0x0.80000000000008p974 100 0 0x0.8p973 0x0.80000000000008p973 101 0 0x0.8p972 0x0.80000000000008p972 102 0 0x0.8p971 0x0.80000000000008p971 103 0 0x0.8p970 0x0.80000000000008p970 104 0 0x0.8p969 0x0.80000000000008p969 105 0 0x0.8p968 0x0.80000000000008p968 106 0 0x0.8p967 0x0.80000000000008p967 107 0 0x0.8p966 0x0.80000000000008p966 One thing that is clear is that real_isdenormal is never true for these (but then a question is if flush_denormals_to_zero actually works). Jakub
On Fri, Nov 04, 2022 at 08:14:23PM +0100, Jakub Jelinek via Gccpatches wrote: > One thing that is clear is that real_isdenormal is never true for these > (but then a question is if flush_denormals_to_zero actually works). So, after some more investigation, I think that is actually the case, real_isdenormal is only meaningful (will ever return true) right after round_for_format. The uses inside of real.cc are fine, real_to_target first calls round_for_format and then calls fmt>encode in which the real_isdenormal calls are done. And, round_for_format is what transforms the normalized way of expressing the number into the denormal form: /* Check the range of the exponent. If we're out of range, either underflow or overflow. */ if (REAL_EXP (r) > emax2) goto overflow; else if (REAL_EXP (r) <= emin2m1) { int diff; if (!fmt>has_denorm) { /* Don't underflow completely until we've had a chance to round. */ if (REAL_EXP (r) < emin2m1) goto underflow; } else { diff = emin2m1  REAL_EXP (r) + 1; if (diff > p2) goto underflow; /* Denormalize the significand. */ r>sig[0] = sticky_rshift_significand (r, r, diff); SET_REAL_EXP (r, REAL_EXP (r) + diff); } } But, real_to_target is one of the 4 callers of static round_for_format, another one is real_convert, but that one undoes that immediately: round_for_format (fmt, r); /* Make resulting NaN value to be qNaN. The caller has the responsibility to avoid the operation if flag_signaling_nans is on. */ if (r>cl == rvc_nan) r>signalling = 0; /* round_for_format denormalizes denormals. Undo just that part. */ if (r>cl == rvc_normal) normalize (r); and the last two are inside of encoding the IBM double double composite. So, I think everywhere in the frange you'll see normalized REAL_VALUE_TYPE and so real_isdenormal will always be false there. You need to check for REAL_EXP (r) < fmt>emin or so (and ideally only on REAL_VALUE_TYPE already real_converted to the right mode (your frange_arithmetics does that already, which is good, but say conversions and other unary ops might need it too). If I in the hack from last mail replace  fprintf (stderr, "%d %d %s %s\n", i, real_isdenormal (&result), buf, buf2); + fprintf (stderr, "%d %d %s %s\n", i, REAL_EXP (&result) < REAL_MODE_FORMAT (mode)>emin, buf, buf2); then it is 1 until: 102 1 0x0.8p971 0x0.80000000000008p971 103 1 0x0.8p970 0x0.80000000000008p970 104 1 0x0.8p969 0x0.80000000000008p969 105 0 0x0.8p968 0x0.80000000000008p968 106 0 0x0.8p967 0x0.80000000000008p967 107 0 0x0.8p966 0x0.80000000000008p966 Now, the __LDBL_MIN__ powerpc64 gcc announces is 2.00416836000897277799610805135016205e292L which is equivalent to: 0x1p969 and that is equivalent to 0x0.8p968, so I think that is exactly what we want. Jakub
On Fri, Nov 4, 2022 at 8:53 PM Jakub Jelinek <jakub@redhat.com> wrote: > > On Fri, Nov 04, 2022 at 08:14:23PM +0100, Jakub Jelinek via Gccpatches wrote: > > One thing that is clear is that real_isdenormal is never true for these > > (but then a question is if flush_denormals_to_zero actually works). > > So, after some more investigation, I think that is actually the case, > real_isdenormal is only meaningful (will ever return true) right after > round_for_format. > The uses inside of real.cc are fine, real_to_target first calls > round_for_format and then calls fmt>encode in which the real_isdenormal > calls are done. And, round_for_format is what transforms the > normalized way of expressing the number into the denormal form: > /* Check the range of the exponent. If we're out of range, > either underflow or overflow. */ > if (REAL_EXP (r) > emax2) > goto overflow; > else if (REAL_EXP (r) <= emin2m1) > { > int diff; > > if (!fmt>has_denorm) > { > /* Don't underflow completely until we've had a chance to round. */ > if (REAL_EXP (r) < emin2m1) > goto underflow; > } > else > { > diff = emin2m1  REAL_EXP (r) + 1; > if (diff > p2) > goto underflow; > > /* Denormalize the significand. */ > r>sig[0] = sticky_rshift_significand (r, r, diff); > SET_REAL_EXP (r, REAL_EXP (r) + diff); > } > } > But, real_to_target is one of the 4 callers of static round_for_format, > another one is real_convert, but that one undoes that immediately: > round_for_format (fmt, r); > > /* Make resulting NaN value to be qNaN. The caller has the > responsibility to avoid the operation if flag_signaling_nans > is on. */ > if (r>cl == rvc_nan) > r>signalling = 0; > > /* round_for_format denormalizes denormals. Undo just that part. */ > if (r>cl == rvc_normal) > normalize (r); > and the last two are inside of encoding the IBM double double composite. > So, I think everywhere in the frange you'll see normalized REAL_VALUE_TYPE > and so real_isdenormal will always be false there. > You need to check for REAL_EXP (r) < fmt>emin or so (and ideally only on > REAL_VALUE_TYPE already real_converted to the right mode (your > frange_arithmetics does that already, which is good, but say conversions > and other unary ops might need it too). Let me see if I understand you correctly... real_isdenormal is always returning false for our uses in frange? So instead of using real_isdenormal in flush_denormals_to_zero, perhaps we should be using: REAL_EXP (r) < REAL_MODE_FORMAT (mode)>emin ?? Could we perhaps make real_isdenormal always work for all values? Is that possible? Aldy
On Mon, Nov 07, 2022 at 01:35:35PM +0100, Aldy Hernandez wrote: > Let me see if I understand you correctly... > > real_isdenormal is always returning false for our uses in frange? So > instead of using real_isdenormal in flush_denormals_to_zero, perhaps > we should be using: > > REAL_EXP (r) < REAL_MODE_FORMAT (mode)>emin > > ?? > > Could we perhaps make real_isdenormal always work for all values? Is > that possible? Yes. Or have real_isdenormal_target for the uses in real.cc which would be private to real.cc and would do what real_isdenormal currently does, and then real_isdenormal with the above definition (well, r>cl == rvc_normal && ...). Jakub
On Mon, Nov 7, 2022 at 1:43 PM Jakub Jelinek <jakub@redhat.com> wrote: > > On Mon, Nov 07, 2022 at 01:35:35PM +0100, Aldy Hernandez wrote: > > Let me see if I understand you correctly... > > > > real_isdenormal is always returning false for our uses in frange? So > > instead of using real_isdenormal in flush_denormals_to_zero, perhaps > > we should be using: > > > > REAL_EXP (r) < REAL_MODE_FORMAT (mode)>emin > > > > ?? > > > > Could we perhaps make real_isdenormal always work for all values? Is > > that possible? > > Yes. Or have real_isdenormal_target for the uses in real.cc > which would be private to real.cc and would do what real_isdenormal > currently does, and then real_isdenormal with the above definition > (well, r>cl == rvc_normal && ...). If the REAL_EXP(r)... definition works for everyone, can't we just use that? Or do you think there'd be a measurable performance impact? Aldy
On Mon, Nov 07, 2022 at 01:48:28PM +0100, Aldy Hernandez wrote: > On Mon, Nov 7, 2022 at 1:43 PM Jakub Jelinek <jakub@redhat.com> wrote: > > > > On Mon, Nov 07, 2022 at 01:35:35PM +0100, Aldy Hernandez wrote: > > > Let me see if I understand you correctly... > > > > > > real_isdenormal is always returning false for our uses in frange? So > > > instead of using real_isdenormal in flush_denormals_to_zero, perhaps > > > we should be using: > > > > > > REAL_EXP (r) < REAL_MODE_FORMAT (mode)>emin > > > > > > ?? > > > > > > Could we perhaps make real_isdenormal always work for all values? Is > > > that possible? > > > > Yes. Or have real_isdenormal_target for the uses in real.cc > > which would be private to real.cc and would do what real_isdenormal > > currently does, and then real_isdenormal with the above definition > > (well, r>cl == rvc_normal && ...). > > If the REAL_EXP(r)... definition works for everyone, can't we just use > that? Or do you think there'd be a measurable performance impact? It doesn't work alone. Either denormals are in normalized form, then r>cl == rvc_normal && REAL_EXP (r) < REAL_MODE_FORMAT (mode)>emin is true, or they are in denormal form, then r>cl == rvc_normal && (r>sig[SIGSZ1] & SIG_MSB) == 0 is true. You could use r>cl == rvc_normal && (REAL_EXP (r) < REAL_MODE_FORMAT (mode)>emin  (r>sig[SIGSZ1] & SIG_MSB) == 0) but that would be waste of compile time both in real.cc and outside of real.cc. Jakub
Fair enough. How's this? Tested on x8664 Linux. LAPACK regression testing as well. On Mon, Nov 7, 2022 at 1:56 PM Jakub Jelinek <jakub@redhat.com> wrote: > > On Mon, Nov 07, 2022 at 01:48:28PM +0100, Aldy Hernandez wrote: > > On Mon, Nov 7, 2022 at 1:43 PM Jakub Jelinek <jakub@redhat.com> wrote: > > > > > > On Mon, Nov 07, 2022 at 01:35:35PM +0100, Aldy Hernandez wrote: > > > > Let me see if I understand you correctly... > > > > > > > > real_isdenormal is always returning false for our uses in frange? So > > > > instead of using real_isdenormal in flush_denormals_to_zero, perhaps > > > > we should be using: > > > > > > > > REAL_EXP (r) < REAL_MODE_FORMAT (mode)>emin > > > > > > > > ?? > > > > > > > > Could we perhaps make real_isdenormal always work for all values? Is > > > > that possible? > > > > > > Yes. Or have real_isdenormal_target for the uses in real.cc > > > which would be private to real.cc and would do what real_isdenormal > > > currently does, and then real_isdenormal with the above definition > > > (well, r>cl == rvc_normal && ...). > > > > If the REAL_EXP(r)... definition works for everyone, can't we just use > > that? Or do you think there'd be a measurable performance impact? > > It doesn't work alone. > > Either denormals are in normalized form, then > r>cl == rvc_normal && REAL_EXP (r) < REAL_MODE_FORMAT (mode)>emin > is true, or they are in denormal form, then > r>cl == rvc_normal && (r>sig[SIGSZ1] & SIG_MSB) == 0 > is true. > > You could use > r>cl == rvc_normal > && (REAL_EXP (r) < REAL_MODE_FORMAT (mode)>emin >  (r>sig[SIGSZ1] & SIG_MSB) == 0) > but that would be waste of compile time both in real.cc and outside of > real.cc. > > Jakub >
On Fri, Nov 4, 2022 at 8:14 PM Jakub Jelinek <jakub@redhat.com> wrote: > > On Mon, Oct 17, 2022 at 08:21:01AM +0200, Aldy Hernandez wrote: > > +// Set VALUE to its next real value, or INF if the operation overflows. > > + > > +inline void > > +frange_nextafter (enum machine_mode mode, > > + REAL_VALUE_TYPE &value, > > + const REAL_VALUE_TYPE &inf) > > +{ > > + const real_format *fmt = REAL_MODE_FORMAT (mode); > > + REAL_VALUE_TYPE tmp; > > + bool overflow = real_nextafter (&tmp, fmt, &value, &inf); > > + if (overflow) > > + value = inf; > > + else > > + value = tmp; > > Please change the above 5 lines to just > real_nextafter (&tmp, fmt, &value, &inf); > value = tmp; Done. As suggested upthread, I have also adjusted update_nan_sign() to drop the NAN sign to VARYING if both operands are NAN. As an optimization I keep the sign if both operands are NAN and have the same sign. How's this? Tested on x8664 Linux. LAPACK testing as well. Aldy
On Mon, Nov 07, 2022 at 04:38:29PM +0100, Aldy Hernandez wrote: > From d214bcdff2cb90ad1eb808d29bda6fb98d510b4c Mon Sep 17 00:00:00 2001 > From: Aldy Hernandez <aldyh@redhat.com> > Date: Mon, 7 Nov 2022 14:18:57 +0100 > Subject: [PATCH] Provide normalized and denormal format version of > real_isdenormal. > > Implement real_isdenormal_target() to be used within real.cc where the > argument is known to be in denormal format. Rewrite real_isdenormal() > for use outside of real.cc where the argument is known to be > normalized. > > gcc/ChangeLog: > > * real.cc (real_isdenormal_target): New. > (encode_ieee_single): Use real_isdenormal_target. > (encode_ieee_double): Same. > (encode_ieee_extended): Same. > (encode_ieee_quad): Same. > (encode_ieee_half): Same. > (encode_arm_bfloat_half): Same. > * valuerange.cc (frange::flush_denormals_to_zero): Same. > * real.h (real_isdenormal): Rewrite to look at mode. I'd make real_isdenormal_target static inline bool rather than inline bool, it is only defined in real.cc, so there is no point exporting it. Though, as you've added the mode argument, the real.cc inline could very well also be called real_isdenormal too, it wouldn't be a redeclaration or ODR violation. Jakub
On Mon, Nov 07, 2022 at 04:41:23PM +0100, Aldy Hernandez wrote: > As suggested upthread, I have also adjusted update_nan_sign() to drop > the NAN sign to VARYING if both operands are NAN. As an optimization > I keep the sign if both operands are NAN and have the same sign. For NaNs this still relies on something IEEE754 doesn't guarantee, as I cited, after a binary operation the sign bit of the NaN is unspecified, whether there is one NaN operand or two. It might be that all CPUs handle it the way you've implemented (that for one NaN operand the sign of NaN result will be the same as that NaN operand and for two it will be the sign of one of the two NaNs operands, never something else), but I think we'd need to check more than one implementation for that (I've only tried x86_64 and thus SSE behavior in it), so one would need to test i387 long double behavior too, ARM/AArch64, PowerPC, s390{,x}, RISCV, ... The guarantee given by IEEE754 is only for those copy, negate, abs, copySign operations, so copying values around, NEG_EXPR, ABS_EXPR, __builtin_fabs*, __builtin_copysign*. Otherwise LGTM (but would be nice to get into GCC13 not just +, but also , *, /, sqrt at least). Jakub
On Tue, Nov 8, 2022 at 12:07 PM Jakub Jelinek <jakub@redhat.com> wrote: > > On Mon, Nov 07, 2022 at 04:38:29PM +0100, Aldy Hernandez wrote: > > From d214bcdff2cb90ad1eb808d29bda6fb98d510b4c Mon Sep 17 00:00:00 2001 > > From: Aldy Hernandez <aldyh@redhat.com> > > Date: Mon, 7 Nov 2022 14:18:57 +0100 > > Subject: [PATCH] Provide normalized and denormal format version of > > real_isdenormal. > > > > Implement real_isdenormal_target() to be used within real.cc where the > > argument is known to be in denormal format. Rewrite real_isdenormal() > > for use outside of real.cc where the argument is known to be > > normalized. > > > > gcc/ChangeLog: > > > > * real.cc (real_isdenormal_target): New. > > (encode_ieee_single): Use real_isdenormal_target. > > (encode_ieee_double): Same. > > (encode_ieee_extended): Same. > > (encode_ieee_quad): Same. > > (encode_ieee_half): Same. > > (encode_arm_bfloat_half): Same. > > * valuerange.cc (frange::flush_denormals_to_zero): Same. > > * real.h (real_isdenormal): Rewrite to look at mode. > > I'd make real_isdenormal_target static inline bool > rather than inline bool, it is only defined in real.cc, so there is > no point exporting it. Huh. I thought inline alone would inhibit the exporting. Thanks. > Though, as you've added the mode argument, the real.cc inline > could very well also be called real_isdenormal too, it wouldn't be > a redeclaration or ODR violation. Great, even better. OK pending tests? Aldy
On Tue, Nov 8, 2022 at 12:20 PM Jakub Jelinek <jakub@redhat.com> wrote: > > On Mon, Nov 07, 2022 at 04:41:23PM +0100, Aldy Hernandez wrote: > > As suggested upthread, I have also adjusted update_nan_sign() to drop > > the NAN sign to VARYING if both operands are NAN. As an optimization > > I keep the sign if both operands are NAN and have the same sign. > > For NaNs this still relies on something IEEE754 doesn't guarantee, > as I cited, after a binary operation the sign bit of the NaN is > unspecified, whether there is one NaN operand or two. > It might be that all CPUs handle it the way you've implemented > (that for one NaN operand the sign of NaN result will be the same > as that NaN operand and for two it will be the sign of one of the two > NaNs operands, never something else), but I think we'd need to check > more than one implementation for that (I've only tried x86_64 and thus > SSE behavior in it), so one would need to test i387 long double behavior > too, ARM/AArch64, PowerPC, s390{,x}, RISCV, ... > The guarantee given by IEEE754 is only for those copy, negate, abs, copySign > operations, so copying values around, NEG_EXPR, ABS_EXPR, __builtin_fabs*, > __builtin_copysign*. Ughh, that's unfortunate. OK, I've added a big note. > > Otherwise LGTM (but would be nice to get into GCC13 not just > +, but also , *, /, sqrt at least). Minus is trivial as we can implement it with a negate and plus. I have a patch queued up for that. The rest require a bit more thought, though perhaps with what we have so far can serve as a base. I'll look into it. Attached is the patch I'm retesting. Thanks for your patience, and copious help here. Aldy
On Tue, Nov 08, 2022 at 01:47:58PM +0100, Aldy Hernandez wrote: > On Tue, Nov 8, 2022 at 12:07 PM Jakub Jelinek <jakub@redhat.com> wrote: > > > > On Mon, Nov 07, 2022 at 04:38:29PM +0100, Aldy Hernandez wrote: > > > From d214bcdff2cb90ad1eb808d29bda6fb98d510b4c Mon Sep 17 00:00:00 2001 > > > From: Aldy Hernandez <aldyh@redhat.com> > > > Date: Mon, 7 Nov 2022 14:18:57 +0100 > > > Subject: [PATCH] Provide normalized and denormal format version of > > > real_isdenormal. > > > > > > Implement real_isdenormal_target() to be used within real.cc where the > > > argument is known to be in denormal format. Rewrite real_isdenormal() > > > for use outside of real.cc where the argument is known to be > > > normalized. > > > > > > gcc/ChangeLog: > > > > > > * real.cc (real_isdenormal_target): New. > > > (encode_ieee_single): Use real_isdenormal_target. > > > (encode_ieee_double): Same. > > > (encode_ieee_extended): Same. > > > (encode_ieee_quad): Same. > > > (encode_ieee_half): Same. > > > (encode_arm_bfloat_half): Same. > > > * valuerange.cc (frange::flush_denormals_to_zero): Same. > > > * real.h (real_isdenormal): Rewrite to look at mode. > > > > I'd make real_isdenormal_target static inline bool > > rather than inline bool, it is only defined in real.cc, so there is > > no point exporting it. > > Huh. I thought inline alone would inhibit the exporting. Thanks. That is what happens with C99 inline (unless there is extern for the decl), but C++ inline is different. It isn't guaranteed to be exported, but with fkeepinlinefunctions or if you say take address of the inline in a way that can't be optimized back into a call to the inline (or even just call it with O0), it is exported. > > > Though, as you've added the mode argument, the real.cc inline > > could very well also be called real_isdenormal too, it wouldn't be > > a redeclaration or ODR violation. > > Great, even better. > > OK pending tests? > Aldy > From c3ca1d606bfb22bf4f8bc7ac0ce561bd6afc3368 Mon Sep 17 00:00:00 2001 > From: Aldy Hernandez <aldyh@redhat.com> > Date: Mon, 7 Nov 2022 14:18:57 +0100 > Subject: [PATCH] Provide normalized and denormal format version of > real_isdenormal. > > Implement a variant of real_isdenormal() to be used within real.cc > where the argument is known to be in denormal format. Rewrite > real_isdenormal() for use outside of real.cc where the argument is > known to be normalized. > > gcc/ChangeLog: > > * real.cc (real_isdenormal): New. > * real.h (real_isdenormal): Add mode argument. Rewrite for > normalized values. > * valuerange.cc (frange::flush_denormals_to_zero): Pass mode to > real_isdenormal. >  > gcc/real.cc  10 ++++++++++ > gcc/real.h  7 ++++ > gcc/valuerange.cc  5 +++ > 3 files changed, 17 insertions(+), 5 deletions() > > diff git a/gcc/real.cc b/gcc/real.cc > index aae7c335d59..028aad95ec4 100644 >  a/gcc/real.cc > +++ b/gcc/real.cc > @@ 111,6 +111,16 @@ static const REAL_VALUE_TYPE * real_digit (int); > static void times_pten (REAL_VALUE_TYPE *, int); > > static void round_for_format (const struct real_format *, REAL_VALUE_TYPE *); > + > +/* Determine whether a floatingpoint value X is a denormal. R is > + expected to be in denormal form, so this function is only > + meaningful after a call to round_for_format. */ > + > +static inline bool > +real_isdenormal (const REAL_VALUE_TYPE *r) > +{ > + return (r>sig[SIGSZ1] & SIG_MSB) == 0; I would probably keep the r>cl == rvc_normal in here too. I know the code in real.cc didn't do it before, but what r>sig is for the rvc_zero/rvc_inf is unclear. It is true that get_zero/get_canonical_?nan/get_inf clear the whole sig, but not really sure if we guarantee that everywhere. The real.cc uses were like: bool denormal = ...; at the start of the function and then switch (...) { ... case rvc_normal: if (denormal) ... } so another even better possibility would be to use your simple real.cc (real_isdenormal) and drop all the denormal variables, so:  bool denormal = ...; switch (...) { ... case rvc_normal:  if (denormal) + if (real_isdenormal (r)) ... } Otherwise LGTM. Jakub
On Tue, Nov 08, 2022 at 02:06:58PM +0100, Aldy Hernandez wrote: > + gcc_checking_assert (!r.nan_signbit_p (sign1)); > + if (op1_nan && op2_nan) > + { > + // If boths signs agree, we could use that sign, but IEEE754 > + // does not guarantee this for a binary operator. The x86_64 > + // architure does keep the common known sign, but further tests > + // are needed to see if other architectures do the same (i387 > + // long double, ARM/aarch64, PowerPC, s390,{,x}, RSICV, etc). s/RSICV/RISCV/ > + // In the meantime, keep sign VARYING. > + ; > + } > + else if (op1_nan) > + { > + if (op1.nan_signbit_p (sign1)) > + r.update_nan (sign1); > + } > + else if (op2_nan) > + { > + if (op2.nan_signbit_p (sign2)) > + r.update_nan (sign2); > + } Well, these cases also aren't guaranteed for binary operator. I think a conforming implementation can say copy the NaN operand to result and toggle the sign. Or, if the operand would be a sNaN, it must turn it into a qNaN (don't remember right now if there are requirements on what it can do with the mantissa which needs to change for the sNaN > qNaN difference at least, but whether it can just generate a canonical qNaN or needs to preserve at least some bits), but could e.g. clear or toggle the sign of the NaN as well. Whether there are any such implementations or not is a question. For the single qNaN operand case, it would surprise me if anybody bothered to tweak the sign bit in any way, just copying the input seems simpler to me, but for the sNaN > qNaN conversion it wouldn't surprise me that much. Jakub
On Tue, Nov 8, 2022 at 2:25 PM Jakub Jelinek <jakub@redhat.com> wrote: > > On Tue, Nov 08, 2022 at 02:06:58PM +0100, Aldy Hernandez wrote: > > + gcc_checking_assert (!r.nan_signbit_p (sign1)); > > + if (op1_nan && op2_nan) > > + { > > + // If boths signs agree, we could use that sign, but IEEE754 > > + // does not guarantee this for a binary operator. The x86_64 > > + // architure does keep the common known sign, but further tests > > + // are needed to see if other architectures do the same (i387 > > + // long double, ARM/aarch64, PowerPC, s390,{,x}, RSICV, etc). > > s/RSICV/RISCV/ > > > + // In the meantime, keep sign VARYING. > > + ; > > + } > > + else if (op1_nan) > > + { > > + if (op1.nan_signbit_p (sign1)) > > + r.update_nan (sign1); > > + } > > + else if (op2_nan) > > + { > > + if (op2.nan_signbit_p (sign2)) > > + r.update_nan (sign2); > > + } > > Well, these cases also aren't guaranteed for binary operator. > I think a conforming implementation can say copy the NaN operand > to result and toggle the sign. Or, if the operand would be a sNaN, > it must turn it into a qNaN (don't remember right now if there are > requirements on what it can do with the mantissa which needs to change > for the sNaN > qNaN difference at least, but whether it can just > generate a canonical qNaN or needs to preserve at least some bits), > but could e.g. clear or toggle the sign of the NaN as well. > Whether there are any such implementations or not is a question. > For the single qNaN operand case, it would surprise me if anybody > bothered to tweak the sign bit in any way, just copying the input > seems simpler to me, but for the sNaN > qNaN conversion it wouldn't > surprise me that much. Well, perhaps we should just nuke update_nan_sign() altogether, and always keep the sign varying? inline bool propagate_nans (frange &r, const frange &op1, const frange &op2) { if (op1.known_isnan ()  op2.known_isnan ()) { r.set_nan (op1.type ()); return true; } return false; } I'm fine either way. The less code the better :). Aldy
On Tue, Nov 08, 2022 at 02:47:35PM +0100, Aldy Hernandez wrote: > Well, perhaps we should just nuke update_nan_sign() altogether, and > always keep the sign varying? > > inline bool > propagate_nans (frange &r, const frange &op1, const frange &op2) > { > if (op1.known_isnan ()  op2.known_isnan ()) > { > r.set_nan (op1.type ()); > return true; > } > return false; > } > > I'm fine either way. The less code the better :). Yes, but you had 2 callers, so something needs to be done also if in foperator_plus::fold_range. Jakub
On Tue, Nov 8, 2022 at 2:15 PM Jakub Jelinek <jakub@redhat.com> wrote: > > On Tue, Nov 08, 2022 at 01:47:58PM +0100, Aldy Hernandez wrote: > > On Tue, Nov 8, 2022 at 12:07 PM Jakub Jelinek <jakub@redhat.com> wrote: > > > > > > On Mon, Nov 07, 2022 at 04:38:29PM +0100, Aldy Hernandez wrote: > > > > From d214bcdff2cb90ad1eb808d29bda6fb98d510b4c Mon Sep 17 00:00:00 2001 > > > > From: Aldy Hernandez <aldyh@redhat.com> > > > > Date: Mon, 7 Nov 2022 14:18:57 +0100 > > > > Subject: [PATCH] Provide normalized and denormal format version of > > > > real_isdenormal. > > > > > > > > Implement real_isdenormal_target() to be used within real.cc where the > > > > argument is known to be in denormal format. Rewrite real_isdenormal() > > > > for use outside of real.cc where the argument is known to be > > > > normalized. > > > > > > > > gcc/ChangeLog: > > > > > > > > * real.cc (real_isdenormal_target): New. > > > > (encode_ieee_single): Use real_isdenormal_target. > > > > (encode_ieee_double): Same. > > > > (encode_ieee_extended): Same. > > > > (encode_ieee_quad): Same. > > > > (encode_ieee_half): Same. > > > > (encode_arm_bfloat_half): Same. > > > > * valuerange.cc (frange::flush_denormals_to_zero): Same. > > > > * real.h (real_isdenormal): Rewrite to look at mode. > > > > > > I'd make real_isdenormal_target static inline bool > > > rather than inline bool, it is only defined in real.cc, so there is > > > no point exporting it. > > > > Huh. I thought inline alone would inhibit the exporting. Thanks. > > That is what happens with C99 inline (unless there is extern for the decl), > but C++ inline is different. It isn't guaranteed to be exported, but > with fkeepinlinefunctions or if you say take address of the inline > in a way that can't be optimized back into a call to the inline (or even > just call it with O0), it is exported. > > > > > Though, as you've added the mode argument, the real.cc inline > > > could very well also be called real_isdenormal too, it wouldn't be > > > a redeclaration or ODR violation. > > > > Great, even better. > > > > OK pending tests? > > Aldy > > > From c3ca1d606bfb22bf4f8bc7ac0ce561bd6afc3368 Mon Sep 17 00:00:00 2001 > > From: Aldy Hernandez <aldyh@redhat.com> > > Date: Mon, 7 Nov 2022 14:18:57 +0100 > > Subject: [PATCH] Provide normalized and denormal format version of > > real_isdenormal. > > > > Implement a variant of real_isdenormal() to be used within real.cc > > where the argument is known to be in denormal format. Rewrite > > real_isdenormal() for use outside of real.cc where the argument is > > known to be normalized. > > > > gcc/ChangeLog: > > > > * real.cc (real_isdenormal): New. > > * real.h (real_isdenormal): Add mode argument. Rewrite for > > normalized values. > > * valuerange.cc (frange::flush_denormals_to_zero): Pass mode to > > real_isdenormal. > >  > > gcc/real.cc  10 ++++++++++ > > gcc/real.h  7 ++++ > > gcc/valuerange.cc  5 +++ > > 3 files changed, 17 insertions(+), 5 deletions() > > > > diff git a/gcc/real.cc b/gcc/real.cc > > index aae7c335d59..028aad95ec4 100644 > >  a/gcc/real.cc > > +++ b/gcc/real.cc > > @@ 111,6 +111,16 @@ static const REAL_VALUE_TYPE * real_digit (int); > > static void times_pten (REAL_VALUE_TYPE *, int); > > > > static void round_for_format (const struct real_format *, REAL_VALUE_TYPE *); > > + > > +/* Determine whether a floatingpoint value X is a denormal. R is > > + expected to be in denormal form, so this function is only > > + meaningful after a call to round_for_format. */ > > + > > +static inline bool > > +real_isdenormal (const REAL_VALUE_TYPE *r) > > +{ > > + return (r>sig[SIGSZ1] & SIG_MSB) == 0; > > I would probably keep the r>cl == rvc_normal in here too. > I know the code in real.cc didn't do it before, but what > r>sig is for the rvc_zero/rvc_inf is unclear. It is true > that get_zero/get_canonical_?nan/get_inf clear the whole sig, > but not really sure if we guarantee that everywhere. > The real.cc uses were like: > bool denormal = ...; > at the start of the function and then > switch (...) > { > ... > case rvc_normal: > if (denormal) > ... > } > so another even better possibility would be to use your simple > real.cc (real_isdenormal) and drop all the denormal variables, so: >  bool denormal = ...; > switch (...) > { > ... > case rvc_normal: >  if (denormal) > + if (real_isdenormal (r)) > ... > } Sure. Attached patch in testing. Aldy
On Tue, Nov 08, 2022 at 03:02:40PM +0100, Aldy Hernandez wrote: > From d02ce8eaf16d2fc6db6472268fd962e09c2fd81e Mon Sep 17 00:00:00 2001 > From: Aldy Hernandez <aldyh@redhat.com> > Date: Mon, 7 Nov 2022 14:18:57 +0100 > Subject: [PATCH] Provide normalized and denormal format version of > real_isdenormal. > > Implement a variant of real_isdenormal() to be used within real.cc > where the argument is known to be in denormal format. Rewrite > real_isdenormal() for use outside of real.cc where the argument is > known to be normalized. > > gcc/ChangeLog: > > * real.cc (real_isdenormal): New. > (encode_ieee_single): Call real_isdenormal. > (encode_ieee_double): Same. > (encode_ieee_extended): Same. > (encode_ieee_quad): Same. > (encode_ieee_half): Same. > (encode_arm_bfloat_half): Same. > * real.h (real_isdenormal): Add mode argument. Rewrite for > normalized values. > * valuerange.cc (frange::flush_denormals_to_zero): Pass mode to > real_isdenormal. LGTM, thanks. Jakub
On Tue, Nov 8, 2022 at 2:50 PM Jakub Jelinek <jakub@redhat.com> wrote: > > On Tue, Nov 08, 2022 at 02:47:35PM +0100, Aldy Hernandez wrote: > > Well, perhaps we should just nuke update_nan_sign() altogether, and > > always keep the sign varying? > > > > inline bool > > propagate_nans (frange &r, const frange &op1, const frange &op2) > > { > > if (op1.known_isnan ()  op2.known_isnan ()) > > { > > r.set_nan (op1.type ()); > > return true; > > } > > return false; > > } > > > > I'm fine either way. The less code the better :). > > Yes, but you had 2 callers, so something needs to be done also if > in foperator_plus::fold_range. We can also remove the update_nan_sign() in the other call because the r.set() before it sets a default NAN (with a varying sign). Attached patch in testing. Aldy
On Tue, Nov 08, 2022 at 03:06:53PM +0100, Aldy Hernandez wrote: > +// If either operand is a NAN, set R to the combination of both NANs > +// signwise and return TRUE. This comment doesn't describe what it does now. If either operand is a NAN, set R to NAN with unspecified sign bit and return TRUE. ? Other than this LGTM. Jakub
On Tue, Nov 8, 2022 at 3:11 PM Jakub Jelinek <jakub@redhat.com> wrote: > > On Tue, Nov 08, 2022 at 03:06:53PM +0100, Aldy Hernandez wrote: > > +// If either operand is a NAN, set R to the combination of both NANs > > +// signwise and return TRUE. > > This comment doesn't describe what it does now. > If either operand is a NAN, set R to NAN with unspecified sign bit and return > TRUE. > ? OMG, I suck! // If either operand is a NAN, set R to NAN and return TRUE. Tests ongoing :). Aldy
On Tue, Nov 8, 2022 at 3:20 AM Jakub Jelinek via Gccpatches <gccpatches@gcc.gnu.org> wrote: > > On Mon, Nov 07, 2022 at 04:41:23PM +0100, Aldy Hernandez wrote: > > As suggested upthread, I have also adjusted update_nan_sign() to drop > > the NAN sign to VARYING if both operands are NAN. As an optimization > > I keep the sign if both operands are NAN and have the same sign. > > For NaNs this still relies on something IEEE754 doesn't guarantee, > as I cited, after a binary operation the sign bit of the NaN is > unspecified, whether there is one NaN operand or two. > It might be that all CPUs handle it the way you've implemented > (that for one NaN operand the sign of NaN result will be the same > as that NaN operand and for two it will be the sign of one of the two > NaNs operands, never something else), but I think we'd need to check > more than one implementation for that (I've only tried x86_64 and thus > SSE behavior in it), so one would need to test i387 long double behavior > too, ARM/AArch64, PowerPC, s390{,x}, RISCV, ... > The guarantee given by IEEE754 is only for those copy, negate, abs, copySign > operations, so copying values around, NEG_EXPR, ABS_EXPR, __builtin_fabs*, > __builtin_copysign*. FWIW, RISCV canonicalizes NaNs by clearing the sign bit; the signs of the input NaNs do not factor in. > > Otherwise LGTM (but would be nice to get into GCC13 not just > +, but also , *, /, sqrt at least). > > Jakub >
On Tue, Nov 08, 2022 at 09:44:40AM 0800, Andrew Waterman wrote: > On Tue, Nov 8, 2022 at 3:20 AM Jakub Jelinek via Gccpatches > <gccpatches@gcc.gnu.org> wrote: > > > > On Mon, Nov 07, 2022 at 04:41:23PM +0100, Aldy Hernandez wrote: > > > As suggested upthread, I have also adjusted update_nan_sign() to drop > > > the NAN sign to VARYING if both operands are NAN. As an optimization > > > I keep the sign if both operands are NAN and have the same sign. > > > > For NaNs this still relies on something IEEE754 doesn't guarantee, > > as I cited, after a binary operation the sign bit of the NaN is > > unspecified, whether there is one NaN operand or two. > > It might be that all CPUs handle it the way you've implemented > > (that for one NaN operand the sign of NaN result will be the same > > as that NaN operand and for two it will be the sign of one of the two > > NaNs operands, never something else), but I think we'd need to check > > more than one implementation for that (I've only tried x86_64 and thus > > SSE behavior in it), so one would need to test i387 long double behavior > > too, ARM/AArch64, PowerPC, s390{,x}, RISCV, ... > > The guarantee given by IEEE754 is only for those copy, negate, abs, copySign > > operations, so copying values around, NEG_EXPR, ABS_EXPR, __builtin_fabs*, > > __builtin_copysign*. > > FWIW, RISCV canonicalizes NaNs by clearing the sign bit; the signs of > the input NaNs do not factor in. Just for binary operations and some unary, or also the ones that IEEE754 spells out (moves, negations, absolute value and copysign)? Jakub
On Tue, Nov 8, 2022 at 10:11 AM Jakub Jelinek <jakub@redhat.com> wrote: > > On Tue, Nov 08, 2022 at 09:44:40AM 0800, Andrew Waterman wrote: > > On Tue, Nov 8, 2022 at 3:20 AM Jakub Jelinek via Gccpatches > > <gccpatches@gcc.gnu.org> wrote: > > > > > > On Mon, Nov 07, 2022 at 04:41:23PM +0100, Aldy Hernandez wrote: > > > > As suggested upthread, I have also adjusted update_nan_sign() to drop > > > > the NAN sign to VARYING if both operands are NAN. As an optimization > > > > I keep the sign if both operands are NAN and have the same sign. > > > > > > For NaNs this still relies on something IEEE754 doesn't guarantee, > > > as I cited, after a binary operation the sign bit of the NaN is > > > unspecified, whether there is one NaN operand or two. > > > It might be that all CPUs handle it the way you've implemented > > > (that for one NaN operand the sign of NaN result will be the same > > > as that NaN operand and for two it will be the sign of one of the two > > > NaNs operands, never something else), but I think we'd need to check > > > more than one implementation for that (I've only tried x86_64 and thus > > > SSE behavior in it), so one would need to test i387 long double behavior > > > too, ARM/AArch64, PowerPC, s390{,x}, RISCV, ... > > > The guarantee given by IEEE754 is only for those copy, negate, abs, copySign > > > operations, so copying values around, NEG_EXPR, ABS_EXPR, __builtin_fabs*, > > > __builtin_copysign*. > > > > FWIW, RISCV canonicalizes NaNs by clearing the sign bit; the signs of > > the input NaNs do not factor in. > > Just for binary operations and some unary, or also the ones that > IEEE754 spells out (moves, negations, absolute value and copysign)? I should've been more specific in my earlier email: I was referring to the arithmetic operators. Copysign and friends do not canonicalize NaNs. > > Jakub >
Sigh, one more thing. There are further possibilities for a NAN result, even if the operands are !NAN and the result from frange_arithmetic is free of NANs. Adding different signed infinities. For example, [INF,+INF] + [INF,+INF] has the possibility of adding INF and +INF, which is a NAN. Since we end up calling frange arithmetic on the lower bounds and then on the upper bounds, we miss this, and mistakenly think we're free of NANs. I have a patch in testing, but FYI, in case anyone notices this before I get around to it tomorrow. Aldy On Tue, Nov 8, 2022 at 3:11 PM Jakub Jelinek <jakub@redhat.com> wrote: > > On Tue, Nov 08, 2022 at 03:06:53PM +0100, Aldy Hernandez wrote: > > +// If either operand is a NAN, set R to the combination of both NANs > > +// signwise and return TRUE. > > This comment doesn't describe what it does now. > If either operand is a NAN, set R to NAN with unspecified sign bit and return > TRUE. > ? > > Other than this LGTM. > > Jakub >
This patch fixes the oversight. Tested on x8664 Linux. Pushed. On Wed, Nov 9, 2022 at 12:05 AM Aldy Hernandez <aldyh@redhat.com> wrote: > > Sigh, one more thing. > > There are further possibilities for a NAN result, even if the operands > are !NAN and the result from frange_arithmetic is free of NANs. > Adding different signed infinities. > > For example, [INF,+INF] + [INF,+INF] has the possibility of adding > INF and +INF, which is a NAN. Since we end up calling frange > arithmetic on the lower bounds and then on the upper bounds, we miss > this, and mistakenly think we're free of NANs. > > I have a patch in testing, but FYI, in case anyone notices this before > I get around to it tomorrow. > > Aldy > > On Tue, Nov 8, 2022 at 3:11 PM Jakub Jelinek <jakub@redhat.com> wrote: > > > > On Tue, Nov 08, 2022 at 03:06:53PM +0100, Aldy Hernandez wrote: > > > +// If either operand is a NAN, set R to the combination of both NANs > > > +// signwise and return TRUE. > > > > This comment doesn't describe what it does now. > > If either operand is a NAN, set R to NAN with unspecified sign bit and return > > TRUE. > > ? > > > > Other than this LGTM. > > > > Jakub > >
diff git a/gcc/rangeopfloat.cc b/gcc/rangeopfloat.cc index 23e0f5ef4e2..a967c4da393 100644  a/gcc/rangeopfloat.cc +++ b/gcc/rangeopfloat.cc @@ 200,6 +200,124 @@ frelop_early_resolve (irange &r, tree type, && relop_early_resolve (r, type, op1, op2, rel, my_rel)); } +// If R contains a NAN of unknown sign, update the NAN's signbit +// depending on two operands. + +inline void +update_nan_sign (frange &r, const frange &op1, const frange &op2) +{ + if (!r.maybe_isnan ()) + return; + + bool op1_nan = op1.maybe_isnan (); + bool op2_nan = op2.maybe_isnan (); + bool sign1, sign2; + + gcc_checking_assert (!r.nan_signbit_p (sign1)); + if (op1_nan && op2_nan) + { + if (op1.nan_signbit_p (sign1) && op2.nan_signbit_p (sign2)) + r.update_nan (sign1  sign2); + } + else if (op1_nan) + { + if (op1.nan_signbit_p (sign1)) + r.update_nan (sign1); + } + else if (op2_nan) + { + if (op2.nan_signbit_p (sign2)) + r.update_nan (sign2); + } +} + +// If either operand is a NAN, set R to the combination of both NANs +// signwise and return TRUE. + +inline bool +propagate_nans (frange &r, const frange &op1, const frange &op2) +{ + if (op1.known_isnan ()  op2.known_isnan ()) + { + r.set_nan (op1.type ()); + update_nan_sign (r, op1, op2); + return true; + } + return false; +} + +// Set VALUE to its next real value, or INF if the operation overflows. + +inline void +frange_nextafter (enum machine_mode mode, + REAL_VALUE_TYPE &value, + const REAL_VALUE_TYPE &inf) +{ + const real_format *fmt = REAL_MODE_FORMAT (mode); + REAL_VALUE_TYPE tmp; + bool overflow = real_nextafter (&tmp, fmt, &value, &inf); + if (overflow) + value = inf; + else + value = tmp; +} + +// Like real_arithmetic, but round the result to INF if the operation +// produced inexact results. +// +// ?? There is still one problematic case, i387. With +// fexcessprecision=standard we perform most SF/DFmode arithmetic in +// XFmode (long_double_type_node), so that case is OK. But without +// mfpmath=sse, all the SF/DFmode computations are in XFmode +// precision (64bit mantissa) and only occassionally rounded to +// SF/DFmode (when storing into memory from the 387 stack). Maybe +// this is ok as well though it is just occassionally more precise. ?? + +static void +frange_arithmetic (enum tree_code code, tree type, + REAL_VALUE_TYPE &result, + const REAL_VALUE_TYPE &op1, + const REAL_VALUE_TYPE &op2, + const REAL_VALUE_TYPE &inf) +{ + REAL_VALUE_TYPE value; + enum machine_mode mode = TYPE_MODE (type); + bool mode_composite = MODE_COMPOSITE_P (mode); + + bool inexact = real_arithmetic (&value, code, &op1, &op2); + real_convert (&result, mode, &value); + + // If real_convert above has rounded an inexact value to towards + // inf, we can keep the result as is, otherwise we'll adjust by 1 ulp + // later (real_nextafter). + bool rounding = (flag_rounding_math + && (real_isneg (&inf) + ? real_less (&result, &value) + : !real_less (&value, &result))); + + // Be extra careful if there may be discrepancies between the + // compile and runtime results. + if ((rounding  mode_composite) + && (inexact  !real_identical (&result, &value))) + { + if (mode_composite) + { + bool denormal = (result.sig[SIGSZ1] & SIG_MSB) == 0; + if (denormal) + { + REAL_VALUE_TYPE tmp; + real_convert (&tmp, DFmode, &value); + frange_nextafter (DFmode, tmp, inf); + real_convert (&result, mode, &tmp); + } + else + frange_nextafter (mode, result, inf); + } + else + frange_nextafter (mode, result, inf); + } +} + // Crop R to [INF, MAX] where MAX is the maximum representable number // for TYPE. @@ 1620,6 +1738,58 @@ foperator_unordered_equal::op1_range (frange &r, tree type, return true; } +class foperator_plus : public range_operator_float +{ + using range_operator_float::fold_range; + +public: + bool fold_range (frange &r, tree type, + const frange &lh, + const frange &rh, + relation_kind rel = VREL_VARYING) const final override; +} fop_plus; + +bool +foperator_plus::fold_range (frange &r, tree type, + const frange &op1, const frange &op2, + relation_kind) const +{ + if (empty_range_varying (r, type, op1, op2)) + return true; + if (propagate_nans (r, op1, op2)) + return true; + + REAL_VALUE_TYPE lb, ub; + frange_arithmetic (PLUS_EXPR, type, lb, + op1.lower_bound (), op2.lower_bound (), dconstninf); + frange_arithmetic (PLUS_EXPR, type, ub, + op1.upper_bound (), op2.upper_bound (), dconstinf); + + // Handle possible NANs by saturating to the appropriate INF if only + // one end is a NAN. If both ends are a NAN, just return a NAN. + bool lb_nan = real_isnan (&lb); + bool ub_nan = real_isnan (&ub); + if (lb_nan && ub_nan) + { + r.set_nan (type); + return true; + } + if (lb_nan) + lb = dconstninf; + else if (ub_nan) + ub = dconstinf; + + // The setter sets NAN by default for HONOR_NANS. + r.set (type, lb, ub); + + if (lb_nan  ub_nan) + update_nan_sign (r, op1, op2); + else if (!op1.maybe_isnan () && !op2.maybe_isnan ()) + r.clear_nan (); + + return true; +} + // Instantiate a range_op_table for floating point operations. static floating_op_table global_floating_table; @@ 1652,6 +1822,7 @@ floating_op_table::floating_op_table () set (ABS_EXPR, fop_abs); set (NEGATE_EXPR, fop_negate); + set (PLUS_EXPR, fop_plus); } // Return a pointer to the range_operator_float instance, if there is diff git a/gcc/testsuite/gcc.dg/treessa/vrpfloatplus.c b/gcc/testsuite/gcc.dg/treessa/vrpfloatplus.c new file mode 100644 index 00000000000..3739ea4e810  /dev/null +++ b/gcc/testsuite/gcc.dg/treessa/vrpfloatplus.c @@ 0,0 +1,21 @@ +// { dgdo compile } +// { dgoptions "O2 fnotreefre fnotreedominatoropts fnothreadjumps fdumptreevrp2" } + +double BG_SplineLength () +{ + double lastPoint; + double i; + + for (i = 0.01;i<=1;i+=0.1f) + if (!(i != 0.0)) + { + lastPoint = i; + } + else + { + lastPoint = 2; + } + return lastPoint; +} + +// { dgfinal { scantreedumptimes "return 2\\.0e" 1 "vrp2" } }