powerpc: Fix logbl on power7 [BZ# 21280]
Commit Message
1. Fix the results for negative subnormals by ignoring the signal when
normalizing the value.
2. Fix the output when the high part is a power of 2 and the low part
is a nonzero number with opposite sign. This fix is based on commit
380bd0fd2418f8988217de950f8b8ff18af0cb2b.
After applying this patch, logbl() tests pass cleanly on POWER >= 7.
Tested on powerpc, powerpc64 and powerpc64le
2017-03-20 Tulio Magno Quites Machado Filho <tuliom@linux.vnet.ibm.com>
[BZ #21280]
* sysdeps/powerpc/power7/fpu/s_logbl.c (__logbl): Ignore the
signal of subnormals and adjust the exponent of power of 2 down
when low part has opposite sign.
---
sysdeps/powerpc/power7/fpu/s_logbl.c | 24 ++++++++++++++++++------
1 file changed, 18 insertions(+), 6 deletions(-)
Comments
On 20/03/2017 18:02, Tulio Magno Quites Machado Filho wrote:
> 1. Fix the results for negative subnormals by ignoring the signal when
> normalizing the value.
> 2. Fix the output when the high part is a power of 2 and the low part
> is a nonzero number with opposite sign. This fix is based on commit
> 380bd0fd2418f8988217de950f8b8ff18af0cb2b.
It is not a blocker, but based on the complexity of the change and the
optimization real case usage, I wonder if it would be better to just remove
it and use the default implementation instead.
Adhemerval Zanella <adhemerval.zanella@linaro.org> writes:
> On 20/03/2017 18:02, Tulio Magno Quites Machado Filho wrote:
>> 1. Fix the results for negative subnormals by ignoring the signal when
>> normalizing the value.
>> 2. Fix the output when the high part is a power of 2 and the low part
>> is a nonzero number with opposite sign. This fix is based on commit
>> 380bd0fd2418f8988217de950f8b8ff18af0cb2b.
>
> It is not a blocker, but based on the complexity of the change and the
> optimization real case usage, I wonder if it would be better to just remove
> it and use the default implementation instead.
I tried to do that before fixing these, but the performance difference is
still around 2x.
On 21/03/2017 17:09, Tulio Magno Quites Machado Filho wrote:
> Adhemerval Zanella <adhemerval.zanella@linaro.org> writes:
>
>> On 20/03/2017 18:02, Tulio Magno Quites Machado Filho wrote:
>>> 1. Fix the results for negative subnormals by ignoring the signal when
>>> normalizing the value.
>>> 2. Fix the output when the high part is a power of 2 and the low part
>>> is a nonzero number with opposite sign. This fix is based on commit
>>> 380bd0fd2418f8988217de950f8b8ff18af0cb2b.
>>
>> It is not a blocker, but based on the complexity of the change and the
>> optimization real case usage, I wonder if it would be better to just remove
>> it and use the default implementation instead.
>
> I tried to do that before fixing these, but the performance difference is
> still around 2x.
I would expect it, but the question is do we really care about the performance
of logbl-ibm128 to maintain a separated implementation for power7+ only?
Adhemerval Zanella <adhemerval.zanella@linaro.org> writes:
> On 21/03/2017 17:09, Tulio Magno Quites Machado Filho wrote:
>> Adhemerval Zanella <adhemerval.zanella@linaro.org> writes:
>>
>>> On 20/03/2017 18:02, Tulio Magno Quites Machado Filho wrote:
>>>> 1. Fix the results for negative subnormals by ignoring the signal when
>>>> normalizing the value.
>>>> 2. Fix the output when the high part is a power of 2 and the low part
>>>> is a nonzero number with opposite sign. This fix is based on commit
>>>> 380bd0fd2418f8988217de950f8b8ff18af0cb2b.
>>>
>>> It is not a blocker, but based on the complexity of the change and the
>>> optimization real case usage, I wonder if it would be better to just remove
>>> it and use the default implementation instead.
>>
>> I tried to do that before fixing these, but the performance difference is
>> still around 2x.
>
> I would expect it, but the question is do we really care about the performance
> of logbl-ibm128 to maintain a separated implementation for power7+ only?
Yes, we still need it.
Considering your proposal was not a blocker, I plan to merge this fix.
Anyway, I agree in re-evaluating the removal of this optimization
in a few releases.
On Tue, 2017-03-21 at 17:18 -0300, Adhemerval Zanella wrote:
>
> On 21/03/2017 17:09, Tulio Magno Quites Machado Filho wrote:
> > Adhemerval Zanella <adhemerval.zanella@linaro.org> writes:
> >
> >> On 20/03/2017 18:02, Tulio Magno Quites Machado Filho wrote:
> >>> 1. Fix the results for negative subnormals by ignoring the signal when
> >>> normalizing the value.
> >>> 2. Fix the output when the high part is a power of 2 and the low part
> >>> is a nonzero number with opposite sign. This fix is based on commit
> >>> 380bd0fd2418f8988217de950f8b8ff18af0cb2b.
> >>
> >> It is not a blocker, but based on the complexity of the change and the
> >> optimization real case usage, I wonder if it would be better to just remove
> >> it and use the default implementation instead.
> >
> > I tried to do that before fixing these, but the performance difference is
> > still around 2x.
>
> I would expect it, but the question is do we really care about the performance
> of logbl-ibm128 to maintain a separated implementation for power7+ only?
>
Yes we still care. The power7 optimization avoids LHS by performing the
mask and convert in VSX. and applies to power8, power9 for IBM long
double..
It will be some time before _float128 is fully deployed and until then
customers will notice, if this IBM long double optimization went
missing.
On 28/04/2017 15:26, Steven Munroe wrote:
> On Tue, 2017-03-21 at 17:18 -0300, Adhemerval Zanella wrote:
>>
>> On 21/03/2017 17:09, Tulio Magno Quites Machado Filho wrote:
>>> Adhemerval Zanella <adhemerval.zanella@linaro.org> writes:
>>>
>>>> On 20/03/2017 18:02, Tulio Magno Quites Machado Filho wrote:
>>>>> 1. Fix the results for negative subnormals by ignoring the signal when
>>>>> normalizing the value.
>>>>> 2. Fix the output when the high part is a power of 2 and the low part
>>>>> is a nonzero number with opposite sign. This fix is based on commit
>>>>> 380bd0fd2418f8988217de950f8b8ff18af0cb2b.
>>>>
>>>> It is not a blocker, but based on the complexity of the change and the
>>>> optimization real case usage, I wonder if it would be better to just remove
>>>> it and use the default implementation instead.
>>>
>>> I tried to do that before fixing these, but the performance difference is
>>> still around 2x.
>>
>> I would expect it, but the question is do we really care about the performance
>> of logbl-ibm128 to maintain a separated implementation for power7+ only?
>>
> Yes we still care. The power7 optimization avoids LHS by performing the
> mask and convert in VSX. and applies to power8, power9 for IBM long
> double..
Yeah I am aware of it, I was the one that code it ;)
>
> It will be some time before _float128 is fully deployed and until then
> customers will notice, if this IBM long double optimization went
> missing.
>
It is not a blocker, just a realization if it worth the maintenance of
keep a separate implementation for powerpc for this specific symbol.
I usually prefer to simplify the code and try to use the default
implementation where possible, however if the powerpc arch maintainer
sees that it is still worth the trouble I am ok with it.
@@ -35,14 +35,16 @@ static const union {
long double
__logbl (long double x)
{
- double xh;
+ double xh, xl;
double ret;
+ int64_t hx;
if (__builtin_expect (x == 0.0L, 0))
/* Raise FE_DIVBYZERO and return -HUGE_VAL[LF]. */
return -1.0L / __builtin_fabsl (x);
- xh = ldbl_high (x);
+ ldbl_unpack (x, &xh, &xl);
+ EXTRACT_WORDS64 (hx, xh);
/* ret = x & 0x7ff0000000000000; */
asm (
"xxland %x0,%x1,%x2\n"
@@ -58,10 +60,20 @@ __logbl (long double x)
{
/* POSIX specifies that denormal number is treated as
though it were normalized. */
- int64_t hx;
-
- EXTRACT_WORDS64 (hx, xh);
- return (long double) (-1023 - (__builtin_clzll (hx) - 12));
+ return (long double) (- (__builtin_clzll (hx & 0x7fffffffffffffffLL) \
+ - 12) - 1023);
+ }
+ else if ((hx & 0x000fffffffffffffLL) == 0)
+ {
+ /* If the high part is a power of 2, and the low part is nonzero
+ with the opposite sign, the low part affects the
+ exponent. */
+ int64_t lx, rhx;
+ EXTRACT_WORDS64 (lx, xl);
+ rhx = (hx & 0x7ff0000000000000LL) >> 52;
+ if ((hx ^ lx) < 0 && (lx & 0x7fffffffffffffffLL) != 0)
+ rhx--;
+ return (long double) (rhx - 1023);
}
/* Test to avoid logb_downward (0.0) == -0.0. */
return ret == -0.0 ? 0.0 : ret;