powerpc: Fix logbl on power7 [BZ# 21280]

Message ID 1490043778-30228-1-git-send-email-tuliom@linux.vnet.ibm.com
State Committed
Headers

Commit Message

Tulio Magno Quites Machado Filho March 20, 2017, 9:02 p.m. UTC
  1. Fix the results for negative subnormals by ignoring the signal when
    normalizing the value.
 2. Fix the output when the high part is a power of 2 and the low part
    is a nonzero number with opposite sign.  This fix is based on commit
    380bd0fd2418f8988217de950f8b8ff18af0cb2b.

After applying this patch, logbl() tests pass cleanly on POWER >= 7.

Tested on powerpc, powerpc64 and powerpc64le

2017-03-20  Tulio Magno Quites Machado Filho  <tuliom@linux.vnet.ibm.com>

	[BZ #21280]
	* sysdeps/powerpc/power7/fpu/s_logbl.c (__logbl): Ignore the
	signal of subnormals and adjust the exponent of power of 2 down
	when low part has opposite sign.
---
 sysdeps/powerpc/power7/fpu/s_logbl.c | 24 ++++++++++++++++++------
 1 file changed, 18 insertions(+), 6 deletions(-)
  

Comments

Adhemerval Zanella Netto March 21, 2017, 1:21 p.m. UTC | #1
On 20/03/2017 18:02, Tulio Magno Quites Machado Filho wrote:
>  1. Fix the results for negative subnormals by ignoring the signal when
>     normalizing the value.
>  2. Fix the output when the high part is a power of 2 and the low part
>     is a nonzero number with opposite sign.  This fix is based on commit
>     380bd0fd2418f8988217de950f8b8ff18af0cb2b.

It is not a blocker, but based on the complexity of the change and the 
optimization real case usage, I wonder if it would be better to just remove 
it and use the default implementation instead.
  
Tulio Magno Quites Machado Filho March 21, 2017, 8:09 p.m. UTC | #2
Adhemerval Zanella <adhemerval.zanella@linaro.org> writes:

> On 20/03/2017 18:02, Tulio Magno Quites Machado Filho wrote:
>>  1. Fix the results for negative subnormals by ignoring the signal when
>>     normalizing the value.
>>  2. Fix the output when the high part is a power of 2 and the low part
>>     is a nonzero number with opposite sign.  This fix is based on commit
>>     380bd0fd2418f8988217de950f8b8ff18af0cb2b.
>
> It is not a blocker, but based on the complexity of the change and the 
> optimization real case usage, I wonder if it would be better to just remove 
> it and use the default implementation instead.

I tried to do that before fixing these, but the performance difference is
still around 2x.
  
Adhemerval Zanella Netto March 21, 2017, 8:18 p.m. UTC | #3
On 21/03/2017 17:09, Tulio Magno Quites Machado Filho wrote:
> Adhemerval Zanella <adhemerval.zanella@linaro.org> writes:
> 
>> On 20/03/2017 18:02, Tulio Magno Quites Machado Filho wrote:
>>>  1. Fix the results for negative subnormals by ignoring the signal when
>>>     normalizing the value.
>>>  2. Fix the output when the high part is a power of 2 and the low part
>>>     is a nonzero number with opposite sign.  This fix is based on commit
>>>     380bd0fd2418f8988217de950f8b8ff18af0cb2b.
>>
>> It is not a blocker, but based on the complexity of the change and the 
>> optimization real case usage, I wonder if it would be better to just remove 
>> it and use the default implementation instead.
> 
> I tried to do that before fixing these, but the performance difference is
> still around 2x.

I would expect it, but the question is do we really care about the performance
of logbl-ibm128 to maintain a separated implementation for power7+ only?
  
Tulio Magno Quites Machado Filho April 28, 2017, 5:59 p.m. UTC | #4
Adhemerval Zanella <adhemerval.zanella@linaro.org> writes:

> On 21/03/2017 17:09, Tulio Magno Quites Machado Filho wrote:
>> Adhemerval Zanella <adhemerval.zanella@linaro.org> writes:
>> 
>>> On 20/03/2017 18:02, Tulio Magno Quites Machado Filho wrote:
>>>>  1. Fix the results for negative subnormals by ignoring the signal when
>>>>     normalizing the value.
>>>>  2. Fix the output when the high part is a power of 2 and the low part
>>>>     is a nonzero number with opposite sign.  This fix is based on commit
>>>>     380bd0fd2418f8988217de950f8b8ff18af0cb2b.
>>>
>>> It is not a blocker, but based on the complexity of the change and the 
>>> optimization real case usage, I wonder if it would be better to just remove 
>>> it and use the default implementation instead.
>> 
>> I tried to do that before fixing these, but the performance difference is
>> still around 2x.
>
> I would expect it, but the question is do we really care about the performance
> of logbl-ibm128 to maintain a separated implementation for power7+ only?

Yes, we still need it.
Considering your proposal was not a blocker, I plan to merge this fix.

Anyway, I agree in re-evaluating the removal of this optimization
in a few releases.
  
Steven Munroe April 28, 2017, 6:26 p.m. UTC | #5
On Tue, 2017-03-21 at 17:18 -0300, Adhemerval Zanella wrote:
> 
> On 21/03/2017 17:09, Tulio Magno Quites Machado Filho wrote:
> > Adhemerval Zanella <adhemerval.zanella@linaro.org> writes:
> > 
> >> On 20/03/2017 18:02, Tulio Magno Quites Machado Filho wrote:
> >>>  1. Fix the results for negative subnormals by ignoring the signal when
> >>>     normalizing the value.
> >>>  2. Fix the output when the high part is a power of 2 and the low part
> >>>     is a nonzero number with opposite sign.  This fix is based on commit
> >>>     380bd0fd2418f8988217de950f8b8ff18af0cb2b.
> >>
> >> It is not a blocker, but based on the complexity of the change and the 
> >> optimization real case usage, I wonder if it would be better to just remove 
> >> it and use the default implementation instead.
> > 
> > I tried to do that before fixing these, but the performance difference is
> > still around 2x.
> 
> I would expect it, but the question is do we really care about the performance
> of logbl-ibm128 to maintain a separated implementation for power7+ only?
> 
Yes we still care. The power7 optimization avoids LHS by performing the
mask and convert in VSX. and applies to power8, power9 for IBM long
double.. 

It will be some time before _float128 is fully deployed and until then
customers will notice, if this IBM long double optimization went
missing.
  
Adhemerval Zanella Netto April 28, 2017, 7:48 p.m. UTC | #6
On 28/04/2017 15:26, Steven Munroe wrote:
> On Tue, 2017-03-21 at 17:18 -0300, Adhemerval Zanella wrote:
>>
>> On 21/03/2017 17:09, Tulio Magno Quites Machado Filho wrote:
>>> Adhemerval Zanella <adhemerval.zanella@linaro.org> writes:
>>>
>>>> On 20/03/2017 18:02, Tulio Magno Quites Machado Filho wrote:
>>>>>  1. Fix the results for negative subnormals by ignoring the signal when
>>>>>     normalizing the value.
>>>>>  2. Fix the output when the high part is a power of 2 and the low part
>>>>>     is a nonzero number with opposite sign.  This fix is based on commit
>>>>>     380bd0fd2418f8988217de950f8b8ff18af0cb2b.
>>>>
>>>> It is not a blocker, but based on the complexity of the change and the 
>>>> optimization real case usage, I wonder if it would be better to just remove 
>>>> it and use the default implementation instead.
>>>
>>> I tried to do that before fixing these, but the performance difference is
>>> still around 2x.
>>
>> I would expect it, but the question is do we really care about the performance
>> of logbl-ibm128 to maintain a separated implementation for power7+ only?
>>
> Yes we still care. The power7 optimization avoids LHS by performing the
> mask and convert in VSX. and applies to power8, power9 for IBM long
> double.. 

Yeah I am aware of it, I was the one that code it ;)

> 
> It will be some time before _float128 is fully deployed and until then
> customers will notice, if this IBM long double optimization went
> missing.
> 

It is not a blocker, just a realization if it worth the maintenance of
keep a separate implementation for powerpc for this specific symbol.
I usually prefer to simplify the code and try to use the default
implementation where possible, however if the powerpc arch maintainer
sees that it is still worth the trouble I am ok with it.
  

Patch

diff --git a/sysdeps/powerpc/power7/fpu/s_logbl.c b/sysdeps/powerpc/power7/fpu/s_logbl.c
index f7ecbd1..3ae383a 100644
--- a/sysdeps/powerpc/power7/fpu/s_logbl.c
+++ b/sysdeps/powerpc/power7/fpu/s_logbl.c
@@ -35,14 +35,16 @@  static const union {
 long double
 __logbl (long double x)
 {
-  double xh;
+  double xh, xl;
   double ret;
+  int64_t hx;
 
   if (__builtin_expect (x == 0.0L, 0))
     /* Raise FE_DIVBYZERO and return -HUGE_VAL[LF].  */
     return -1.0L / __builtin_fabsl (x);
 
-  xh = ldbl_high (x);
+  ldbl_unpack (x, &xh, &xl);
+  EXTRACT_WORDS64 (hx, xh);
   /* ret = x & 0x7ff0000000000000;  */
   asm (
     "xxland %x0,%x1,%x2\n"
@@ -58,10 +60,20 @@  __logbl (long double x)
     {
       /* POSIX specifies that denormal number is treated as
          though it were normalized.  */
-      int64_t hx;
-
-      EXTRACT_WORDS64 (hx, xh);
-      return (long double) (-1023 - (__builtin_clzll (hx) - 12));
+      return (long double) (- (__builtin_clzll (hx & 0x7fffffffffffffffLL) \
+			       - 12) - 1023);
+    }
+  else if ((hx & 0x000fffffffffffffLL) == 0)
+    {
+      /* If the high part is a power of 2, and the low part is nonzero
+	 with the opposite sign, the low part affects the
+	 exponent.  */
+      int64_t lx, rhx;
+      EXTRACT_WORDS64 (lx, xl);
+      rhx = (hx & 0x7ff0000000000000LL) >> 52;
+      if ((hx ^ lx) < 0 && (lx & 0x7fffffffffffffffLL) != 0)
+	rhx--;
+      return (long double) (rhx - 1023);
     }
   /* Test to avoid logb_downward (0.0) == -0.0.  */
   return ret == -0.0 ? 0.0 : ret;