powerpc: Add optimized ilogbf128 for POWER9

Message ID 20201222153039.17722-1-rzinsly@linux.ibm.com
State Superseded
Delegated to: Tulio Magno Quites Machado Filho
Headers
Series powerpc: Add optimized ilogbf128 for POWER9 |

Commit Message

Raphael M Zinsly Dec. 22, 2020, 3:30 p.m. UTC
  The instruction xsxexpqp introduced on POWER9 extracts the exponent
from a quad-precision floating-point, thus it can be used to improve
ilogbf128 and llogbf128.
---
 .../powerpc/powerpc64/le/fpu/e_ilogbf128.c    | 22 +++++++++++++++++++
 1 file changed, 22 insertions(+)
 create mode 100644 sysdeps/powerpc/powerpc64/le/fpu/e_ilogbf128.c
  

Comments

Raphael M Zinsly Dec. 22, 2020, 3:36 p.m. UTC | #1
Benchtests results with and without this patch on a POWER9:

without:
   "ilogbf128": {
    "subnormal": {
     "duration": 5.09834e+08,
     "iterations": 2.8146e+07,
     "max": 38.979,
     "min": 2.939,
     "mean": 18.1139
    },
    "normal": {
     "duration": 4.99378e+08,
     "iterations": 1.6151e+08,
     "max": 16.698,
     "min": 2.942,
     "mean": 3.09193
    }
   }

with:
   "ilogbf128": {
    "subnormal": {
     "duration": 5.09989e+08,
     "iterations": 2.5978e+07,
     "max": 41.027,
     "min": 4.674,
     "mean": 19.6316
    },
    "normal": {
     "duration": 4.98105e+08,
     "iterations": 1.77912e+08,
     "max": 12.663,
     "min": 2.792,
     "mean": 2.79972
    }
   }

Best Regards,
  
Paul E Murphy Jan. 4, 2021, 11:20 p.m. UTC | #2
On 12/22/20 9:30 AM, Raphael Moreira Zinsly via Libc-alpha wrote:
> The instruction xsxexpqp introduced on POWER9 extracts the exponent
> from a quad-precision floating-point, thus it can be used to improve
> ilogbf128 and llogbf128.
> ---
>   .../powerpc/powerpc64/le/fpu/e_ilogbf128.c    | 22 +++++++++++++++++++
>   1 file changed, 22 insertions(+)
>   create mode 100644 sysdeps/powerpc/powerpc64/le/fpu/e_ilogbf128.c
> 
> diff --git a/sysdeps/powerpc/powerpc64/le/fpu/e_ilogbf128.c b/sysdeps/powerpc/powerpc64/le/fpu/e_ilogbf128.c
> new file mode 100644
> index 0000000000..47558bbadc
> --- /dev/null
> +++ b/sysdeps/powerpc/powerpc64/le/fpu/e_ilogbf128.c
> @@ -0,0 +1,22 @@
> +#ifdef _ARCH_PWR9
> +int _ilogbf128 (_Float128 __x);

This should be a locally (static) scoped function.

> +
> +int
> +#if defined(_F128_ENABLE_IFUNC)
> +__ieee754_ilogbf128_power9 (_Float128 __x)
> +#else
> +__ieee754_ilogbf128 (_Float128 __x)
> +#endif
> +{
> +  /* Check for exceptional cases.  */
> +  if (!__builtin_vsx_scalar_test_data_class_qp (__x, 0x7f))
> +    return __builtin_vsx_scalar_extract_expq (__x) - 0x3fff;
> +  else
> +    /* Fallback to the generic ilogb if __x is NaN, Inf or subnormal.  */
> +    return _ilogbf128(__x);
> +}
> +
> +#define __ieee754_ilogbf128 _ilogbf128
> +#endif
> +
> +#include<sysdeps/ieee754/float128/e_ilogbf128.c>

A space seems to be missing between include and <.

Otherwise, LGTM.

As a side note, I think the benchtests are not too impressive. I am 
surprised normal values don't show better results.
  
Paul E Murphy Jan. 5, 2021, 6:19 p.m. UTC | #3
On 1/4/21 5:20 PM, Paul E Murphy via Libc-alpha wrote:
> 
> 
> On 12/22/20 9:30 AM, Raphael Moreira Zinsly via Libc-alpha wrote:
>> The instruction xsxexpqp introduced on POWER9 extracts the exponent
>> from a quad-precision floating-point, thus it can be used to improve
>> ilogbf128 and llogbf128.
>> ---
>>   .../powerpc/powerpc64/le/fpu/e_ilogbf128.c    | 22 +++++++++++++++++++
>>   1 file changed, 22 insertions(+)
>>   create mode 100644 sysdeps/powerpc/powerpc64/le/fpu/e_ilogbf128.c
>>
>> diff --git a/sysdeps/powerpc/powerpc64/le/fpu/e_ilogbf128.c 
>> b/sysdeps/powerpc/powerpc64/le/fpu/e_ilogbf128.c
>> new file mode 100644
>> index 0000000000..47558bbadc
>> --- /dev/null
>> +++ b/sysdeps/powerpc/powerpc64/le/fpu/e_ilogbf128.c
>> @@ -0,0 +1,22 @@
>> +#ifdef _ARCH_PWR9
>> +int _ilogbf128 (_Float128 __x);
> 
> This should be a locally (static) scoped function.
> 
>> +
>> +int
>> +#if defined(_F128_ENABLE_IFUNC)
>> +__ieee754_ilogbf128_power9 (_Float128 __x)
>> +#else
>> +__ieee754_ilogbf128 (_Float128 __x)
>> +#endif
>> +{
>> +  /* Check for exceptional cases.  */
>> +  if (!__builtin_vsx_scalar_test_data_class_qp (__x, 0x7f))
>> +    return __builtin_vsx_scalar_extract_expq (__x) - 0x3fff;
>> +  else
>> +    /* Fallback to the generic ilogb if __x is NaN, Inf or 
>> subnormal.  */
>> +    return _ilogbf128(__x);
>> +}
>> +
>> +#define __ieee754_ilogbf128 _ilogbf128
>> +#endif
>> +
>> +#include<sysdeps/ieee754/float128/e_ilogbf128.c>
> 
> A space seems to be missing between include and <.
> 
> Otherwise, LGTM.
> 
> As a side note, I think the benchtests are not too impressive. I am 
> surprised normal values don't show better results.

After spending a little time looking at this, the call overhead of the 
wrapper is hiding most of the improvement.  Similarly, power9 adds 
similar instructions for float32/float64.

I would recommend refactoring this patch to provide an override to 
w_ilogb_template.c so all three formats can use these new instructions 
without the call overhead for normal numbers.
  

Patch

diff --git a/sysdeps/powerpc/powerpc64/le/fpu/e_ilogbf128.c b/sysdeps/powerpc/powerpc64/le/fpu/e_ilogbf128.c
new file mode 100644
index 0000000000..47558bbadc
--- /dev/null
+++ b/sysdeps/powerpc/powerpc64/le/fpu/e_ilogbf128.c
@@ -0,0 +1,22 @@ 
+#ifdef _ARCH_PWR9
+int _ilogbf128 (_Float128 __x);
+
+int
+#if defined(_F128_ENABLE_IFUNC)
+__ieee754_ilogbf128_power9 (_Float128 __x)
+#else
+__ieee754_ilogbf128 (_Float128 __x)
+#endif
+{
+  /* Check for exceptional cases.  */
+  if (!__builtin_vsx_scalar_test_data_class_qp (__x, 0x7f))
+    return __builtin_vsx_scalar_extract_expq (__x) - 0x3fff;
+  else
+    /* Fallback to the generic ilogb if __x is NaN, Inf or subnormal.  */
+    return _ilogbf128(__x);
+}
+
+#define __ieee754_ilogbf128 _ilogbf128
+#endif
+
+#include<sysdeps/ieee754/float128/e_ilogbf128.c>