[powerpc] fegetmode: utilize faster method to get rounding mode

Message ID 1559870599-2079-1-git-send-email-pc@us.ibm.com
State Superseded
Headers

Commit Message

Paul A. Clarke June 7, 2019, 1:23 a.m. UTC
  From: "Paul A. Clarke" <pc@us.ibm.com>

Add support to use 'mffsl' instruction if compiled for POWER9 (or later).

2019-06-06  Paul A. Clarke  <pc@us.ibm.com>

	* sysdeps/powerpc/fpu/fenv_libc.h (fegetenv_status): New.
	* sysdeps/powerpc/fpu/fegetmode.c (fegetmode): Use fegetenv_status()
	instead of fegetenv_register().
---
 sysdeps/powerpc/fpu/fegetmode.c |  2 +-
 sysdeps/powerpc/fpu/fenv_libc.h | 12 ++++++++++++
 2 files changed, 13 insertions(+), 1 deletion(-)
  

Comments

Adhemerval Zanella June 7, 2019, 2:20 p.m. UTC | #1
On 06/06/2019 22:23, Paul A. Clarke wrote:
> From: "Paul A. Clarke" <pc@us.ibm.com>
> 
> Add support to use 'mffsl' instruction if compiled for POWER9 (or later).

What about using dl_hwcap to check for ISA 3.0 for fegetenv_register
(as fesetenv_register does for fdp)? Not sure how performance-wise it would 
be, but in the other hand it would not require to build glibc using -mcpu=power9
to actually use this instruction.

> 
> 2019-06-06  Paul A. Clarke  <pc@us.ibm.com>
> 
> 	* sysdeps/powerpc/fpu/fenv_libc.h (fegetenv_status): New.
> 	* sysdeps/powerpc/fpu/fegetmode.c (fegetmode): Use fegetenv_status()
> 	instead of fegetenv_register().
> ---
>  sysdeps/powerpc/fpu/fegetmode.c |  2 +-
>  sysdeps/powerpc/fpu/fenv_libc.h | 12 ++++++++++++
>  2 files changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/sysdeps/powerpc/fpu/fegetmode.c b/sysdeps/powerpc/fpu/fegetmode.c
> index f43ab60..466f5b7 100644
> --- a/sysdeps/powerpc/fpu/fegetmode.c
> +++ b/sysdeps/powerpc/fpu/fegetmode.c
> @@ -21,6 +21,6 @@
>  int
>  fegetmode (femode_t *modep)
>  {
> -  *modep = fegetenv_register ();
> +  *modep = fegetenv_status ();
>    return 0;
>  }
> diff --git a/sysdeps/powerpc/fpu/fenv_libc.h b/sysdeps/powerpc/fpu/fenv_libc.h
> index f8dd1b7..64f7398 100644
> --- a/sysdeps/powerpc/fpu/fenv_libc.h
> +++ b/sysdeps/powerpc/fpu/fenv_libc.h
> @@ -34,6 +34,18 @@ extern const fenv_t *__fe_mask_env (void) attribute_hidden;
>     pointer.  */
>  #define fegetenv_register() __builtin_mffs()
>  
> +/* Equivalent to fegetenv_register, but only returns bits for
> +   status, exception enables, and mode.  */
> +#ifdef _ARCH_PWR9
> +#define fegetenv_status() \
> +  ({register double __fr;				\
> +    __asm__ ("mffsl %0" : "=f" (__fr));			\
> +    __fr;						\
> +  })
> +#else
> +#define fegetenv_status() fegetenv_register()
> +#endif
> +
>  /* Equivalent to fesetenv, but takes a fenv_t instead of a pointer.  */
>  #define fesetenv_register(env) \
>  	do { \
>
  
Joseph Myers June 10, 2019, 3:36 p.m. UTC | #2
On Thu, 6 Jun 2019, Paul A. Clarke wrote:

> From: "Paul A. Clarke" <pc@us.ibm.com>
> 
> Add support to use 'mffsl' instruction if compiled for POWER9 (or later).

The patch subject says "rounding mode".  fegetmode should save all control 
modes, not just the rounding mode.  (The C2x wording is "all the dynamic 
floating-point control modes".)

My understanding is that the change is in fact correct - mffsl does save 
all control modes (e.g. exception enable and non-IEEE bit), not just the 
rounding mode (and includes both binary and decimal rounding modes) - and 
so this is just an issue with the patch description, not the code changes.
  
Paul A. Clarke June 10, 2019, 6:49 p.m. UTC | #3
Adhemerval,

On 6/7/19 9:20 AM, Adhemerval Zanella wrote:
> On 06/06/2019 22:23, Paul A. Clarke wrote:
>> From: "Paul A. Clarke" <pc@us.ibm.com>
>> Add support to use 'mffsl' instruction if compiled for POWER9 (or later).
> 
> What about using dl_hwcap to check for ISA 3.0 for fegetenv_register
> (as fesetenv_register does for fdp)? Not sure how performance-wise it would 
> be, but in the other hand it would not require to build glibc using -mcpu=power9
> to actually use this instruction.

Good idea.  I think I can do both, where the POWER9 instruction is used conditional
on hwcap2, and unconditionally if compiled with -mcpu=power9 (pseudocode, not actual
reviewable code, follows):

#ifdef _ARCH_PWR9
#define IS_ISA300() 1
#else
#define IS_ISA300() (hwcap2 & PPC_FEATURE2_ARCH_3_00)
#endif

rn = ({
  union { double __d; unsigned long long __ll; } __u;
  if (__builtin_expect(IS_ISA300(),1))
    __asm__ volatile ("mffsl %0" : "=f" (__u.__d));
  else
    __u.__d = __builtin_mffs ();
  __u.__ll & 0x0000000000000003LL;
});

PC
  
Paul A. Clarke June 10, 2019, 6:51 p.m. UTC | #4
On 6/10/19 10:36 AM, Joseph Myers wrote:
> On Thu, 6 Jun 2019, Paul A. Clarke wrote:
>> From: "Paul A. Clarke" <pc@us.ibm.com>
>> Add support to use 'mffsl' instruction if compiled for POWER9 (or later).
> 
> The patch subject says "rounding mode".  fegetmode should save all control 
> modes, not just the rounding mode.  (The C2x wording is "all the dynamic 
> floating-point control modes".)
> 
> My understanding is that the change is in fact correct - mffsl does save 
> all control modes (e.g. exception enable and non-IEEE bit), not just the 
> rounding mode (and includes both binary and decimal rounding modes) - and 
> so this is just an issue with the patch description, not the code changes.

Good catch, and your understanding is correct.  I will update the description in v2,
which will also incorporate Adhemerval's suggestion.  Coming soon.

PC
  
Adhemerval Zanella June 11, 2019, 5:45 p.m. UTC | #5
On 10/06/2019 15:49, Paul Clarke wrote:
> Adhemerval,
> 
> On 6/7/19 9:20 AM, Adhemerval Zanella wrote:
>> On 06/06/2019 22:23, Paul A. Clarke wrote:
>>> From: "Paul A. Clarke" <pc@us.ibm.com>
>>> Add support to use 'mffsl' instruction if compiled for POWER9 (or later).
>>
>> What about using dl_hwcap to check for ISA 3.0 for fegetenv_register
>> (as fesetenv_register does for fdp)? Not sure how performance-wise it would 
>> be, but in the other hand it would not require to build glibc using -mcpu=power9
>> to actually use this instruction.
> 
> Good idea.  I think I can do both, where the POWER9 instruction is used conditional
> on hwcap2, and unconditionally if compiled with -mcpu=power9 (pseudocode, not actual
> reviewable code, follows):
> 
> #ifdef _ARCH_PWR9
> #define IS_ISA300() 1
> #else
> #define IS_ISA300() (hwcap2 & PPC_FEATURE2_ARCH_3_00)
> #endif
> 
> rn = ({
>   union { double __d; unsigned long long __ll; } __u;
>   if (__builtin_expect(IS_ISA300(),1))
>     __asm__ volatile ("mffsl %0" : "=f" (__u.__d));
>   else
>     __u.__d = __builtin_mffs ();
>   __u.__ll & 0x0000000000000003LL;
> });
> 
> PC
> 

Yeah, something like that.
  

Patch

diff --git a/sysdeps/powerpc/fpu/fegetmode.c b/sysdeps/powerpc/fpu/fegetmode.c
index f43ab60..466f5b7 100644
--- a/sysdeps/powerpc/fpu/fegetmode.c
+++ b/sysdeps/powerpc/fpu/fegetmode.c
@@ -21,6 +21,6 @@ 
 int
 fegetmode (femode_t *modep)
 {
-  *modep = fegetenv_register ();
+  *modep = fegetenv_status ();
   return 0;
 }
diff --git a/sysdeps/powerpc/fpu/fenv_libc.h b/sysdeps/powerpc/fpu/fenv_libc.h
index f8dd1b7..64f7398 100644
--- a/sysdeps/powerpc/fpu/fenv_libc.h
+++ b/sysdeps/powerpc/fpu/fenv_libc.h
@@ -34,6 +34,18 @@  extern const fenv_t *__fe_mask_env (void) attribute_hidden;
    pointer.  */
 #define fegetenv_register() __builtin_mffs()
 
+/* Equivalent to fegetenv_register, but only returns bits for
+   status, exception enables, and mode.  */
+#ifdef _ARCH_PWR9
+#define fegetenv_status() \
+  ({register double __fr;				\
+    __asm__ ("mffsl %0" : "=f" (__fr));			\
+    __fr;						\
+  })
+#else
+#define fegetenv_status() fegetenv_register()
+#endif
+
 /* Equivalent to fesetenv, but takes a fenv_t instead of a pointer.  */
 #define fesetenv_register(env) \
 	do { \