[2/3] powerpc: POWER8 fmod optimization

Message ID 54627990.1060805@linux.vnet.ibm.com
State Dropped
Delegated to: Adhemerval Zanella Netto
Headers

Commit Message

Adhemerval Zanella Netto Nov. 11, 2014, 9:03 p.m. UTC
  This patch adds a POWER8 fmod optimization.  The implementation uses
floating point operations instead of default one which uses integers.
Inputs are handled for finite mode (since the symbol is meant to be used
with -ffinite-math-only) and inexact exceptions are disabled in FP
calculation.

Using GLIBC fmod benchmark:

* Default fmod:

  "fmod": {
   "": {
    "duration": 5.1043e+09,
    "iterations": 2.67654e+08,
    "max": 106.892,
    "min": 5.939,
    "mean": 19.0705
   }

* Patched version:

  "fmod": {
   "": {
    "duration": 5.08466e+09,
    "iterations": 6.35088e+08,
    "max": 92.284,
    "min": 6.849,
    "mean": 8.00623
   }

Tested on powerpc64 and powerpc64le.

--

	* sysdeps/powerpc/powerpc64/power8/fpu/e_fmod.S: New file: POWER8 fmod
	optimization.
	* sysdeps/powerpc/powerpc64/power8/fpu/e_fmodf.S: New file: POWER8
	fmodf optimization.

---
  

Comments

Joseph Myers Nov. 11, 2014, 9:28 p.m. UTC | #1
On Tue, 11 Nov 2014, Adhemerval Zanella wrote:

> +	mtfsb0  4*cr7+lt	/* Disable FE_INEXACT exception  */

Disable trapping?  If you disable trapping, you need to restore the 
previous state of that bit before returning.

> +	fdiv    fp0,fp1,fp2	/* fp0 = -trunc (fp1 / fp2)  */
> +	fneg    fp0,fp0
> +	friz    fp0,fp0
> +	fmadd   fp2,fp0,fp2,fp1 /* fp2 = x - (fp0) * y  */

Are you sure this is correct when the integer quotient is not exactly 
representable in IEEE double?  I'd expect you to need a loop in the case 
of large exponent differences.

> +	fcpsgn  fp1,fp1,fp2
> +	mtfsb0  4*cr1+eq	/* Clear any FE_INEXACT exception  */

Clear any FE_INEXACT bit, whether or not set before the function?  It's 
not correct for <math.h> functions to clear exception bits that were 
already set - only bits they set, if previously clear.  (See bug 15491 for 
such a bug in x86 nearbyint.)
  
Adhemerval Zanella Netto Nov. 12, 2014, 3:38 p.m. UTC | #2
On 11-11-2014 19:28, Joseph Myers wrote:
> On Tue, 11 Nov 2014, Adhemerval Zanella wrote:
>
>> +	mtfsb0  4*cr7+lt	/* Disable FE_INEXACT exception  */
> Disable trapping?  If you disable trapping, you need to restore the 
> previous state of that bit before returning.
>
>> +	fdiv    fp0,fp1,fp2	/* fp0 = -trunc (fp1 / fp2)  */
>> +	fneg    fp0,fp0
>> +	friz    fp0,fp0
>> +	fmadd   fp2,fp0,fp2,fp1 /* fp2 = x - (fp0) * y  */
> Are you sure this is correct when the integer quotient is not exactly 
> representable in IEEE double?  I'd expect you to need a loop in the case 
> of large exponent differences.

Indeed I overlook it, as I am seeing now with a more extensive testing.  Checking now,
it seems using a FP division operation in the loop to correct it not the best
strategy to achieve better performance.

>> +	fcpsgn  fp1,fp1,fp2
>> +	mtfsb0  4*cr1+eq	/* Clear any FE_INEXACT exception  */
> Clear any FE_INEXACT bit, whether or not set before the function?  It's 
> not correct for <math.h> functions to clear exception bits that were 
> already set - only bits they set, if previously clear.  (See bug 15491 for 
> such a bug in x86 nearbyint.)
>
  
Joseph Myers Nov. 12, 2014, 4:19 p.m. UTC | #3
On Wed, 12 Nov 2014, Adhemerval Zanella wrote:

> Indeed I overlook it, as I am seeing now with a more extensive testing.  
> Checking now, it seems using a FP division operation in the loop to 
> correct it not the best strategy to achieve better performance.

And there are other problems with the division I didn't notice previously.  
If the exponent difference is large enough, the division may overflow.  
And you're assuming the result of (round to next integer toward 0 (divide 
and round in current rounding mode)) is the same as (divide in infinite 
precision, round once to an integer toward 0) - but depending on the 
arguments and rounding mode, the result of the division may be rounded 
away from 0 to an integer with magnitude greater than the desired result.
  
Adhemerval Zanella Netto Nov. 12, 2014, 6:29 p.m. UTC | #4
On 12-11-2014 14:19, Joseph Myers wrote:
> On Wed, 12 Nov 2014, Adhemerval Zanella wrote:
>
>> Indeed I overlook it, as I am seeing now with a more extensive testing.  
>> Checking now, it seems using a FP division operation in the loop to 
>> correct it not the best strategy to achieve better performance.
> And there are other problems with the division I didn't notice previously.  
> If the exponent difference is large enough, the division may overflow.  
> And you're assuming the result of (round to next integer toward 0 (divide 
> and round in current rounding mode)) is the same as (divide in infinite 
> precision, round once to an integer toward 0) - but depending on the 
> arguments and rounding mode, the result of the division may be rounded 
> away from 0 to an integer with magnitude greater than the desired result.
>
Yes, I noted that my naive approach is far from correct.  I'm dropping this patch,
thanks for the review.
  

Patch

diff --git a/sysdeps/powerpc/powerpc64/power8/fpu/e_fmod.S b/sysdeps/powerpc/powerpc64/power8/fpu/e_fmod.S
new file mode 100644
index 0000000..e9b35a2
--- /dev/null
+++ b/sysdeps/powerpc/powerpc64/power8/fpu/e_fmod.S
@@ -0,0 +1,46 @@ 
+/* Finite fmod optimization - PowerPC64/POWER8 version.
+   Copyright (C) 2014 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include <math_ldbl_opt.h>
+
+#define MFVSRD_R3_V2  .long 0x7c430066     /* mfvsrd  r3,vs1  */
+
+/* double [fp1] __ieee754_fmod (double [fp1] x, double [fp2] y)  */
+
+        .machine power8
+ENTRY(__ieee754_fmod)
+	/* First check if 'y' is InF and return 'x' if it is the case.  */
+	MFVSRD_R3_V2
+	lis     r9,0x7ff0       /* r9 = 0x7ff0  */
+	rldicl  r10,r3,0,1      /* r10 = r3 & (0x8000000000000000)  */
+	sldi    r9,r9,32	/* r9 = r9 << 52  */
+	cmpd    cr7,r10,r9      /* fp1 & 0x7ff0000000000000 ?  */
+	beqlr   cr7
+
+	mtfsb0  4*cr7+lt	/* Disable FE_INEXACT exception  */
+	fdiv    fp0,fp1,fp2	/* fp0 = -trunc (fp1 / fp2)  */
+	fneg    fp0,fp0
+	friz    fp0,fp0
+	fmadd   fp2,fp0,fp2,fp1 /* fp2 = x - (fp0) * y  */
+	fcpsgn  fp1,fp1,fp2
+	mtfsb0  4*cr1+eq	/* Clear any FE_INEXACT exception  */
+	blr
+END (__ieee754_fmod)
+
+strong_alias (__ieee754_fmod, __fmod_finite)
diff --git a/sysdeps/powerpc/powerpc64/power8/fpu/e_fmodf.S b/sysdeps/powerpc/powerpc64/power8/fpu/e_fmodf.S
new file mode 100644
index 0000000..7e16374
--- /dev/null
+++ b/sysdeps/powerpc/powerpc64/power8/fpu/e_fmodf.S
@@ -0,0 +1,46 @@ 
+/* Finite fmodf optimization - PowerPC64/POWER8 version.
+   Copyright (C) 2014 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include <math_ldbl_opt.h>
+
+#define MFVSRD_R3_V2  .long 0x7c430066     /* mfvsrd  r3,vs1  */
+
+/* double [fp1] __ieee754_fmod (double [fp1] x, double [fp2] y)  */
+
+	.machine power8
+ENTRY(__ieee754_fmodf)
+	/* First check if 'y' is InF and return 'x' if it is the case.  */
+	MFVSRD_R3_V2
+	lis     r9,0x7ff0       /* r9 = 0x7ff0  */
+	rldicl  r10,r3,0,1      /* r10 = r3 & (0x8000000000000000)  */
+	sldi    r9,r9,32	/* r9 = r9 << 52  */
+	cmpd    cr7,r10,r9      /* fp1 & 0x7ff0000000000000 ?  */
+	beqlr   cr7
+
+	mtfsb0  4*cr7+lt	/* Disable FE_INEXACT exception  */
+	fdivs   fp0,fp1,fp2	/* fp0 = -trunc (fp1 / fp2)  */
+	fneg    fp0,fp0
+	friz    fp0,fp0
+	fmadds  fp2,fp0,fp2,fp1 /* fp2 = x - (fp0) * y  */
+	fcpsgn  fp1,fp1,fp2
+	mtfsb0  4*cr1+eq	/* Clear any FE_INEXACT exception  */
+	blr
+END (__ieee754_fmodf)
+
+strong_alias (__ieee754_fmodf, __fmodf_finite)