[8/9] i386: Move hypot implementation to C

Message ID 20211006180557.933826-9-adhemerval.zanella@linaro.org (mailing list archive)
State Superseded
Headers
Series Improve hypot() |

Checks

Context Check Description
dj/TryBot-apply_patch success Patch applied to master at the time it was sent

Commit Message

Adhemerval Zanella Oct. 6, 2021, 6:05 p.m. UTC
  The generic hypotf is slight slower, mostly due the tricks the assembly
does to optimize the isinf/isnan/issignaling.  Results on a Ryzen 5900X
with gcc 10.3.1:

master:
  "hypotf": {
   "workload-random": {
    "duration": 3.76493e+09,
    "iterations": 6.8e+07,
    "reciprocal-throughput": 38.4243,
    "latency": 72.309,
    "max-throughput": 2.60252e+07,
    "min-throughput": 1.38295e+07
   }
  }

patched:
  "hypotf": {
   "workload-random": {
    "duration": 3.78098e+09,
    "iterations": 6.8e+07,
    "reciprocal-throughput": 28.9206,
    "latency": 82.2848,
    "max-throughput": 3.45774e+07,
    "min-throughput": 1.21529e+07
   }
  }

The generic hypot is way slower, since the optimized implementation
uses the i386 default excessive precision to issue the operation
directly.  A similar implementation is provided instead of using
the generic implementation:

master:
  "hypot": {
   "workload-random": {
    "duration": 3.7452e+09,
    "iterations": 6.6e+07,
    "reciprocal-throughput": 40.3203,
    "latency": 73.1707,
    "max-throughput": 2.48014e+07,
    "min-throughput": 1.36667e+07
   }
  }

patched:
./benchtests/bench-hypot
  "hypot": {
   "workload-random": {
    "duration": 3.72606e+09,
    "iterations": 7.6e+07,
    "reciprocal-throughput": 25.9437,
    "latency": 72.1104,
    "max-throughput": 3.8545e+07,
    "min-throughput": 1.38676e+07
   }
  }

Checked on i686-linux-gnu.
---
 sysdeps/i386/fpu/e_hypot.S  | 75 -------------------------------------
 sysdeps/i386/fpu/e_hypot.c  | 42 +++++++++++++++++++++
 sysdeps/i386/fpu/e_hypotf.S | 64 -------------------------------
 3 files changed, 42 insertions(+), 139 deletions(-)
 delete mode 100644 sysdeps/i386/fpu/e_hypot.S
 create mode 100644 sysdeps/i386/fpu/e_hypot.c
 delete mode 100644 sysdeps/i386/fpu/e_hypotf.S
  

Comments

Joseph Myers Oct. 6, 2021, 6:37 p.m. UTC | #1
On Wed, 6 Oct 2021, Adhemerval Zanella via Libc-alpha wrote:

> +/* The i386 allows ot use the default excess of precision to optimize the

s/ot/to/

> +   hypot implementation, since internal multiplication and sqrt is carried
> +   with 80-bit FP type.  */
> +double
> +__ieee754_hypot (double x, double y)
> +{
> +  if ((isinf (x) || isinf (y))
> +      && !issignaling (x) && !issignaling (y))
> +    return INFINITY;
> +  if (isnan (x) || isnan (y))
> +    return x + y;
> +
> +  double r = math_narrow_eval (sqrt (x * x + y * y));

There is no guarantee of when excess precision might or might not be used 
for intermediate computations, or whether it might not be used (so 
resulting in spurious overflows).  I'd expect explicit conversions of x 
and y to long double here, use of sqrtl and conversion of the result to 
double.
  
Adhemerval Zanella Oct. 6, 2021, 7:19 p.m. UTC | #2
On 06/10/2021 15:37, Joseph Myers wrote:
> On Wed, 6 Oct 2021, Adhemerval Zanella via Libc-alpha wrote:
> 
>> +/* The i386 allows ot use the default excess of precision to optimize the
> 
> s/ot/to/

Ack.

> 
>> +   hypot implementation, since internal multiplication and sqrt is carried
>> +   with 80-bit FP type.  */
>> +double
>> +__ieee754_hypot (double x, double y)
>> +{
>> +  if ((isinf (x) || isinf (y))
>> +      && !issignaling (x) && !issignaling (y))
>> +    return INFINITY;
>> +  if (isnan (x) || isnan (y))
>> +    return x + y;
>> +
>> +  double r = math_narrow_eval (sqrt (x * x + y * y));
> 
> There is no guarantee of when excess precision might or might not be used 
> for intermediate computations, or whether it might not be used (so 
> resulting in spurious overflows).  I'd expect explicit conversions of x 
> and y to long double here, use of sqrtl and conversion of the result to 
> double.
> 

Something like the below?

  long double lx = x;
  long double ly = y;
  double r = sqrtl (lx * lx + ly * ly);
  return r;
  
Adhemerval Zanella Oct. 6, 2021, 7:20 p.m. UTC | #3
On 06/10/2021 16:19, Adhemerval Zanella wrote:
> 
> 
> On 06/10/2021 15:37, Joseph Myers wrote:
>> On Wed, 6 Oct 2021, Adhemerval Zanella via Libc-alpha wrote:
>>
>>> +/* The i386 allows ot use the default excess of precision to optimize the
>>
>> s/ot/to/
> 
> Ack.
> 
>>
>>> +   hypot implementation, since internal multiplication and sqrt is carried
>>> +   with 80-bit FP type.  */
>>> +double
>>> +__ieee754_hypot (double x, double y)
>>> +{
>>> +  if ((isinf (x) || isinf (y))
>>> +      && !issignaling (x) && !issignaling (y))
>>> +    return INFINITY;
>>> +  if (isnan (x) || isnan (y))
>>> +    return x + y;
>>> +
>>> +  double r = math_narrow_eval (sqrt (x * x + y * y));
>>
>> There is no guarantee of when excess precision might or might not be used 
>> for intermediate computations, or whether it might not be used (so 
>> resulting in spurious overflows).  I'd expect explicit conversions of x 
>> and y to long double here, use of sqrtl and conversion of the result to 
>> double.
>>
> 
> Something like the below?
> 
>   long double lx = x;
>   long double ly = y;
>   double r = sqrtl (lx * lx + ly * ly);
>   return r;
> 

In fact we need to handle underflow:

  long double lx = x;
  long double ly = y;
  double r = sqrtl (lx * lx + ly * ly);
  math_check_force_underflow_nonneg (r);
  return r;
  
Joseph Myers Oct. 6, 2021, 7:52 p.m. UTC | #4
On Wed, 6 Oct 2021, Adhemerval Zanella via Libc-alpha wrote:

> In fact we need to handle underflow:
> 
>   long double lx = x;
>   long double ly = y;
>   double r = sqrtl (lx * lx + ly * ly);

You probably also need a math_narrow_eval here to ensure the value checked 
for overflow and returned has definitely been properly narrowed to double 
first.

>   math_check_force_underflow_nonneg (r);
>   return r;
  
Paul Zimmermann Oct. 7, 2021, 8:28 a.m. UTC | #5
> > In fact we need to handle underflow:
> > 
> >   long double lx = x;
> >   long double ly = y;
> >   double r = sqrtl (lx * lx + ly * ly);
> 
> You probably also need a math_narrow_eval here to ensure the value checked 
> for overflow and returned has definitely been properly narrowed to double 
> first.

right, If x = y = 0x1.fffffffffffffp1023, then lx * lx + ly * ly will not
overflow as long double, nor sqrtl(.), but the conversion to double will.
This should probably be added as a test case (if not already).

Paul
  
Joseph Myers Oct. 7, 2021, 5:05 p.m. UTC | #6
On Thu, 7 Oct 2021, Paul Zimmermann wrote:

> > > In fact we need to handle underflow:
> > > 
> > >   long double lx = x;
> > >   long double ly = y;
> > >   double r = sqrtl (lx * lx + ly * ly);
> > 
> > You probably also need a math_narrow_eval here to ensure the value checked 
> > for overflow and returned has definitely been properly narrowed to double 
> > first.
> 
> right, If x = y = 0x1.fffffffffffffp1023, then lx * lx + ly * ly will not
> overflow as long double, nor sqrtl(.), but the conversion to double will.
> This should probably be added as a test case (if not already).

This is the sort of test that could be represented simply by a line

hypot max max

in auto-libm-test-in (which would then include all pairs of the maximum 
values of supported floating-point formats), but is problematic to add in 
practice because of limitations of the libm test machinery.

For example, that line would generate tests including hypot (DBL_MAX, 
FLT_MAX) for double.  Logically, by the rules glibc follows about 
acceptable errors (that most libm functions should act like they compute 
an infinite-precision result close to the mathematical value, then round 
it to the relevant type):

* If, for round-to-nearest, this overflows (generates Inf as the result 
with FE_OVERFLOW raised), that's a small (1ulp) error and so OK.

* If, for round-upward, this does not overflow (generates DBL_MAX as the 
result without FE_OVERFLOW raised), again that's a 1ulp error and so OK.

But the libm-test machinery isn't able to represent the information that a 
small error might result in overflowing when a correctly rounded result 
does not, or vice versa (a similar issue can also arise for underflow).  
And if the mathematical result in an overflow case is more than a few ulp 
above the overflow threshold, such a variation becomes a large error which 
is no longer OK.

We have a few cases close to the overflow threshold that are XFAILed in 
auto-libm-test-in (for lgamma, for example).  But XFAILing like that is 
problematic for "hypot max max", because that line generates lots of 
different tests, and only a few of them are appropriate to XFAIL.
  

Patch

diff --git a/sysdeps/i386/fpu/e_hypot.S b/sysdeps/i386/fpu/e_hypot.S
deleted file mode 100644
index f2c956b77a..0000000000
--- a/sysdeps/i386/fpu/e_hypot.S
+++ /dev/null
@@ -1,75 +0,0 @@ 
-/* Compute the hypothenuse of X and Y.
-   Copyright (C) 1998-2021 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <sysdep.h>
-#include <i386-math-asm.h>
-#include <libm-alias-finite.h>
-
-DEFINE_DBL_MIN
-
-#ifdef PIC
-# define MO(op) op##@GOTOFF(%edx)
-#else
-# define MO(op) op
-#endif
-
-	.text
-ENTRY(__ieee754_hypot)
-#ifdef  PIC
-	LOAD_PIC_REG (dx)
-#endif
-	fldl	4(%esp)		// x
-	fxam
-	fnstsw
-	fldl	12(%esp)	// y : x
-	movb	%ah, %ch
-	fxam
-	fnstsw
-	movb	%ah, %al
-	orb	%ch, %ah
-	sahf
-	jc	1f
-	fmul	%st(0)		// y * y : x
-	fxch			// x : y * y
-	fmul	%st(0)		// x * x : y * y
-	faddp			// x * x + y * y
-	fsqrt
-	DBL_NARROW_EVAL_UFLOW_NONNEG
-2:	ret
-
-	// We have to test whether any of the parameters is Inf.
-	// In this case the result is infinity.
-1:	andb	$0x45, %al
-	cmpb	$5, %al
-	je	3f		// jump if y is Inf
-	andb	$0x45, %ch
-	cmpb	$5, %ch
-	jne	4f		// jump if x is not Inf
-	fxch
-3:	fstp	%st(1)
-	fabs
-	jmp	2b
-
-4:	testb	$1, %al
-	jnz	5f		// y is NaN
-	fxch
-5:	fstp	%st(1)
-	jmp	2b
-
-END(__ieee754_hypot)
-libm_alias_finite (__ieee754_hypot, __hypot)
diff --git a/sysdeps/i386/fpu/e_hypot.c b/sysdeps/i386/fpu/e_hypot.c
new file mode 100644
index 0000000000..4920a8cb49
--- /dev/null
+++ b/sysdeps/i386/fpu/e_hypot.c
@@ -0,0 +1,42 @@ 
+/* Euclidean distance function.  Double/Binary64 i386 version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <math.h>
+#include <math_private.h>
+#include <math-underflow.h>
+#include <math-narrow-eval.h>
+#include <libm-alias-finite.h>
+#include <math_config.h>
+
+/* The i386 allows ot use the default excess of precision to optimize the
+   hypot implementation, since internal multiplication and sqrt is carried
+   with 80-bit FP type.  */
+double
+__ieee754_hypot (double x, double y)
+{
+  if ((isinf (x) || isinf (y))
+      && !issignaling (x) && !issignaling (y))
+    return INFINITY;
+  if (isnan (x) || isnan (y))
+    return x + y;
+
+  double r = math_narrow_eval (sqrt (x * x + y * y));
+  math_check_force_underflow_nonneg (r);
+  return r;
+}
+libm_alias_finite (__ieee754_hypot, __hypot)
diff --git a/sysdeps/i386/fpu/e_hypotf.S b/sysdeps/i386/fpu/e_hypotf.S
deleted file mode 100644
index cec5d15403..0000000000
--- a/sysdeps/i386/fpu/e_hypotf.S
+++ /dev/null
@@ -1,64 +0,0 @@ 
-/* Compute the hypothenuse of X and Y.
-   Copyright (C) 1998-2021 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <sysdep.h>
-#include <i386-math-asm.h>
-#include <libm-alias-finite.h>
-
-	.text
-ENTRY(__ieee754_hypotf)
-	flds	4(%esp)		// x
-	fxam
-	fnstsw
-	flds	8(%esp)		// y : x
-	movb	%ah, %ch
-	fxam
-	fnstsw
-	movb	%ah, %al
-	orb	%ch, %ah
-	sahf
-	jc	1f
-	fmul	%st(0)		// y * y : x
-	fxch			// x : y * y
-	fmul	%st(0)		// x * x : y * y
-	faddp			// x * x + y * y
-	fsqrt
-	FLT_NARROW_EVAL
-2:	ret
-
-	// We have to test whether any of the parameters is Inf.
-	// In this case the result is infinity.
-1:	andb	$0x45, %al
-	cmpb	$5, %al
-	je	3f		// jump if y is Inf
-	andb	$0x45, %ch
-	cmpb	$5, %ch
-	jne	4f		// jump if x is not Inf
-	fxch
-3:	fstp	%st(1)
-	fabs
-	jmp	2b
-
-4:	testb	$1, %al
-	jnz	5f		// y is NaN
-	fxch
-5:	fstp	%st(1)
-	jmp	2b
-
-END(__ieee754_hypotf)
-libm_alias_finite (__ieee754_hypotf, __hypotf)