[RFC] fma vs gcc 4.9

Message ID	53B59CDF.1010604@twiddle.net
State	Superseded
Headers	Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm Precedence: bulk Sender: libc-alpha-owner@sourceware.org Message-ID: <53B59CDF.1010604@twiddle.net> Date: Thu, 03 Jul 2014 11:11:43 -0700 From: Richard Henderson <rth@twiddle.net> User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.6.0 MIME-Version: 1.0 To: libc-alpha <libc-alpha@sourceware.org> CC: "Joseph S. Myers" <joseph@codesourcery.com> Subject: [PATCH, RFC] fma vs gcc 4.9 Content-Type: multipart/mixed; boundary="------------030309080500080803020301"

Message ID

53B59CDF.1010604@twiddle.net

State

Superseded

Headers

Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm
Precedence: bulk
Sender: libc-alpha-owner@sourceware.org
Message-ID: <53B59CDF.1010604@twiddle.net>
Date: Thu, 03 Jul 2014 11:11:43 -0700
From: Richard Henderson <rth@twiddle.net>
User-Agent: Mozilla/5.0 (X11; Linux x86_64;
	rv:24.0) Gecko/20100101 Thunderbird/24.6.0
MIME-Version: 1.0
To: libc-alpha <libc-alpha@sourceware.org>
CC: "Joseph S. Myers" <joseph@codesourcery.com>
Subject: [PATCH, RFC] fma vs gcc 4.9
Content-Type: multipart/mixed;
	boundary="------------030309080500080803020301"

Commit Message

Richard Henderson July 3, 2014, 6:11 p.m. UTC

  GCC 4.9 optimizes

  double a1 = z + m1;
  double t1 = a1 - z;
  double t2 = a1 - t1;
  t1 = m1 - t1;
  t2 = z - t2;
  double a2 = t1 + t2;
  feclearexcept (FE_INEXACT);

  if (a1 == 0 && m2 == 0)
    ... return ...

into

  double a1 = z + m1;
  feclearexcept (FE_INEXACT);

  if (a1 == 0 && m2 == 0)
    ... return ...

  double t1 = a1 - z;
  double t2 = a1 - t1;
  t1 = m1 - t1;
  t2 = z - t2;
  double a2 = t1 + t2;

because the later computation is partially redundant for the path leading to
the return.  I noticed this because, of course, the moved computation raises
INEXACT, leading to incorrect results.

At first I was simply going to add a math_force_eval to make sure everything is
complete before the feclearexcept, but on further reflection it seemed odd that
most of the computation is in fact redundant.

It seems to me that there's a typo on that exact zero test: a2 should be used,
not m2.  Correct, or have I mis-read the code?


r~

Comments

Richard Henderson July 10, 2014, 4:14 p.m. UTC | #1

Ping?

On 07/03/2014 11:11 AM, Richard Henderson wrote:
> GCC 4.9 optimizes
> 
>   double a1 = z + m1;
>   double t1 = a1 - z;
>   double t2 = a1 - t1;
>   t1 = m1 - t1;
>   t2 = z - t2;
>   double a2 = t1 + t2;
>   feclearexcept (FE_INEXACT);
> 
>   if (a1 == 0 && m2 == 0)
>     ... return ...
> 
> into
> 
>   double a1 = z + m1;
>   feclearexcept (FE_INEXACT);
> 
>   if (a1 == 0 && m2 == 0)
>     ... return ...
> 
>   double t1 = a1 - z;
>   double t2 = a1 - t1;
>   t1 = m1 - t1;
>   t2 = z - t2;
>   double a2 = t1 + t2;
> 
> because the later computation is partially redundant for the path leading to
> the return.  I noticed this because, of course, the moved computation raises
> INEXACT, leading to incorrect results.
> 
> At first I was simply going to add a math_force_eval to make sure everything is
> complete before the feclearexcept, but on further reflection it seemed odd that
> most of the computation is in fact redundant.
> 
> It seems to me that there's a typo on that exact zero test: a2 should be used,
> not m2.  Correct, or have I mis-read the code?
> 
> 
> r~
>

Joseph Myers July 16, 2014, 8:56 p.m. UTC | #2

On Thu, 3 Jul 2014, Richard Henderson wrote:

> It seems to me that there's a typo on that exact zero test: a2 should be used,
> not m2.  Correct, or have I mis-read the code?

The existing exact zero test seems correct to me.  The conditions for such 
an exact zero are that the result of the multiplication is exactly 
representable in 53 bits (i.e., m2 == 0, with m1 being the exact result of 
the multiplication), and that the result of the addition (of z to the high 
part of the multipliation result) is an exact zero (i.e. a1 == 0, which 
implies a2 == 0).

diff mbox

Patch

diff --git a/ChangeLog b/ChangeLog
index 5eb0d43..4a3e672 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,9 @@ 
 2014-07-03  Richard Henderson  <rth@redhat.com>
 
+	* sysdeps/ieee754/dbl-64/s_fma.c (__fma): Use math_force_eval before
+	feclearexcept; use math_opt_barrier instead of open-coded asm; fix
+	typo in exact zero test.
+
 	* sysdeps/alpha/fpu/s_nearbyintf.c: Remove file.
 	* sysdeps/alpha/fpu/s_nearbyint.c (__nearbyint): Remove;
 	include sysdeps/ieee754/dbl-64/wordsize-64/s_nearbyint.c.
diff --git a/sysdeps/ieee754/dbl-64/s_fma.c b/sysdeps/ieee754/dbl-64/s_fma.c
index 389acd4..3bd7f71 100644
--- a/sysdeps/ieee754/dbl-64/s_fma.c
+++ b/sysdeps/ieee754/dbl-64/s_fma.c
@@ -198,16 +198,16 @@  __fma (double x, double y, double z)
   t1 = m1 - t1;
   t2 = z - t2;
   double a2 = t1 + t2;
+  /* Ensure the addition is not scheduled after feclearexcept call.  */
+  math_force_eval (a2);
   feclearexcept (FE_INEXACT);
 
-  /* If the result is an exact zero, ensure it has the correct
-     sign.  */
-  if (a1 == 0 && m2 == 0)
+  /* If the result is an exact zero, ensure it has the correct sign.  */
+  if (a1 == 0 && a2 == 0)
     {
       libc_feupdateenv (&env);
-      /* Ensure that round-to-nearest value of z + m1 is not
-	 reused.  */
-      asm volatile ("" : "=m" (z) : "m" (z));
+      /* Ensure that round-to-nearest value of z + m1 is not reused.  */
+      z = math_opt_barrier (z);
       return z + m1;
     }