[v2] fma vs gcc 4.9

Message ID	53C701C0.70503@twiddle.net
State	Superseded
Headers	Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm Precedence: bulk Sender: libc-alpha-owner@sourceware.org Message-ID: <53C701C0.70503@twiddle.net> Date: Wed, 16 Jul 2014 15:50:40 -0700 From: Richard Henderson <rth@twiddle.net> User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.6.0 MIME-Version: 1.0 To: "Joseph S. Myers" <joseph@codesourcery.com> CC: libc-alpha <libc-alpha@sourceware.org> Subject: [PATCH v2] fma vs gcc 4.9 References: <53B59CDF.1010604@twiddle.net> <Pine.LNX.4.64.1407162054010.22313@digraph.polyomino.org.uk> In-Reply-To: <Pine.LNX.4.64.1407162054010.22313@digraph.polyomino.org.uk> Content-Type: multipart/mixed; boundary="------------090007040004060407090509"

Message ID

53C701C0.70503@twiddle.net

State

Superseded

Headers

Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm
Precedence: bulk
Sender: libc-alpha-owner@sourceware.org
Message-ID: <53C701C0.70503@twiddle.net>
Date: Wed, 16 Jul 2014 15:50:40 -0700
From: Richard Henderson <rth@twiddle.net>
User-Agent: Mozilla/5.0 (X11; Linux x86_64;
	rv:24.0) Gecko/20100101 Thunderbird/24.6.0
MIME-Version: 1.0
To: "Joseph S. Myers" <joseph@codesourcery.com>
CC: libc-alpha <libc-alpha@sourceware.org>
Subject: [PATCH v2] fma vs gcc 4.9
References: <53B59CDF.1010604@twiddle.net>
	<Pine.LNX.4.64.1407162054010.22313@digraph.polyomino.org.uk>
In-Reply-To: <Pine.LNX.4.64.1407162054010.22313@digraph.polyomino.org.uk>
Content-Type: multipart/mixed;
	boundary="------------090007040004060407090509"

Commit Message

Richard Henderson July 16, 2014, 10:50 p.m. UTC

  On 07/16/2014 01:56 PM, Joseph S. Myers wrote:
> On Thu, 3 Jul 2014, Richard Henderson wrote:
> 
>> It seems to me that there's a typo on that exact zero test: a2 should be used,
>> not m2.  Correct, or have I mis-read the code?
> 
> The existing exact zero test seems correct to me.  The conditions for such 
> an exact zero are that the result of the multiplication is exactly 
> representable in 53 bits (i.e., m2 == 0, with m1 being the exact result of 
> the multiplication), and that the result of the addition (of z to the high 
> part of the multipliation result) is an exact zero (i.e. a1 == 0, which 
> implies a2 == 0).
> 

Thanks.  This second version, then, does not s/m2/a2/, but merely adds the
appropriate barriers.  It also makes sure that m2 is complete before clearing
inexact, even though I saw no evidence of it being scheduled after the call.

Ok?


r~

Comments

Joseph Myers July 17, 2014, 2:12 p.m. UTC | #1

On Wed, 16 Jul 2014, Richard Henderson wrote:

> On 07/16/2014 01:56 PM, Joseph S. Myers wrote:
> > On Thu, 3 Jul 2014, Richard Henderson wrote:
> > 
> >> It seems to me that there's a typo on that exact zero test: a2 should be used,
> >> not m2.  Correct, or have I mis-read the code?
> > 
> > The existing exact zero test seems correct to me.  The conditions for such 
> > an exact zero are that the result of the multiplication is exactly 
> > representable in 53 bits (i.e., m2 == 0, with m1 being the exact result of 
> > the multiplication), and that the result of the addition (of z to the high 
> > part of the multipliation result) is an exact zero (i.e. a1 == 0, which 
> > implies a2 == 0).
> > 
> 
> Thanks.  This second version, then, does not s/m2/a2/, but merely adds the
> appropriate barriers.  It also makes sure that m2 is complete before clearing
> inexact, even though I saw no evidence of it being scheduled after the call.
> 
> Ok?

It seems ldbl-96/s_fma.c, ldbl-96/s_fmal.c and ldbl-128/s_fmal.c could all 
have the same issue and so should have corresponding changes made.

diff mbox

Patch

diff --git a/sysdeps/ieee754/dbl-64/s_fma.c b/sysdeps/ieee754/dbl-64/s_fma.c
index 389acd4..77065aa 100644
--- a/sysdeps/ieee754/dbl-64/s_fma.c
+++ b/sysdeps/ieee754/dbl-64/s_fma.c
@@ -198,16 +198,17 @@  __fma (double x, double y, double z)
   t1 = m1 - t1;
   t2 = z - t2;
   double a2 = t1 + t2;
+  /* Ensure the arithmetic is not scheduled after feclearexcept call.  */
+  math_force_eval (m2);
+  math_force_eval (a2);
   feclearexcept (FE_INEXACT);
 
-  /* If the result is an exact zero, ensure it has the correct
-     sign.  */
+  /* If the result is an exact zero, ensure it has the correct sign.  */
   if (a1 == 0 && m2 == 0)
     {
       libc_feupdateenv (&env);
-      /* Ensure that round-to-nearest value of z + m1 is not
-	 reused.  */
-      asm volatile ("" : "=m" (z) : "m" (z));
+      /* Ensure that round-to-nearest value of z + m1 is not reused.  */
+      z = math_opt_barrier (z);
       return z + m1;
     }