math: Add optimization barrier to ensure a1 + u.d is not reused [BZ #30664]

Message ID Z6Y8Lc0n730rjRcr@mx3210.local (mailing list archive)
State Committed
Commit 2fe5e2af0995a6e6ee2c761e55e7596a3220d07c
Headers
Series math: Add optimization barrier to ensure a1 + u.d is not reused [BZ #30664] |

Checks

Context Check Description
redhat-pt-bot/TryBot-apply_patch success Patch applied to master at the time it was sent
linaro-tcwg-bot/tcwg_glibc_build--master-aarch64 success Build passed
redhat-pt-bot/TryBot-32bit success Build for i686
linaro-tcwg-bot/tcwg_glibc_build--master-arm success Build passed
linaro-tcwg-bot/tcwg_glibc_check--master-aarch64 success Test passed
linaro-tcwg-bot/tcwg_glibc_check--master-arm success Test passed

Commit Message

John David Anglin Feb. 7, 2025, 5 p.m. UTC
  This patch fixes the test fma fails seen on hppa with gcc-13 and
later.

There may be more instances of this problem still lurking in
sysdeps/ieee754/dbl-64/s_fma.c and other similar files.  Whether
they occur in practice is very hardware dependent.  I'm not able
to test changes to any other files on hppa.

Okay?

Dave
---

math: Add optimization barrier to ensure a1 + u.d is not reused [BZ #30664]

A number of fma tests started to fail on hppa when gcc was changed to
use Ranger rather than EVRP.  Eventually I found that the value of
a1 + u.d in this is block of code was being computed in FE_TOWARDZERO
mode and not the original rounding mode:

    if (TININESS_AFTER_ROUNDING)
      {
        w.d = a1 + u.d;
        if (w.ieee.exponent == 109)
          return w.d * 0x1p-108;
      }

This caused the exponent value to be wrong and the wrong return path
to be used.

Here we add an optimization barrier after the rounding mode is reset
to ensure that the previous value of a1 + u.d is not reused.

Signed-off-by: John David Anglin <dave.anglin@bell.net>
  

Comments

John David Anglin Feb. 21, 2025, 4:19 p.m. UTC | #1
Ping.

Dave

On Fri, Feb 07, 2025 at 12:00:30PM -0500, John David Anglin wrote:
> This patch fixes the test fma fails seen on hppa with gcc-13 and
> later.
> 
> There may be more instances of this problem still lurking in
> sysdeps/ieee754/dbl-64/s_fma.c and other similar files.  Whether
> they occur in practice is very hardware dependent.  I'm not able
> to test changes to any other files on hppa.
> 
> Okay?
> 
> Dave
> ---
> 
> math: Add optimization barrier to ensure a1 + u.d is not reused [BZ #30664]
> 
> A number of fma tests started to fail on hppa when gcc was changed to
> use Ranger rather than EVRP.  Eventually I found that the value of
> a1 + u.d in this is block of code was being computed in FE_TOWARDZERO
> mode and not the original rounding mode:
> 
>     if (TININESS_AFTER_ROUNDING)
>       {
>         w.d = a1 + u.d;
>         if (w.ieee.exponent == 109)
>           return w.d * 0x1p-108;
>       }
> 
> This caused the exponent value to be wrong and the wrong return path
> to be used.
> 
> Here we add an optimization barrier after the rounding mode is reset
> to ensure that the previous value of a1 + u.d is not reused.
> 
> Signed-off-by: John David Anglin <dave.anglin@bell.net>
> 
> diff --git a/sysdeps/ieee754/dbl-64/s_fma.c b/sysdeps/ieee754/dbl-64/s_fma.c
> index 20f617b996..42351c6b34 100644
> --- a/sysdeps/ieee754/dbl-64/s_fma.c
> +++ b/sysdeps/ieee754/dbl-64/s_fma.c
> @@ -244,6 +244,9 @@ __fma (double x, double y, double z)
>    /* Reset rounding mode and test for inexact simultaneously.  */
>    int j = libc_feupdateenv_test (&env, FE_INEXACT) != 0;
>  
> +  /* Ensure value of a1 + u.d is not reused.  */
> +  a1 = math_opt_barrier (a1);
> +
>    if (__glibc_likely (adjust == 0))
>      {
>        if ((u.ieee.mantissa1 & 1) == 0 && u.ieee.exponent != 0x7ff)
  
Adhemerval Zanella Netto Feb. 25, 2025, 7:27 p.m. UTC | #2
On 07/02/25 14:00, John David Anglin wrote:
> This patch fixes the test fma fails seen on hppa with gcc-13 and
> later.
> 
> There may be more instances of this problem still lurking in
> sysdeps/ieee754/dbl-64/s_fma.c and other similar files.  Whether
> they occur in practice is very hardware dependent.  I'm not able
> to test changes to any other files on hppa.
> 
> Okay?
> 
> Dave
> ---
> 
> math: Add optimization barrier to ensure a1 + u.d is not reused [BZ #30664]
> 
> A number of fma tests started to fail on hppa when gcc was changed to
> use Ranger rather than EVRP.  Eventually I found that the value of
> a1 + u.d in this is block of code was being computed in FE_TOWARDZERO
> mode and not the original rounding mode:
> 
>     if (TININESS_AFTER_ROUNDING)
>       {
>         w.d = a1 + u.d;
>         if (w.ieee.exponent == 109)
>           return w.d * 0x1p-108;
>       }
> 
> This caused the exponent value to be wrong and the wrong return path
> to be used.

Is there some compiler annotation/attribute to mark libc_fesetround and
any other function as compiler barrier to avoid such issue?

Another possibility would to rewrite it to avoid FP operations.

> 
> Here we add an optimization barrier after the rounding mode is reset
> to ensure that the previous value of a1 + u.d is not reused.
> 
> Signed-off-by: John David Anglin <dave.anglin@bell.net>

LGTM, thanks.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>

> 
> diff --git a/sysdeps/ieee754/dbl-64/s_fma.c b/sysdeps/ieee754/dbl-64/s_fma.c
> index 20f617b996..42351c6b34 100644
> --- a/sysdeps/ieee754/dbl-64/s_fma.c
> +++ b/sysdeps/ieee754/dbl-64/s_fma.c
> @@ -244,6 +244,9 @@ __fma (double x, double y, double z)
>    /* Reset rounding mode and test for inexact simultaneously.  */
>    int j = libc_feupdateenv_test (&env, FE_INEXACT) != 0;
>  
> +  /* Ensure value of a1 + u.d is not reused.  */
> +  a1 = math_opt_barrier (a1);
> +
>    if (__glibc_likely (adjust == 0))
>      {
>        if ((u.ieee.mantissa1 & 1) == 0 && u.ieee.exponent != 0x7ff)
  
Dave Anglin Feb. 25, 2025, 8:47 p.m. UTC | #3
On 2025-02-25 2:27 p.m., Adhemerval Zanella Netto wrote:
> On 07/02/25 14:00, John David Anglin wrote:
>> This patch fixes the test fma fails seen on hppa with gcc-13 and
>> later.
>>
>> There may be more instances of this problem still lurking in
>> sysdeps/ieee754/dbl-64/s_fma.c and other similar files.  Whether
>> they occur in practice is very hardware dependent.  I'm not able
>> to test changes to any other files on hppa.
>>
>> Okay?
>>
>> Dave
>> ---
>>
>> math: Add optimization barrier to ensure a1 + u.d is not reused [BZ #30664]
>>
>> A number of fma tests started to fail on hppa when gcc was changed to
>> use Ranger rather than EVRP.  Eventually I found that the value of
>> a1 + u.d in this is block of code was being computed in FE_TOWARDZERO
>> mode and not the original rounding mode:
>>
>>     if (TININESS_AFTER_ROUNDING)
>>       {
>>         w.d = a1 + u.d;
>>         if (w.ieee.exponent == 109)
>>           return w.d * 0x1p-108;
>>       }
>>
>> This caused the exponent value to be wrong and the wrong return path
>> to be used.
> Is there some compiler annotation/attribute to mark libc_fesetround and
> any other function as compiler barrier to avoid such issue?

I'm not aware of a simple way to do this.  The problem is the values of non
call-clobbered registers are preserved across calls.

"asm volatile ("" : : : "memory");" is a compiler scheduling barrier for all
expressions that load from or store to memory.  For this to work, floating
point values would have to be forced to memory to complete their evaluation.
This is effectively what math_opt_barrier does but for a single argument.

It evaluates and returns its floating-point argument.  This ensures that the
evaluation of any expression using the result of math_opt_barrier is not moved
before the barrier.  This is probably the best solution from an optimization
standpoint but it is tricky to know which values need barriers.

> Another possibility would to rewrite it to avoid FP operations.

This probably would significantly degrade performance.  It would be straight
forward to implement if soft add and multiply routines were available with
rounding support.

Dave
  
Adhemerval Zanella Netto Feb. 26, 2025, 7:35 p.m. UTC | #4
On 25/02/25 17:47, Dave Anglin wrote:
> On 2025-02-25 2:27 p.m., Adhemerval Zanella Netto wrote:
>> On 07/02/25 14:00, John David Anglin wrote:
>>> This patch fixes the test fma fails seen on hppa with gcc-13 and
>>> later.
>>>
>>> There may be more instances of this problem still lurking in
>>> sysdeps/ieee754/dbl-64/s_fma.c and other similar files.  Whether
>>> they occur in practice is very hardware dependent.  I'm not able
>>> to test changes to any other files on hppa.
>>>
>>> Okay?
>>>
>>> Dave
>>> ---
>>>
>>> math: Add optimization barrier to ensure a1 + u.d is not reused [BZ #30664]
>>>
>>> A number of fma tests started to fail on hppa when gcc was changed to
>>> use Ranger rather than EVRP.  Eventually I found that the value of
>>> a1 + u.d in this is block of code was being computed in FE_TOWARDZERO
>>> mode and not the original rounding mode:
>>>
>>>     if (TININESS_AFTER_ROUNDING)
>>>       {
>>>         w.d = a1 + u.d;
>>>         if (w.ieee.exponent == 109)
>>>           return w.d * 0x1p-108;
>>>       }
>>>
>>> This caused the exponent value to be wrong and the wrong return path
>>> to be used.
>> Is there some compiler annotation/attribute to mark libc_fesetround and
>> any other function as compiler barrier to avoid such issue?
> 
> I'm not aware of a simple way to do this.  The problem is the values of non
> call-clobbered registers are preserved across calls.
> 
> "asm volatile ("" : : : "memory");" is a compiler scheduling barrier for all
> expressions that load from or store to memory.  For this to work, floating
> point values would have to be forced to memory to complete their evaluation.
> This is effectively what math_opt_barrier does but for a single argument.
> 
> It evaluates and returns its floating-point argument.  This ensures that the
> evaluation of any expression using the result of math_opt_barrier is not moved
> before the barrier.  This is probably the best solution from an optimization
> standpoint but it is tricky to know which values need barriers.

It seems that glibc is not the first project to be bitten with such 
issue [1], and while the C standard does have a way to prevent this
(FENV_ACCESS), it is not implemented by gcc and the bug report work 
around is indeed to add math barriers.

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=34678

> 
>> Another possibility would to rewrite it to avoid FP operations.
> 
> This probably would significantly degrade performance.  It would be straight
> forward to implement if soft add and multiply routines were available with
> rounding support.

Last time I checked a integer based implementation (musl), it was faster
than glibc one even on recent hardware.  But I don't think this is pressing
issue since now most hardware does have fma instructions.
  
Joseph Myers Feb. 26, 2025, 9:37 p.m. UTC | #5
On Wed, 26 Feb 2025, Adhemerval Zanella Netto wrote:

> Last time I checked a integer based implementation (musl), it was faster
> than glibc one even on recent hardware.  But I don't think this is pressing
> issue since now most hardware does have fma instructions.

We have an implementation based on soft-fp (in sysdeps/ieee754/soft-fp/).  
That does require a suitable sfp-machine.h, including the parts 
integrating with hardware exceptions and rounding modes in the case of 
architectures supporting those.
  
Dave Anglin Feb. 26, 2025, 10:17 p.m. UTC | #6
On 2025-02-26 4:37 p.m., Joseph Myers wrote:
> On Wed, 26 Feb 2025, Adhemerval Zanella Netto wrote:
> 
>> Last time I checked a integer based implementation (musl), it was faster
>> than glibc one even on recent hardware.  But I don't think this is pressing
>> issue since now most hardware does have fma instructions.
> 
> We have an implementation based on soft-fp (in sysdeps/ieee754/soft-fp/).  
> That does require a suitable sfp-machine.h, including the parts 
> integrating with hardware exceptions and rounding modes in the case of 
> architectures supporting those.

The is a hppa implementation of sfp-machine.h in libgcc.  I'll test.
  

Patch

diff --git a/sysdeps/ieee754/dbl-64/s_fma.c b/sysdeps/ieee754/dbl-64/s_fma.c
index 20f617b996..42351c6b34 100644
--- a/sysdeps/ieee754/dbl-64/s_fma.c
+++ b/sysdeps/ieee754/dbl-64/s_fma.c
@@ -244,6 +244,9 @@  __fma (double x, double y, double z)
   /* Reset rounding mode and test for inexact simultaneously.  */
   int j = libc_feupdateenv_test (&env, FE_INEXACT) != 0;
 
+  /* Ensure value of a1 + u.d is not reused.  */
+  a1 = math_opt_barrier (a1);
+
   if (__glibc_likely (adjust == 0))
     {
       if ((u.ieee.mantissa1 & 1) == 0 && u.ieee.exponent != 0x7ff)