math: Add optimization barrier to ensure a1 + u.d is not reused [BZ #30664]
Checks
Context |
Check |
Description |
redhat-pt-bot/TryBot-apply_patch |
success
|
Patch applied to master at the time it was sent
|
linaro-tcwg-bot/tcwg_glibc_build--master-aarch64 |
success
|
Build passed
|
redhat-pt-bot/TryBot-32bit |
success
|
Build for i686
|
linaro-tcwg-bot/tcwg_glibc_build--master-arm |
success
|
Build passed
|
linaro-tcwg-bot/tcwg_glibc_check--master-aarch64 |
success
|
Test passed
|
linaro-tcwg-bot/tcwg_glibc_check--master-arm |
success
|
Test passed
|
Commit Message
This patch fixes the test fma fails seen on hppa with gcc-13 and
later.
There may be more instances of this problem still lurking in
sysdeps/ieee754/dbl-64/s_fma.c and other similar files. Whether
they occur in practice is very hardware dependent. I'm not able
to test changes to any other files on hppa.
Okay?
Dave
---
math: Add optimization barrier to ensure a1 + u.d is not reused [BZ #30664]
A number of fma tests started to fail on hppa when gcc was changed to
use Ranger rather than EVRP. Eventually I found that the value of
a1 + u.d in this is block of code was being computed in FE_TOWARDZERO
mode and not the original rounding mode:
if (TININESS_AFTER_ROUNDING)
{
w.d = a1 + u.d;
if (w.ieee.exponent == 109)
return w.d * 0x1p-108;
}
This caused the exponent value to be wrong and the wrong return path
to be used.
Here we add an optimization barrier after the rounding mode is reset
to ensure that the previous value of a1 + u.d is not reused.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Comments
Ping.
Dave
On Fri, Feb 07, 2025 at 12:00:30PM -0500, John David Anglin wrote:
> This patch fixes the test fma fails seen on hppa with gcc-13 and
> later.
>
> There may be more instances of this problem still lurking in
> sysdeps/ieee754/dbl-64/s_fma.c and other similar files. Whether
> they occur in practice is very hardware dependent. I'm not able
> to test changes to any other files on hppa.
>
> Okay?
>
> Dave
> ---
>
> math: Add optimization barrier to ensure a1 + u.d is not reused [BZ #30664]
>
> A number of fma tests started to fail on hppa when gcc was changed to
> use Ranger rather than EVRP. Eventually I found that the value of
> a1 + u.d in this is block of code was being computed in FE_TOWARDZERO
> mode and not the original rounding mode:
>
> if (TININESS_AFTER_ROUNDING)
> {
> w.d = a1 + u.d;
> if (w.ieee.exponent == 109)
> return w.d * 0x1p-108;
> }
>
> This caused the exponent value to be wrong and the wrong return path
> to be used.
>
> Here we add an optimization barrier after the rounding mode is reset
> to ensure that the previous value of a1 + u.d is not reused.
>
> Signed-off-by: John David Anglin <dave.anglin@bell.net>
>
> diff --git a/sysdeps/ieee754/dbl-64/s_fma.c b/sysdeps/ieee754/dbl-64/s_fma.c
> index 20f617b996..42351c6b34 100644
> --- a/sysdeps/ieee754/dbl-64/s_fma.c
> +++ b/sysdeps/ieee754/dbl-64/s_fma.c
> @@ -244,6 +244,9 @@ __fma (double x, double y, double z)
> /* Reset rounding mode and test for inexact simultaneously. */
> int j = libc_feupdateenv_test (&env, FE_INEXACT) != 0;
>
> + /* Ensure value of a1 + u.d is not reused. */
> + a1 = math_opt_barrier (a1);
> +
> if (__glibc_likely (adjust == 0))
> {
> if ((u.ieee.mantissa1 & 1) == 0 && u.ieee.exponent != 0x7ff)
On 07/02/25 14:00, John David Anglin wrote:
> This patch fixes the test fma fails seen on hppa with gcc-13 and
> later.
>
> There may be more instances of this problem still lurking in
> sysdeps/ieee754/dbl-64/s_fma.c and other similar files. Whether
> they occur in practice is very hardware dependent. I'm not able
> to test changes to any other files on hppa.
>
> Okay?
>
> Dave
> ---
>
> math: Add optimization barrier to ensure a1 + u.d is not reused [BZ #30664]
>
> A number of fma tests started to fail on hppa when gcc was changed to
> use Ranger rather than EVRP. Eventually I found that the value of
> a1 + u.d in this is block of code was being computed in FE_TOWARDZERO
> mode and not the original rounding mode:
>
> if (TININESS_AFTER_ROUNDING)
> {
> w.d = a1 + u.d;
> if (w.ieee.exponent == 109)
> return w.d * 0x1p-108;
> }
>
> This caused the exponent value to be wrong and the wrong return path
> to be used.
Is there some compiler annotation/attribute to mark libc_fesetround and
any other function as compiler barrier to avoid such issue?
Another possibility would to rewrite it to avoid FP operations.
>
> Here we add an optimization barrier after the rounding mode is reset
> to ensure that the previous value of a1 + u.d is not reused.
>
> Signed-off-by: John David Anglin <dave.anglin@bell.net>
LGTM, thanks.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
>
> diff --git a/sysdeps/ieee754/dbl-64/s_fma.c b/sysdeps/ieee754/dbl-64/s_fma.c
> index 20f617b996..42351c6b34 100644
> --- a/sysdeps/ieee754/dbl-64/s_fma.c
> +++ b/sysdeps/ieee754/dbl-64/s_fma.c
> @@ -244,6 +244,9 @@ __fma (double x, double y, double z)
> /* Reset rounding mode and test for inexact simultaneously. */
> int j = libc_feupdateenv_test (&env, FE_INEXACT) != 0;
>
> + /* Ensure value of a1 + u.d is not reused. */
> + a1 = math_opt_barrier (a1);
> +
> if (__glibc_likely (adjust == 0))
> {
> if ((u.ieee.mantissa1 & 1) == 0 && u.ieee.exponent != 0x7ff)
On 2025-02-25 2:27 p.m., Adhemerval Zanella Netto wrote:
> On 07/02/25 14:00, John David Anglin wrote:
>> This patch fixes the test fma fails seen on hppa with gcc-13 and
>> later.
>>
>> There may be more instances of this problem still lurking in
>> sysdeps/ieee754/dbl-64/s_fma.c and other similar files. Whether
>> they occur in practice is very hardware dependent. I'm not able
>> to test changes to any other files on hppa.
>>
>> Okay?
>>
>> Dave
>> ---
>>
>> math: Add optimization barrier to ensure a1 + u.d is not reused [BZ #30664]
>>
>> A number of fma tests started to fail on hppa when gcc was changed to
>> use Ranger rather than EVRP. Eventually I found that the value of
>> a1 + u.d in this is block of code was being computed in FE_TOWARDZERO
>> mode and not the original rounding mode:
>>
>> if (TININESS_AFTER_ROUNDING)
>> {
>> w.d = a1 + u.d;
>> if (w.ieee.exponent == 109)
>> return w.d * 0x1p-108;
>> }
>>
>> This caused the exponent value to be wrong and the wrong return path
>> to be used.
> Is there some compiler annotation/attribute to mark libc_fesetround and
> any other function as compiler barrier to avoid such issue?
I'm not aware of a simple way to do this. The problem is the values of non
call-clobbered registers are preserved across calls.
"asm volatile ("" : : : "memory");" is a compiler scheduling barrier for all
expressions that load from or store to memory. For this to work, floating
point values would have to be forced to memory to complete their evaluation.
This is effectively what math_opt_barrier does but for a single argument.
It evaluates and returns its floating-point argument. This ensures that the
evaluation of any expression using the result of math_opt_barrier is not moved
before the barrier. This is probably the best solution from an optimization
standpoint but it is tricky to know which values need barriers.
> Another possibility would to rewrite it to avoid FP operations.
This probably would significantly degrade performance. It would be straight
forward to implement if soft add and multiply routines were available with
rounding support.
Dave
On 25/02/25 17:47, Dave Anglin wrote:
> On 2025-02-25 2:27 p.m., Adhemerval Zanella Netto wrote:
>> On 07/02/25 14:00, John David Anglin wrote:
>>> This patch fixes the test fma fails seen on hppa with gcc-13 and
>>> later.
>>>
>>> There may be more instances of this problem still lurking in
>>> sysdeps/ieee754/dbl-64/s_fma.c and other similar files. Whether
>>> they occur in practice is very hardware dependent. I'm not able
>>> to test changes to any other files on hppa.
>>>
>>> Okay?
>>>
>>> Dave
>>> ---
>>>
>>> math: Add optimization barrier to ensure a1 + u.d is not reused [BZ #30664]
>>>
>>> A number of fma tests started to fail on hppa when gcc was changed to
>>> use Ranger rather than EVRP. Eventually I found that the value of
>>> a1 + u.d in this is block of code was being computed in FE_TOWARDZERO
>>> mode and not the original rounding mode:
>>>
>>> if (TININESS_AFTER_ROUNDING)
>>> {
>>> w.d = a1 + u.d;
>>> if (w.ieee.exponent == 109)
>>> return w.d * 0x1p-108;
>>> }
>>>
>>> This caused the exponent value to be wrong and the wrong return path
>>> to be used.
>> Is there some compiler annotation/attribute to mark libc_fesetround and
>> any other function as compiler barrier to avoid such issue?
>
> I'm not aware of a simple way to do this. The problem is the values of non
> call-clobbered registers are preserved across calls.
>
> "asm volatile ("" : : : "memory");" is a compiler scheduling barrier for all
> expressions that load from or store to memory. For this to work, floating
> point values would have to be forced to memory to complete their evaluation.
> This is effectively what math_opt_barrier does but for a single argument.
>
> It evaluates and returns its floating-point argument. This ensures that the
> evaluation of any expression using the result of math_opt_barrier is not moved
> before the barrier. This is probably the best solution from an optimization
> standpoint but it is tricky to know which values need barriers.
It seems that glibc is not the first project to be bitten with such
issue [1], and while the C standard does have a way to prevent this
(FENV_ACCESS), it is not implemented by gcc and the bug report work
around is indeed to add math barriers.
[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=34678
>
>> Another possibility would to rewrite it to avoid FP operations.
>
> This probably would significantly degrade performance. It would be straight
> forward to implement if soft add and multiply routines were available with
> rounding support.
Last time I checked a integer based implementation (musl), it was faster
than glibc one even on recent hardware. But I don't think this is pressing
issue since now most hardware does have fma instructions.
On Wed, 26 Feb 2025, Adhemerval Zanella Netto wrote:
> Last time I checked a integer based implementation (musl), it was faster
> than glibc one even on recent hardware. But I don't think this is pressing
> issue since now most hardware does have fma instructions.
We have an implementation based on soft-fp (in sysdeps/ieee754/soft-fp/).
That does require a suitable sfp-machine.h, including the parts
integrating with hardware exceptions and rounding modes in the case of
architectures supporting those.
On 2025-02-26 4:37 p.m., Joseph Myers wrote:
> On Wed, 26 Feb 2025, Adhemerval Zanella Netto wrote:
>
>> Last time I checked a integer based implementation (musl), it was faster
>> than glibc one even on recent hardware. But I don't think this is pressing
>> issue since now most hardware does have fma instructions.
>
> We have an implementation based on soft-fp (in sysdeps/ieee754/soft-fp/).
> That does require a suitable sfp-machine.h, including the parts
> integrating with hardware exceptions and rounding modes in the case of
> architectures supporting those.
The is a hppa implementation of sfp-machine.h in libgcc. I'll test.
@@ -244,6 +244,9 @@ __fma (double x, double y, double z)
/* Reset rounding mode and test for inexact simultaneously. */
int j = libc_feupdateenv_test (&env, FE_INEXACT) != 0;
+ /* Ensure value of a1 + u.d is not reused. */
+ a1 = math_opt_barrier (a1);
+
if (__glibc_likely (adjust == 0))
{
if ((u.ieee.mantissa1 & 1) == 0 && u.ieee.exponent != 0x7ff)