LoongArch: Optimize math barriers
Checks
| Context |
Check |
Description |
| redhat-pt-bot/TryBot-apply_patch |
success
|
Patch applied to master at the time it was sent
|
| redhat-pt-bot/TryBot-32bit |
success
|
Build for i686
|
| linaro-tcwg-bot/tcwg_glibc_build--master-arm |
success
|
Build passed
|
| linaro-tcwg-bot/tcwg_glibc_build--master-aarch64 |
success
|
Build passed
|
| linaro-tcwg-bot/tcwg_glibc_check--master-aarch64 |
success
|
Test passed
|
| linaro-tcwg-bot/tcwg_glibc_check--master-arm |
success
|
Test passed
|
Commit Message
The constraints 'frm' used in math_opt_barrier and math_force_eval cause
GCC choose fixed-point registers with lower costs in some cases, because
in LoongArch ira_reg_class_subunion[FP_REGS][GR_REGS] is initialized to
GR_REGS, extra move costs will be added when choosing FP_REGS. This
results in unnecessary instructions to move values between FP_REGS and
GR_REGS.
Most of cases in GLIBC math barriers related macros are invoked with
floating-point type paramters, this patch removes "r" constraints,
allowing GCC to keep values in floating-point registers and avoid the
extra moves.
Example from xflow function before the change:
movfr2gr.d $t0, $fa0
beqz $a0, 12 # 69960 <xflow+0x10>
fneg.d $fa1, $fa0
movfr2gr.d $t0, $fa1
movgr2fr.d $fa1, $t0
fmul.d $fa0, $fa0, $fa1
b -56 # 69930 <with_errno.constprop.0>
After the patch:
fmov.d $fa1, $fa0
beqz $a0, 12 # 69ac0 <xflow+0x10>
fneg.d $fa1, $fa0
nop
fmul.d $fa0, $fa0, $fa1
b -52 # 69a90 <with_errno.constprop.0>
---
sysdeps/loongarch/fpu/math-barriers.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
Comments
On Mai 07 2026, Deng Jianbo wrote:
> The constraints 'frm' used in math_opt_barrier and math_force_eval cause
> GCC choose fixed-point registers with lower costs in some cases, because
> in LoongArch ira_reg_class_subunion[FP_REGS][GR_REGS] is initialized to
> GR_REGS, extra move costs will be added when choosing FP_REGS. This
> results in unnecessary instructions to move values between FP_REGS and
> GR_REGS.
It looks like a bug in GCC (wrong cost calculations) if it generates
unnecessary moves when it has more choices.
> Most of cases in GLIBC math barriers related macros are invoked with
> floating-point type paramters, this patch removes "r" constraints,
> allowing GCC to keep values in floating-point registers and avoid the
> extra moves.
It does not *allow* that, it *forces* gcc to use an fp reg (or memory).
What if the value is in a general reg in the first place, wouldn't that
result in unnecessary moves?
Yes, unnecessary moves can occur if the value is already in a general
reg. For example, when invoking math_opt_barrier with fixed-point
parameters. However, in glibc, these two macros are invoked with
floating-point parameters, and I didn’t find the unnecessary moves
being generated. For other architectures, aarch64 and s390 also do
not include the 'r' constraint.
For gcc part, to my understanding, when 'fr' is present, GCC selects the
union of FP_REGS and GR_REGS as the destination register class, which is
initialized to GR_REGS. Then compute the cost of choosing different
register classes, choosing GR_REGS incurs no extra cost, while choosing
FP_REGS adds an extra move cost. I am not sure if it's a strategy here.
On 5/7/26 2:46 PM, Andreas Schwab wrote:
> On Mai 07 2026, Deng Jianbo wrote:
>
>> The constraints 'frm' used in math_opt_barrier and math_force_eval cause
>> GCC choose fixed-point registers with lower costs in some cases, because
>> in LoongArch ira_reg_class_subunion[FP_REGS][GR_REGS] is initialized to
>> GR_REGS, extra move costs will be added when choosing FP_REGS. This
>> results in unnecessary instructions to move values between FP_REGS and
>> GR_REGS.
>
> It looks like a bug in GCC (wrong cost calculations) if it generates
> unnecessary moves when it has more choices.
>
>> Most of cases in GLIBC math barriers related macros are invoked with
>> floating-point type paramters, this patch removes "r" constraints,
>> allowing GCC to keep values in floating-point registers and avoid the
>> extra moves.
>
> It does not *allow* that, it *forces* gcc to use an fp reg (or memory).
> What if the value is in a general reg in the first place, wouldn't that
> result in unnecessary moves?
>
On Thu, 2026-05-07 at 17:30 +0800, dengjianbo wrote:
> Yes, unnecessary moves can occur if the value is already in a general
> reg. For example, when invoking math_opt_barrier with fixed-point
> parameters. However, in glibc, these two macros are invoked with
> floating-point parameters, and I didn’t find the unnecessary moves
> being generated. For other architectures, aarch64 and s390 also do
> not include the 'r' constraint.
>
> For gcc part, to my understanding, when 'fr' is present, GCC selects the
> union of FP_REGS and GR_REGS as the destination register class, which is
> initialized to GR_REGS. Then compute the cost of choosing different
> register classes, choosing GR_REGS incurs no extra cost, while choosing
> FP_REGS adds an extra move cost. I am not sure if it's a strategy here.
Where does this issue manifest (i.e. in which glibc source file)? I'll
try to reduce a test case from that. To me this seems a compiler bug
too.
>
On 5/8/26 5:29 PM, Xi Ruoyao wrote:
> On Thu, 2026-05-07 at 17:30 +0800, dengjianbo wrote:
>> Yes, unnecessary moves can occur if the value is already in a general
>> reg. For example, when invoking math_opt_barrier with fixed-point
>> parameters. However, in glibc, these two macros are invoked with
>> floating-point parameters, and I didn’t find the unnecessary moves
>> being generated. For other architectures, aarch64 and s390 also do
>> not include the 'r' constraint.
>>
>> For gcc part, to my understanding, when 'fr' is present, GCC selects the
>> union of FP_REGS and GR_REGS as the destination register class, which is
>> initialized to GR_REGS. Then compute the cost of choosing different
>> register classes, choosing GR_REGS incurs no extra cost, while choosing
>> FP_REGS adds an extra move cost. I am not sure if it's a strategy here.
>
> Where does this issue manifest (i.e. in which glibc source file)? I'll
> try to reduce a test case from that. To me this seems a compiler bug
> too.
>>
>
One example is the xflow function, compiler generates the unnecessary
moves for this function, the source file is
sysdeps/ieee754/dbl-64/math_err.c.
On 5/8/26 5:47 PM, dengjianbo wrote:
>
>
> On 5/8/26 5:29 PM, Xi Ruoyao wrote:
>> On Thu, 2026-05-07 at 17:30 +0800, dengjianbo wrote:
>>> Yes, unnecessary moves can occur if the value is already in a general
>>> reg. For example, when invoking math_opt_barrier with fixed-point
>>> parameters. However, in glibc, these two macros are invoked with
>>> floating-point parameters, and I didn’t find the unnecessary moves
>>> being generated. For other architectures, aarch64 and s390 also do
>>> not include the 'r' constraint.
>>>
>>> For gcc part, to my understanding, when 'fr' is present, GCC selects the
>>> union of FP_REGS and GR_REGS as the destination register class, which is
>>> initialized to GR_REGS. Then compute the cost of choosing different
>>> register classes, choosing GR_REGS incurs no extra cost, while choosing
>>> FP_REGS adds an extra move cost. I am not sure if it's a strategy here.
>>
>> Where does this issue manifest (i.e. in which glibc source file)? I'll
>> try to reduce a test case from that. To me this seems a compiler bug
>> too.
>>>
>>
> One example is the xflow function, compiler generates the unnecessary
> moves for this function, the source file is sysdeps/ieee754/dbl-64
> /math_err.c.
Just a gentle ping on the patch, maybe the below test case can be used
to reproduce this problem.
#define math_opt_barrier(x) \
({ __typeof (x) __x = (x); __asm ("" : "+frm" (__x)); __x; })
double
test (double x)
{
x = math_opt_barrier (x);
return x;
}
@@ -21,8 +21,8 @@
/* Generic code forces values to memory; we don't need to do that. */
#define math_opt_barrier(x) \
- ({ __typeof (x) __x = (x); __asm ("" : "+frm" (__x)); __x; })
+ ({ __typeof (x) __x = (x); __asm ("" : "+fm" (__x)); __x; })
#define math_force_eval(x) \
- ({ __typeof (x) __x = (x); __asm __volatile__ ("" : : "frm" (__x)); })
+ ({ __typeof (x) __x = (x); __asm __volatile__ ("" : : "fm" (__x)); })
#endif /* math-barriers.h */