LoongArch: Optimize math barriers

Message ID 20260507021740.16572-1-dengjianbo@loongson.cn (mailing list archive)
State New
Headers
Series LoongArch: Optimize math barriers |

Checks

Context Check Description
redhat-pt-bot/TryBot-apply_patch success Patch applied to master at the time it was sent
redhat-pt-bot/TryBot-32bit success Build for i686
linaro-tcwg-bot/tcwg_glibc_build--master-arm success Build passed
linaro-tcwg-bot/tcwg_glibc_build--master-aarch64 success Build passed
linaro-tcwg-bot/tcwg_glibc_check--master-aarch64 success Test passed
linaro-tcwg-bot/tcwg_glibc_check--master-arm success Test passed

Commit Message

dengjianbo May 7, 2026, 2:17 a.m. UTC
  The constraints 'frm' used in math_opt_barrier and math_force_eval cause
GCC choose fixed-point registers with lower costs in some cases, because
in LoongArch ira_reg_class_subunion[FP_REGS][GR_REGS] is initialized to
GR_REGS, extra move costs will be added when choosing FP_REGS. This
results in unnecessary instructions to move values between FP_REGS and
GR_REGS.

Most of cases in GLIBC math barriers related macros are invoked with
floating-point type paramters, this patch removes "r" constraints,
allowing GCC to keep values in floating-point registers and avoid the
extra moves.

Example from xflow function before the change:
	movfr2gr.d      $t0, $fa0
	beqz            $a0, 12 # 69960 <xflow+0x10>
	fneg.d          $fa1, $fa0
	movfr2gr.d      $t0, $fa1
	movgr2fr.d      $fa1, $t0
	fmul.d          $fa0, $fa0, $fa1
	b               -56 # 69930 <with_errno.constprop.0>

After the patch:
	fmov.d          $fa1, $fa0
	beqz            $a0, 12 # 69ac0 <xflow+0x10>
	fneg.d          $fa1, $fa0
	nop
	fmul.d          $fa0, $fa0, $fa1
	b               -52 # 69a90 <with_errno.constprop.0>
---
 sysdeps/loongarch/fpu/math-barriers.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
  

Comments

Andreas Schwab May 7, 2026, 6:46 a.m. UTC | #1
On Mai 07 2026, Deng Jianbo wrote:

> The constraints 'frm' used in math_opt_barrier and math_force_eval cause
> GCC choose fixed-point registers with lower costs in some cases, because
> in LoongArch ira_reg_class_subunion[FP_REGS][GR_REGS] is initialized to
> GR_REGS, extra move costs will be added when choosing FP_REGS. This
> results in unnecessary instructions to move values between FP_REGS and
> GR_REGS.

It looks like a bug in GCC (wrong cost calculations) if it generates
unnecessary moves when it has more choices.

> Most of cases in GLIBC math barriers related macros are invoked with
> floating-point type paramters, this patch removes "r" constraints,
> allowing GCC to keep values in floating-point registers and avoid the
> extra moves.

It does not *allow* that, it *forces* gcc to use an fp reg (or memory).
What if the value is in a general reg in the first place, wouldn't that
result in unnecessary moves?
  
dengjianbo May 7, 2026, 9:30 a.m. UTC | #2
Yes, unnecessary moves can occur if the value is already in a general
reg. For example, when invoking math_opt_barrier with fixed-point
parameters. However, in glibc, these two macros are invoked with
floating-point parameters, and I didn’t find the unnecessary moves
being generated. For other architectures, aarch64 and s390 also do
not include the 'r' constraint.

For gcc part, to my understanding, when 'fr' is present, GCC selects the
union of FP_REGS and GR_REGS as the destination register class, which is
initialized to GR_REGS. Then compute the cost of choosing different
register classes, choosing GR_REGS incurs no extra cost, while choosing
FP_REGS adds an extra move cost. I am not sure if it's a strategy here.

  On 5/7/26 2:46 PM, Andreas Schwab wrote:
> On Mai 07 2026, Deng Jianbo wrote:
> 
>> The constraints 'frm' used in math_opt_barrier and math_force_eval cause
>> GCC choose fixed-point registers with lower costs in some cases, because
>> in LoongArch ira_reg_class_subunion[FP_REGS][GR_REGS] is initialized to
>> GR_REGS, extra move costs will be added when choosing FP_REGS. This
>> results in unnecessary instructions to move values between FP_REGS and
>> GR_REGS.
> 
> It looks like a bug in GCC (wrong cost calculations) if it generates
> unnecessary moves when it has more choices.
> 
>> Most of cases in GLIBC math barriers related macros are invoked with
>> floating-point type paramters, this patch removes "r" constraints,
>> allowing GCC to keep values in floating-point registers and avoid the
>> extra moves.
> 
> It does not *allow* that, it *forces* gcc to use an fp reg (or memory).
> What if the value is in a general reg in the first place, wouldn't that
> result in unnecessary moves?
>
  
Xi Ruoyao May 8, 2026, 9:29 a.m. UTC | #3
On Thu, 2026-05-07 at 17:30 +0800, dengjianbo wrote:
> Yes, unnecessary moves can occur if the value is already in a general
> reg. For example, when invoking math_opt_barrier with fixed-point
> parameters. However, in glibc, these two macros are invoked with
> floating-point parameters, and I didn’t find the unnecessary moves
> being generated. For other architectures, aarch64 and s390 also do
> not include the 'r' constraint.
> 
> For gcc part, to my understanding, when 'fr' is present, GCC selects the
> union of FP_REGS and GR_REGS as the destination register class, which is
> initialized to GR_REGS. Then compute the cost of choosing different
> register classes, choosing GR_REGS incurs no extra cost, while choosing
> FP_REGS adds an extra move cost. I am not sure if it's a strategy here.

Where does this issue manifest (i.e. in which glibc source file)?  I'll
try to reduce a test case from that.  To me this seems a compiler bug
too.
>
  
dengjianbo May 8, 2026, 9:47 a.m. UTC | #4
On 5/8/26 5:29 PM, Xi Ruoyao wrote:
> On Thu, 2026-05-07 at 17:30 +0800, dengjianbo wrote:
>> Yes, unnecessary moves can occur if the value is already in a general
>> reg. For example, when invoking math_opt_barrier with fixed-point
>> parameters. However, in glibc, these two macros are invoked with
>> floating-point parameters, and I didn’t find the unnecessary moves
>> being generated. For other architectures, aarch64 and s390 also do
>> not include the 'r' constraint.
>>
>> For gcc part, to my understanding, when 'fr' is present, GCC selects the
>> union of FP_REGS and GR_REGS as the destination register class, which is
>> initialized to GR_REGS. Then compute the cost of choosing different
>> register classes, choosing GR_REGS incurs no extra cost, while choosing
>> FP_REGS adds an extra move cost. I am not sure if it's a strategy here.
> 
> Where does this issue manifest (i.e. in which glibc source file)?  I'll
> try to reduce a test case from that.  To me this seems a compiler bug
> too.
>>
> 
One example is the xflow function, compiler generates the unnecessary 
moves for this function, the source file is 
sysdeps/ieee754/dbl-64/math_err.c.
  
dengjianbo May 25, 2026, 1:09 a.m. UTC | #5
On 5/8/26 5:47 PM, dengjianbo wrote:
> 
> 
>   On 5/8/26 5:29 PM, Xi Ruoyao wrote:
>> On Thu, 2026-05-07 at 17:30 +0800, dengjianbo wrote:
>>> Yes, unnecessary moves can occur if the value is already in a general
>>> reg. For example, when invoking math_opt_barrier with fixed-point
>>> parameters. However, in glibc, these two macros are invoked with
>>> floating-point parameters, and I didn’t find the unnecessary moves
>>> being generated. For other architectures, aarch64 and s390 also do
>>> not include the 'r' constraint.
>>>
>>> For gcc part, to my understanding, when 'fr' is present, GCC selects the
>>> union of FP_REGS and GR_REGS as the destination register class, which is
>>> initialized to GR_REGS. Then compute the cost of choosing different
>>> register classes, choosing GR_REGS incurs no extra cost, while choosing
>>> FP_REGS adds an extra move cost. I am not sure if it's a strategy here.
>>
>> Where does this issue manifest (i.e. in which glibc source file)?  I'll
>> try to reduce a test case from that.  To me this seems a compiler bug
>> too.
>>>
>>
> One example is the xflow function, compiler generates the unnecessary 
> moves for this function, the source file is sysdeps/ieee754/dbl-64 
> /math_err.c.

Just a gentle ping on the patch, maybe the below test case can be used
to reproduce this problem.

#define math_opt_barrier(x)                 \
   ({ __typeof (x) __x = (x); __asm ("" : "+frm" (__x)); __x; })

double
test (double x)
{
   x = math_opt_barrier (x);
   return x;
}
  

Patch

diff --git a/sysdeps/loongarch/fpu/math-barriers.h b/sysdeps/loongarch/fpu/math-barriers.h
index 6a069ff41d..76835aa210 100644
--- a/sysdeps/loongarch/fpu/math-barriers.h
+++ b/sysdeps/loongarch/fpu/math-barriers.h
@@ -21,8 +21,8 @@ 
 
 /* Generic code forces values to memory; we don't need to do that.  */
 #define math_opt_barrier(x)					\
-  ({ __typeof (x) __x = (x); __asm ("" : "+frm" (__x)); __x; })
+  ({ __typeof (x) __x = (x); __asm ("" : "+fm" (__x)); __x; })
 #define math_force_eval(x)						\
-  ({ __typeof (x) __x = (x); __asm __volatile__ ("" : : "frm" (__x)); })
+  ({ __typeof (x) __x = (x); __asm __volatile__ ("" : : "fm" (__x)); })
 
 #endif /* math-barriers.h */