LoongArch: Stop -mfpu from silently breaking ABI

Message ID 20230302160122.47573-1-xry111@xry111.site
State New
Headers
Series LoongArch: Stop -mfpu from silently breaking ABI |

Commit Message

Xi Ruoyao March 2, 2023, 4:01 p.m. UTC
  In the toolchain convention, we describe -mfpu= as:

"Selects the allowed set of basic floating-point instructions and
registers. This option should not change the FP calling convention
unless it's necessary."

Though not explicitly stated, the rationale of this rule is to allow
combinations like "-mabi=lp64s -mfpu=64".  This will be useful for
running applications with LP64S/F ABI on a double-float-capable
LoongArch hardware and using a math library with LP64S/F ABI but native
double float HW instructions, for a better performance.

And now a case in Linux kernel has again proven the usefulness of this
kind of combination.  The AMDGPU DCN kernel driver needs to perform some
floating-point operation, but the entire kernel uses LP64S ABI.  So the
translation units of the AMDGPU DCN driver need to be compiled with
-mfpu=64 (the kernel lacks soft-FP routines in libgcc), but -mabi=lp64s
(or you can't link it with the other part of the kernel).

Unfortunately, currently GCC uses TARGET_{HARD,SOFT,DOUBLE}_FLOAT to
determine the floating calling convention.  This causes "-mfpu=64"
silently allow using $fa* to pass parameters and return values EVEN IF
-mabi=lp64s is used.  To make things worse, the generated object file
has SOFT-FLOAT set in the eflags field so the linker will happily link
it with other LP64S ABI object files, but obviously this will lead to
bad results at runtime.

The fix is simple: use TARGET_*_FLOAT_ABI instead.  But then it causes
"-mabi=lp64s -march=loongarch64" to generate code like:

  movgr2fr.d $fa0, $a0
  frecip.d   $fa0, $fa0
  movfr2gr.d $a0, $fa0

The problem here is "loongarch64" is never strictly defined.  So we
consider "loongarch64" a "64-bit LoongArch CPU with the simplest FPU
needed by the ABI", and if -march=loongarch64 but -mfpu is not
explicitly used, we set -mfpu such a simplest one.

I consider this a bug fix: the behavior difference from the toolchain
convention doc is a bug, and generating object files with SOFT-FLOAT
flag but parameters/return values passed through FPRs is definitely a
bug.

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?  I'm
not sure if it's a good idea to backport this into gcc-12 though.

gcc/ChangeLog:

	* config/loongarch/loongarch.h (FP_RETURN): Use
	TARGET_*_FLOAT_ABI instead of TARGET_*_FLOAT.
	(UNITS_PER_FP_ARG): Likewise.
	* config/loongarch/loongarch-opts.cc (loongarch_config_target):
	If -march=loongarch64 and -mfpu not explicitly used, guess FPU
	capability from ABI.

gcc/testsuite/ChangeLog:

	* gcc.target/loongarch/flt-abi-isa-1.c: New test.
	* gcc.target/loongarch/flt-abi-isa-2.c: New test.
	* gcc.target/loongarch/flt-abi-isa-3.c: New test.
	* gcc.target/loongarch/flt-abi-isa-4.c: New test.
	* gcc.target/loongarch/flt-abi-isa-5.c: New test.
	* gcc.target/loongarch/flt-abi-isa-6.c: New test.
	* gcc.target/loongarch/flt-abi-isa-7.c: New test.
	* gcc.target/loongarch/flt-abi-isa-8.c: New test.
	* gcc.target/loongarch/flt-abi-isa-9.c: New test.
	* gcc.target/loongarch/flt-abi-isa-10.c: New test.
---
 gcc/config/loongarch/loongarch-opts.cc         | 18 ++++++++++++++++++
 gcc/config/loongarch/loongarch.h               |  4 ++--
 .../gcc.target/loongarch/flt-abi-isa-1.c       | 12 ++++++++++++
 .../gcc.target/loongarch/flt-abi-isa-10.c      |  7 +++++++
 .../gcc.target/loongarch/flt-abi-isa-2.c       | 11 +++++++++++
 .../gcc.target/loongarch/flt-abi-isa-3.c       | 11 +++++++++++
 .../gcc.target/loongarch/flt-abi-isa-4.c       | 12 ++++++++++++
 .../gcc.target/loongarch/flt-abi-isa-5.c       |  7 +++++++
 .../gcc.target/loongarch/flt-abi-isa-6.c       | 11 +++++++++++
 .../gcc.target/loongarch/flt-abi-isa-7.c       |  5 +++++
 .../gcc.target/loongarch/flt-abi-isa-8.c       |  7 +++++++
 .../gcc.target/loongarch/flt-abi-isa-9.c       |  7 +++++++
 12 files changed, 110 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/flt-abi-isa-1.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/flt-abi-isa-10.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/flt-abi-isa-2.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/flt-abi-isa-3.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/flt-abi-isa-4.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/flt-abi-isa-5.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/flt-abi-isa-6.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/flt-abi-isa-7.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/flt-abi-isa-8.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/flt-abi-isa-9.c
  

Comments

Yang Yujie March 3, 2023, 2:12 a.m. UTC | #1
On Fri, Mar 03, 2023 at 12:01:22AM +0800, Xi Ruoyao via Gcc-patches wrote:
> But then it causes "-mabi=lp64s -march=loongarch64" to generate code like:
> 
>   movgr2fr.d $fa0, $a0
>   frecip.d   $fa0, $fa0
>   movfr2gr.d $a0, $fa0
> 
> The problem here is "loongarch64" is never strictly defined.  So we
> consider "loongarch64" a "64-bit LoongArch CPU with the simplest FPU
> needed by the ABI", and if -march=loongarch64 but -mfpu is not
> explicitly used, we set -mfpu such a simplest one.

Thanks for the fix on TARGET_*_FLOAT_ABI usage! Certainly more testing
needs to be done on the soft-float side.

However, "loongarch64" is defined to include the "fpu64" ISA module[1]
(i.e. enable "-mfpu=64" when -mfpu is not explicitly used). So I believe
the above behavior you observed is expected.

[1] Table 5. Target CPU Models,
https://loongson.github.io/LoongArch-Documentation/LoongArch-toolchain-conventions-EN.html
  
Xi Ruoyao March 3, 2023, 4:06 a.m. UTC | #2
On Fri, 2023-03-03 at 10:12 +0800, Yujie Yang wrote:
> However, "loongarch64" is defined to include the "fpu64" ISA module[1]
> (i.e. enable "-mfpu=64" when -mfpu is not explicitly used). So I believe
> the above behavior you observed is expected.

Ah this make things much simpler and aligns with my gut feeling :).  I
can drop the change in loongarch-opts.cc now.  And the smaller changeset
also makes me more confident about a backport to gcc-12.

V2 patch is being tested and I'll send it after the testing.  Meanwhile
I created PR109000 to track the issue.
  

Patch

diff --git a/gcc/config/loongarch/loongarch-opts.cc b/gcc/config/loongarch/loongarch-opts.cc
index a52e25236ea..bea77da93e9 100644
--- a/gcc/config/loongarch/loongarch-opts.cc
+++ b/gcc/config/loongarch/loongarch-opts.cc
@@ -251,6 +251,24 @@  config_target_isa:
     ((t.cpu_arch == CPU_NATIVE && constrained.arch) ?
      t.isa.fpu : DEFAULT_ISA_EXT_FPU);
 
+  /* "loongarch64" is not really strictly defined: which FPU does it have?
+     So if -march=loongarch64 and -mfpu not explicitly provided, use the
+     minimal -mfpu setting suitable for the ABI.  */
+  if (t.cpu_arch == CPU_LOONGARCH64 && !constrained.fpu)
+    switch (t.abi.base)
+      {
+	case ABI_BASE_LP64D:
+	  t.isa.fpu = ISA_EXT_FPU64;
+	  break;
+	case ABI_BASE_LP64F:
+	  t.isa.fpu = ISA_EXT_FPU32;
+	  break;
+	case ABI_BASE_LP64S:
+	  t.isa.fpu = ISA_EXT_NOFPU;
+	  break;
+	default:
+	  gcc_unreachable ();
+      }
 
   /* 4.  ABI-ISA compatibility */
   /* Note:
diff --git a/gcc/config/loongarch/loongarch.h b/gcc/config/loongarch/loongarch.h
index f4e903d46bb..f8167875646 100644
--- a/gcc/config/loongarch/loongarch.h
+++ b/gcc/config/loongarch/loongarch.h
@@ -676,7 +676,7 @@  enum reg_class
    point values.  */
 
 #define GP_RETURN (GP_REG_FIRST + 4)
-#define FP_RETURN ((TARGET_SOFT_FLOAT) ? GP_RETURN : (FP_REG_FIRST + 0))
+#define FP_RETURN ((TARGET_SOFT_FLOAT_ABI) ? GP_RETURN : (FP_REG_FIRST + 0))
 
 #define MAX_ARGS_IN_REGISTERS 8
 
@@ -1154,6 +1154,6 @@  struct GTY (()) machine_function
 /* The largest type that can be passed in floating-point registers.  */
 /* TODO: according to mabi.  */
 #define UNITS_PER_FP_ARG  \
-  (TARGET_HARD_FLOAT ? (TARGET_DOUBLE_FLOAT ? 8 : 4) : 0)
+  (TARGET_HARD_FLOAT_ABI ? (TARGET_DOUBLE_FLOAT_ABI ? 8 : 4) : 0)
 
 #define FUNCTION_VALUE_REGNO_P(N) ((N) == GP_RETURN || (N) == FP_RETURN)
diff --git a/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-1.c b/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-1.c
new file mode 100644
index 00000000000..ab1c357d98c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-1.c
@@ -0,0 +1,12 @@ 
+/* { dg-do compile } */
+/* { dg-options "-mabi=lp64s -march=loongarch64 -O2" } */
+/* { dg-final { scan-assembler-not "frecip\\.d" } } */
+
+/* With the "default" -march=loongarch64, -mabi=lp64s implies -mfpu=0 so
+   we won't puzzle people.  */
+
+double
+t (double x)
+{
+  return 1.0 / x;
+}
diff --git a/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-10.c b/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-10.c
new file mode 100644
index 00000000000..49d2f4ec267
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-10.c
@@ -0,0 +1,7 @@ 
+/* { dg-do compile } */
+/* { dg-options "-mabi=lp64s -march=la464 -O2" } */
+/* { dg-final { scan-assembler "frecip\\.s" } } */
+/* { dg-final { scan-assembler "movgr2fr\\.w" } } */
+/* { dg-final { scan-assembler "movfr2gr\\.s" } } */
+
+#include "flt-abi-isa-6.c"
diff --git a/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-2.c b/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-2.c
new file mode 100644
index 00000000000..d248cc546f1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-2.c
@@ -0,0 +1,11 @@ 
+/* { dg-do compile } */
+/* { dg-options "-mabi=lp64s -march=loongarch64 -O2 -mfpu=64" } */
+/* { dg-final { scan-assembler "frecip\\.d" } } */
+/* { dg-final { scan-assembler "movgr2fr\\.d" } } */
+/* { dg-final { scan-assembler "movfr2gr\\.d" } } */
+
+/* With -mabi=lp64s and -mfpu=64, we can use the FPU to calculate the
+   answer but we need to move the argument from a0 to a FPR, then move
+   the answer from a FPR back to a0.  */
+
+#include "flt-abi-isa-1.c"
diff --git a/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-3.c b/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-3.c
new file mode 100644
index 00000000000..e31a1d1fbc4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-3.c
@@ -0,0 +1,11 @@ 
+/* { dg-do compile } */
+/* { dg-options "-mabi=lp64s -march=la464 -O2" } */
+/* { dg-final { scan-assembler "frecip\\.d" } } */
+/* { dg-final { scan-assembler "movgr2fr\\.d" } } */
+/* { dg-final { scan-assembler "movfr2gr\\.d" } } */
+
+/* We know LA464 has a 64-bit FPU, so we can use it to calculate the
+   answer but we need to move the argument from a0 to a FPR, then move
+   the answer from a FPR back to a0.  */
+
+#include "flt-abi-isa-1.c"
diff --git a/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-4.c b/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-4.c
new file mode 100644
index 00000000000..398e6b56ab5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-4.c
@@ -0,0 +1,12 @@ 
+/* { dg-do compile } */
+/* { dg-options "-mabi=lp64f -march=loongarch64 -O2 -mfpu=64" } */
+/* { dg-final { scan-assembler "frecip\\.d" } } */
+/* { dg-final { scan-assembler "movgr2fr\\.d" } } */
+/* { dg-final { scan-assembler "movfr2gr\\.d" } } */
+
+/* With -mabi=lp64f and -mfpu=64, we can use the FPU to calculate the
+   answer but we need to move the argument from a0 to a FPR, then move
+   the answer from a FPR back to a0 as the LP64F ABI mandates passing
+   double values via GPR (like LP64S).  */
+
+#include "flt-abi-isa-1.c"
diff --git a/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-5.c b/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-5.c
new file mode 100644
index 00000000000..d7db62de344
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-5.c
@@ -0,0 +1,7 @@ 
+/* { dg-do compile } */
+/* { dg-options "-mabi=lp64s -march=la464 -O2 -mfpu=none" } */
+/* { dg-final { scan-assembler-not "frecip\\.d" } } */
+
+/* Explicitly disable FPU on LA464.  */
+
+#include "flt-abi-isa-1.c"
diff --git a/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-6.c b/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-6.c
new file mode 100644
index 00000000000..9e204c6cd7c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-6.c
@@ -0,0 +1,11 @@ 
+/* { dg-do compile } */
+/* { dg-options "-mabi=lp64f -march=loongarch64 -O2" } */
+/* { dg-final { scan-assembler "frecip\\.s" } } */
+/* { dg-final { scan-assembler-not "movgr2fr" } } */
+/* { dg-final { scan-assembler-not "movfr2gr" } } */
+
+float
+t (float x)
+{
+  return 1.0 / x;
+}
diff --git a/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-7.c b/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-7.c
new file mode 100644
index 00000000000..dc9f2a322bf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-7.c
@@ -0,0 +1,5 @@ 
+/* { dg-do compile } */
+/* { dg-options "-mabi=lp64s -march=loongarch64 -O2" } */
+/* { dg-final { scan-assembler-not "frecip\\.s" } } */
+
+#include "flt-abi-isa-6.c"
diff --git a/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-8.c b/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-8.c
new file mode 100644
index 00000000000..001b034be9a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-8.c
@@ -0,0 +1,7 @@ 
+/* { dg-do compile } */
+/* { dg-options "-mabi=lp64s -march=loongarch64 -O2 -mfpu=32" } */
+/* { dg-final { scan-assembler "frecip\\.s" } } */
+/* { dg-final { scan-assembler "movgr2fr\\.w" } } */
+/* { dg-final { scan-assembler "movfr2gr\\.s" } } */
+
+#include "flt-abi-isa-6.c"
diff --git a/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-9.c b/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-9.c
new file mode 100644
index 00000000000..5762294207e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-9.c
@@ -0,0 +1,7 @@ 
+/* { dg-do compile } */
+/* { dg-options "-mabi=lp64s -march=loongarch64 -O2 -mfpu=64" } */
+/* { dg-final { scan-assembler "frecip\\.s" } } */
+/* { dg-final { scan-assembler "movgr2fr\\.w" } } */
+/* { dg-final { scan-assembler "movfr2gr\\.s" } } */
+
+#include "flt-abi-isa-6.c"