From patchwork Thu Mar  2 16:01:22 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Xi Ruoyao <xry111@xry111.site>
X-Patchwork-Id: 65908
Return-Path: <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>
X-Original-To: patchwork@sourceware.org
Delivered-To: patchwork@sourceware.org
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id 3CF783858284
	for <patchwork@sourceware.org>; Thu,  2 Mar 2023 16:02:13 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 3CF783858284
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1677772933;
	bh=6WGalZVJRke6B69J26cz8ePSHyRjg6FOZjliCbgsrHw=;
	h=To:Cc:Subject:Date:List-Id:List-Unsubscribe:List-Archive:
	 List-Post:List-Help:List-Subscribe:From:Reply-To:From;
	b=NJ1BmGy38bqiL8q4fp7M1yFJwc2MZlqx0UbpBZ+0kvCJtWh2wVw/+q9+JT35Xch5q
	 uh/Uf2COGVMBZAgvwfsMoQxs3JL02H1YT7XClZSftEnm/mbYFHO1luMLuxeHS+uBQP
	 H3eTnxASgDq7i+Yu7nElQjiwtE5Sv+NiG4troGl8=
X-Original-To: gcc-patches@gcc.gnu.org
Delivered-To: gcc-patches@gcc.gnu.org
Received: from xry111.site (xry111.site [IPv6:2001:470:683e::1])
 by sourceware.org (Postfix) with ESMTPS id 7B3C23858CDB
 for <gcc-patches@gcc.gnu.org>; Thu,  2 Mar 2023 16:01:40 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 7B3C23858CDB
Received: from stargazer.. (unknown
 [IPv6:240e:358:1106:d100:dc73:854d:832e:6])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature ECDSA (P-384) server-digest SHA384)
 (Client did not present a certificate)
 (Authenticated sender: xry111@xry111.site)
 by xry111.site (Postfix) with ESMTPSA id A08AD65C86;
 Thu,  2 Mar 2023 11:01:32 -0500 (EST)
To: gcc-patches@gcc.gnu.org
Cc: WANG Xuerui <i@xen0n.name>, Lulu Cheng <chenglulu@loongson.cn>,
 Chenghua Xu <xuchenghua@loongson.cn>, Xi Ruoyao <xry111@xry111.site>
Subject: [PATCH] LoongArch: Stop -mfpu from silently breaking ABI
Date: Fri,  3 Mar 2023 00:01:22 +0800
Message-Id: <20230302160122.47573-1-xry111@xry111.site>
X-Mailer: git-send-email 2.39.2
MIME-Version: 1.0
X-Spam-Status: No, score=-8.8 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT,
 LIKELY_SPAM_FROM, SPF_HELO_PASS, SPF_PASS,
 TXREP autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-Patchwork-Original-From: Xi Ruoyao via Gcc-patches <gcc-patches@gcc.gnu.org>
From: Xi Ruoyao <xry111@xry111.site>
Reply-To: Xi Ruoyao <xry111@xry111.site>
Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org
Sender: "Gcc-patches"
 <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>

In the toolchain convention, we describe -mfpu= as:

"Selects the allowed set of basic floating-point instructions and
registers. This option should not change the FP calling convention
unless it's necessary."

Though not explicitly stated, the rationale of this rule is to allow
combinations like "-mabi=lp64s -mfpu=64".  This is useful for running
applications with the LP64S/F ABI on double-float-capable LoongArch
hardware while using a math library that keeps the LP64S/F ABI but uses
native double-float hardware instructions for better performance.

A case in the Linux kernel has now proven the usefulness of this kind
of combination once again.  The AMDGPU DCN kernel driver needs to
perform some floating-point operations, but the entire kernel uses the
LP64S ABI.  So the translation units of the AMDGPU DCN driver need to be
compiled with -mfpu=64 (the kernel lacks the soft-FP routines from
libgcc), but still with -mabi=lp64s (otherwise they cannot be linked
with the rest of the kernel).

Unfortunately, GCC currently uses TARGET_{HARD,SOFT,DOUBLE}_FLOAT to
determine the floating-point calling convention.  As a result,
"-mfpu=64" silently allows $fa* registers to be used for passing
parameters and return values EVEN IF -mabi=lp64s is used.  To make
things worse, the generated object file has SOFT-FLOAT set in the
e_flags field, so the linker will happily link it with other LP64S ABI
object files, which obviously leads to wrong results at runtime.

The fix is simple: use TARGET_*_FLOAT_ABI instead.  But this alone
causes "-mabi=lp64s -march=loongarch64" to generate code like:

  movgr2fr.d $fa0, $a0
  frecip.d   $fa0, $fa0
  movfr2gr.d $a0, $fa0

The problem here is that "loongarch64" has never been strictly defined.
So we consider "loongarch64" to be a "64-bit LoongArch CPU with the
simplest FPU needed by the ABI": if -march=loongarch64 is used but -mfpu
is not explicitly specified, we set -mfpu to that simplest FPU.

I consider this a bug fix: the deviation from the toolchain convention
document is a bug, and generating object files with the SOFT-FLOAT flag
while passing parameters/return values through FPRs is definitely a
bug.

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?  I'm
not sure whether it's a good idea to backport this to gcc-12, though.

gcc/ChangeLog:

	* config/loongarch/loongarch.h (FP_RETURN): Use
	TARGET_*_FLOAT_ABI instead of TARGET_*_FLOAT.
	(UNITS_PER_FP_ARG): Likewise.
	* config/loongarch/loongarch-opts.cc (loongarch_config_target):
	If -march=loongarch64 and -mfpu not explicitly used, guess FPU
	capability from ABI.

gcc/testsuite/ChangeLog:

	* gcc.target/loongarch/flt-abi-isa-1.c: New test.
	* gcc.target/loongarch/flt-abi-isa-2.c: New test.
	* gcc.target/loongarch/flt-abi-isa-3.c: New test.
	* gcc.target/loongarch/flt-abi-isa-4.c: New test.
	* gcc.target/loongarch/flt-abi-isa-5.c: New test.
	* gcc.target/loongarch/flt-abi-isa-6.c: New test.
	* gcc.target/loongarch/flt-abi-isa-7.c: New test.
	* gcc.target/loongarch/flt-abi-isa-8.c: New test.
	* gcc.target/loongarch/flt-abi-isa-9.c: New test.
	* gcc.target/loongarch/flt-abi-isa-10.c: New test.
---
 gcc/config/loongarch/loongarch-opts.cc         | 18 ++++++++++++++++++
 gcc/config/loongarch/loongarch.h               |  4 ++--
 .../gcc.target/loongarch/flt-abi-isa-1.c       | 12 ++++++++++++
 .../gcc.target/loongarch/flt-abi-isa-10.c      |  7 +++++++
 .../gcc.target/loongarch/flt-abi-isa-2.c       | 11 +++++++++++
 .../gcc.target/loongarch/flt-abi-isa-3.c       | 11 +++++++++++
 .../gcc.target/loongarch/flt-abi-isa-4.c       | 12 ++++++++++++
 .../gcc.target/loongarch/flt-abi-isa-5.c       |  7 +++++++
 .../gcc.target/loongarch/flt-abi-isa-6.c       | 11 +++++++++++
 .../gcc.target/loongarch/flt-abi-isa-7.c       |  5 +++++
 .../gcc.target/loongarch/flt-abi-isa-8.c       |  7 +++++++
 .../gcc.target/loongarch/flt-abi-isa-9.c       |  7 +++++++
 12 files changed, 110 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/flt-abi-isa-1.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/flt-abi-isa-10.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/flt-abi-isa-2.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/flt-abi-isa-3.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/flt-abi-isa-4.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/flt-abi-isa-5.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/flt-abi-isa-6.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/flt-abi-isa-7.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/flt-abi-isa-8.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/flt-abi-isa-9.c

diff --git a/gcc/config/loongarch/loongarch-opts.cc b/gcc/config/loongarch/loongarch-opts.cc
index a52e25236ea..bea77da93e9 100644
--- a/gcc/config/loongarch/loongarch-opts.cc
+++ b/gcc/config/loongarch/loongarch-opts.cc
@@ -251,6 +251,24 @@ config_target_isa:
     ((t.cpu_arch == CPU_NATIVE && constrained.arch) ?
      t.isa.fpu : DEFAULT_ISA_EXT_FPU);
 
+  /* "loongarch64" is not really strictly defined: which FPU does it have?
+     So if -march=loongarch64 and -mfpu not explicitly provided, use the
+     minimal -mfpu setting suitable for the ABI.  */
+  if (t.cpu_arch == CPU_LOONGARCH64 && !constrained.fpu)
+    switch (t.abi.base)
+      {
+	case ABI_BASE_LP64D:
+	  t.isa.fpu = ISA_EXT_FPU64;
+	  break;
+	case ABI_BASE_LP64F:
+	  t.isa.fpu = ISA_EXT_FPU32;
+	  break;
+	case ABI_BASE_LP64S:
+	  t.isa.fpu = ISA_EXT_NOFPU;
+	  break;
+	default:
+	  gcc_unreachable ();
+      }
 
   /* 4.  ABI-ISA compatibility */
   /* Note:
diff --git a/gcc/config/loongarch/loongarch.h b/gcc/config/loongarch/loongarch.h
index f4e903d46bb..f8167875646 100644
--- a/gcc/config/loongarch/loongarch.h
+++ b/gcc/config/loongarch/loongarch.h
@@ -676,7 +676,7 @@ enum reg_class
    point values.  */
 
 #define GP_RETURN (GP_REG_FIRST + 4)
-#define FP_RETURN ((TARGET_SOFT_FLOAT) ? GP_RETURN : (FP_REG_FIRST + 0))
+#define FP_RETURN ((TARGET_SOFT_FLOAT_ABI) ? GP_RETURN : (FP_REG_FIRST + 0))
 
 #define MAX_ARGS_IN_REGISTERS 8
 
@@ -1154,6 +1154,6 @@ struct GTY (()) machine_function
 /* The largest type that can be passed in floating-point registers.  */
 /* TODO: according to mabi.  */
 #define UNITS_PER_FP_ARG  \
-  (TARGET_HARD_FLOAT ? (TARGET_DOUBLE_FLOAT ? 8 : 4) : 0)
+  (TARGET_HARD_FLOAT_ABI ? (TARGET_DOUBLE_FLOAT_ABI ? 8 : 4) : 0)
 
 #define FUNCTION_VALUE_REGNO_P(N) ((N) == GP_RETURN || (N) == FP_RETURN)
diff --git a/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-1.c b/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-1.c
new file mode 100644
index 00000000000..ab1c357d98c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-1.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-mabi=lp64s -march=loongarch64 -O2" } */
+/* { dg-final { scan-assembler-not "frecip\\.d" } } */
+
+/* With the "default" -march=loongarch64, -mabi=lp64s implies -mfpu=0,
+   so users are not confused by unexpected FPU instructions.  */
+
+double
+t (double x)
+{
+  return 1.0 / x;
+}
diff --git a/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-10.c b/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-10.c
new file mode 100644
index 00000000000..49d2f4ec267
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-10.c
@@ -0,0 +1,7 @@
+/* { dg-do compile } */
+/* { dg-options "-mabi=lp64s -march=la464 -O2" } */
+/* { dg-final { scan-assembler "frecip\\.s" } } */
+/* { dg-final { scan-assembler "movgr2fr\\.w" } } */
+/* { dg-final { scan-assembler "movfr2gr\\.s" } } */
+
+#include "flt-abi-isa-6.c"
diff --git a/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-2.c b/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-2.c
new file mode 100644
index 00000000000..d248cc546f1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-2.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-mabi=lp64s -march=loongarch64 -O2 -mfpu=64" } */
+/* { dg-final { scan-assembler "frecip\\.d" } } */
+/* { dg-final { scan-assembler "movgr2fr\\.d" } } */
+/* { dg-final { scan-assembler "movfr2gr\\.d" } } */
+
+/* With -mabi=lp64s and -mfpu=64, we can use the FPU to calculate the
+   answer, but we need to move the argument from a0 to an FPR, then move
+   the answer from an FPR back to a0.  */
+
+#include "flt-abi-isa-1.c"
diff --git a/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-3.c b/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-3.c
new file mode 100644
index 00000000000..e31a1d1fbc4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-3.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-mabi=lp64s -march=la464 -O2" } */
+/* { dg-final { scan-assembler "frecip\\.d" } } */
+/* { dg-final { scan-assembler "movgr2fr\\.d" } } */
+/* { dg-final { scan-assembler "movfr2gr\\.d" } } */
+
+/* We know LA464 has a 64-bit FPU, so we can use it to calculate the
+   answer, but we need to move the argument from a0 to an FPR, then move
+   the answer from an FPR back to a0.  */
+
+#include "flt-abi-isa-1.c"
diff --git a/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-4.c b/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-4.c
new file mode 100644
index 00000000000..398e6b56ab5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-4.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-mabi=lp64f -march=loongarch64 -O2 -mfpu=64" } */
+/* { dg-final { scan-assembler "frecip\\.d" } } */
+/* { dg-final { scan-assembler "movgr2fr\\.d" } } */
+/* { dg-final { scan-assembler "movfr2gr\\.d" } } */
+
+/* With -mabi=lp64f and -mfpu=64, we can use the FPU to calculate the
+   answer, but we need to move the argument from a0 to an FPR, then move
+   the answer from an FPR back to a0, as the LP64F ABI mandates passing
+   double values via GPRs (like LP64S).  */
+
+#include "flt-abi-isa-1.c"
diff --git a/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-5.c b/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-5.c
new file mode 100644
index 00000000000..d7db62de344
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-5.c
@@ -0,0 +1,7 @@
+/* { dg-do compile } */
+/* { dg-options "-mabi=lp64s -march=la464 -O2 -mfpu=none" } */
+/* { dg-final { scan-assembler-not "frecip\\.d" } } */
+
+/* Explicitly disable FPU on LA464.  */
+
+#include "flt-abi-isa-1.c"
diff --git a/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-6.c b/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-6.c
new file mode 100644
index 00000000000..9e204c6cd7c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-6.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-mabi=lp64f -march=loongarch64 -O2" } */
+/* { dg-final { scan-assembler "frecip\\.s" } } */
+/* { dg-final { scan-assembler-not "movgr2fr" } } */
+/* { dg-final { scan-assembler-not "movfr2gr" } } */
+
+float
+t (float x)
+{
+  return 1.0 / x;
+}
diff --git a/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-7.c b/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-7.c
new file mode 100644
index 00000000000..dc9f2a322bf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-7.c
@@ -0,0 +1,5 @@
+/* { dg-do compile } */
+/* { dg-options "-mabi=lp64s -march=loongarch64 -O2" } */
+/* { dg-final { scan-assembler-not "frecip\\.s" } } */
+
+#include "flt-abi-isa-6.c"
diff --git a/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-8.c b/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-8.c
new file mode 100644
index 00000000000..001b034be9a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-8.c
@@ -0,0 +1,7 @@
+/* { dg-do compile } */
+/* { dg-options "-mabi=lp64s -march=loongarch64 -O2 -mfpu=32" } */
+/* { dg-final { scan-assembler "frecip\\.s" } } */
+/* { dg-final { scan-assembler "movgr2fr\\.w" } } */
+/* { dg-final { scan-assembler "movfr2gr\\.s" } } */
+
+#include "flt-abi-isa-6.c"
diff --git a/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-9.c b/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-9.c
new file mode 100644
index 00000000000..5762294207e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-9.c
@@ -0,0 +1,7 @@
+/* { dg-do compile } */
+/* { dg-options "-mabi=lp64s -march=loongarch64 -O2 -mfpu=64" } */
+/* { dg-final { scan-assembler "frecip\\.s" } } */
+/* { dg-final { scan-assembler "movgr2fr\\.w" } } */
+/* { dg-final { scan-assembler "movfr2gr\\.s" } } */
+
+#include "flt-abi-isa-6.c"