From patchwork Thu Mar 2 16:01:22 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Xi Ruoyao X-Patchwork-Id: 65908 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 3CF783858284 for ; Thu, 2 Mar 2023 16:02:13 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 3CF783858284 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1677772933; bh=6WGalZVJRke6B69J26cz8ePSHyRjg6FOZjliCbgsrHw=; h=To:Cc:Subject:Date:List-Id:List-Unsubscribe:List-Archive: List-Post:List-Help:List-Subscribe:From:Reply-To:From; b=NJ1BmGy38bqiL8q4fp7M1yFJwc2MZlqx0UbpBZ+0kvCJtWh2wVw/+q9+JT35Xch5q uh/Uf2COGVMBZAgvwfsMoQxs3JL02H1YT7XClZSftEnm/mbYFHO1luMLuxeHS+uBQP H3eTnxASgDq7i+Yu7nElQjiwtE5Sv+NiG4troGl8= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from xry111.site (xry111.site [IPv6:2001:470:683e::1]) by sourceware.org (Postfix) with ESMTPS id 7B3C23858CDB for ; Thu, 2 Mar 2023 16:01:40 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 7B3C23858CDB Received: from stargazer.. (unknown [IPv6:240e:358:1106:d100:dc73:854d:832e:6]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-384) server-digest SHA384) (Client did not present a certificate) (Authenticated sender: xry111@xry111.site) by xry111.site (Postfix) with ESMTPSA id A08AD65C86; Thu, 2 Mar 2023 11:01:32 -0500 (EST) To: gcc-patches@gcc.gnu.org Cc: WANG Xuerui , Lulu Cheng , Chenghua Xu , Xi Ruoyao Subject: [PATCH] LoongArch: Stop -mfpu from silently breaking ABI Date: Fri, 3 Mar 2023 00:01:22 +0800 Message-Id: <20230302160122.47573-1-xry111@xry111.site> X-Mailer: git-send-email 2.39.2 MIME-Version: 1.0 X-Spam-Status: No, score=-8.8 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, LIKELY_SPAM_FROM, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Xi Ruoyao via Gcc-patches From: Xi Ruoyao Reply-To: Xi Ruoyao Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" In the toolchain convention, we describe -mfpu= as: "Selects the allowed set of basic floating-point instructions and registers. This option should not change the FP calling convention unless it's necessary." Though not explicitly stated, the rationale of this rule is to allow combinations like "-mabi=lp64s -mfpu=64". This will be useful for running applications with LP64S/F ABI on a double-float-capable LoongArch hardware and using a math library with LP64S/F ABI but native double float HW instructions, for a better performance. And now a case in Linux kernel has again proven the usefulness of this kind of combination. The AMDGPU DCN kernel driver needs to perform some floating-point operation, but the entire kernel uses LP64S ABI. So the translation units of the AMDGPU DCN driver need to be compiled with -mfpu=64 (the kernel lacks soft-FP routines in libgcc), but -mabi=lp64s (or you can't link it with the other part of the kernel). Unfortunately, currently GCC uses TARGET_{HARD,SOFT,DOUBLE}_FLOAT to determine the floating calling convention. This causes "-mfpu=64" silently allow using $fa* to pass parameters and return values EVEN IF -mabi=lp64s is used. To make things worse, the generated object file has SOFT-FLOAT set in the eflags field so the linker will happily link it with other LP64S ABI object files, but obviously this will lead to bad results at runtime. The fix is simple: use TARGET_*_FLOAT_ABI instead. But then it causes "-mabi=lp64s -march=loongarch64" to generate code like: movgr2fr.d $fa0, $a0 frecip.d $fa0, $fa0 movfr2gr.d $a0, $fa0 The problem here is "loongarch64" is never strictly defined. So we consider "loongarch64" a "64-bit LoongArch CPU with the simplest FPU needed by the ABI", and if -march=loongarch64 but -mfpu is not explicitly used, we set -mfpu such a simplest one. I consider this a bug fix: the behavior difference from the toolchain convention doc is a bug, and generating object files with SOFT-FLOAT flag but parameters/return values passed through FPRs is definitely a bug. Bootstrapped and regtested on loongarch64-linux-gnu. Ok for trunk? I'm not sure if it's a good idea to backport this into gcc-12 though. gcc/ChangeLog: * config/loongarch/loongarch.h (FP_RETURN): Use TARGET_*_FLOAT_ABI instead of TARGET_*_FLOAT. (UNITS_PER_FP_ARG): Likewise. * config/loongarch/loongarch-opts.cc (loongarch_config_target): If -march=loongarch64 and -mfpu not explicitly used, guess FPU capability from ABI. gcc/testsuite/ChangeLog: * gcc.target/loongarch/flt-abi-isa-1.c: New test. * gcc.target/loongarch/flt-abi-isa-2.c: New test. * gcc.target/loongarch/flt-abi-isa-3.c: New test. * gcc.target/loongarch/flt-abi-isa-4.c: New test. * gcc.target/loongarch/flt-abi-isa-5.c: New test. * gcc.target/loongarch/flt-abi-isa-6.c: New test. * gcc.target/loongarch/flt-abi-isa-7.c: New test. * gcc.target/loongarch/flt-abi-isa-8.c: New test. * gcc.target/loongarch/flt-abi-isa-9.c: New test. * gcc.target/loongarch/flt-abi-isa-10.c: New test. --- gcc/config/loongarch/loongarch-opts.cc | 18 ++++++++++++++++++ gcc/config/loongarch/loongarch.h | 4 ++-- .../gcc.target/loongarch/flt-abi-isa-1.c | 12 ++++++++++++ .../gcc.target/loongarch/flt-abi-isa-10.c | 7 +++++++ .../gcc.target/loongarch/flt-abi-isa-2.c | 11 +++++++++++ .../gcc.target/loongarch/flt-abi-isa-3.c | 11 +++++++++++ .../gcc.target/loongarch/flt-abi-isa-4.c | 12 ++++++++++++ .../gcc.target/loongarch/flt-abi-isa-5.c | 7 +++++++ .../gcc.target/loongarch/flt-abi-isa-6.c | 11 +++++++++++ .../gcc.target/loongarch/flt-abi-isa-7.c | 5 +++++ .../gcc.target/loongarch/flt-abi-isa-8.c | 7 +++++++ .../gcc.target/loongarch/flt-abi-isa-9.c | 7 +++++++ 12 files changed, 110 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/gcc.target/loongarch/flt-abi-isa-1.c create mode 100644 gcc/testsuite/gcc.target/loongarch/flt-abi-isa-10.c create mode 100644 gcc/testsuite/gcc.target/loongarch/flt-abi-isa-2.c create mode 100644 gcc/testsuite/gcc.target/loongarch/flt-abi-isa-3.c create mode 100644 gcc/testsuite/gcc.target/loongarch/flt-abi-isa-4.c create mode 100644 gcc/testsuite/gcc.target/loongarch/flt-abi-isa-5.c create mode 100644 gcc/testsuite/gcc.target/loongarch/flt-abi-isa-6.c create mode 100644 gcc/testsuite/gcc.target/loongarch/flt-abi-isa-7.c create mode 100644 gcc/testsuite/gcc.target/loongarch/flt-abi-isa-8.c create mode 100644 gcc/testsuite/gcc.target/loongarch/flt-abi-isa-9.c diff --git a/gcc/config/loongarch/loongarch-opts.cc b/gcc/config/loongarch/loongarch-opts.cc index a52e25236ea..bea77da93e9 100644 --- a/gcc/config/loongarch/loongarch-opts.cc +++ b/gcc/config/loongarch/loongarch-opts.cc @@ -251,6 +251,24 @@ config_target_isa: ((t.cpu_arch == CPU_NATIVE && constrained.arch) ? t.isa.fpu : DEFAULT_ISA_EXT_FPU); + /* "loongarch64" is not really strictly defined: which FPU does it have? + So if -march=loongarch64 and -mfpu not explicitly provided, use the + minimal -mfpu setting suitable for the ABI. */ + if (t.cpu_arch == CPU_LOONGARCH64 && !constrained.fpu) + switch (t.abi.base) + { + case ABI_BASE_LP64D: + t.isa.fpu = ISA_EXT_FPU64; + break; + case ABI_BASE_LP64F: + t.isa.fpu = ISA_EXT_FPU32; + break; + case ABI_BASE_LP64S: + t.isa.fpu = ISA_EXT_NOFPU; + break; + default: + gcc_unreachable (); + } /* 4. ABI-ISA compatibility */ /* Note: diff --git a/gcc/config/loongarch/loongarch.h b/gcc/config/loongarch/loongarch.h index f4e903d46bb..f8167875646 100644 --- a/gcc/config/loongarch/loongarch.h +++ b/gcc/config/loongarch/loongarch.h @@ -676,7 +676,7 @@ enum reg_class point values. */ #define GP_RETURN (GP_REG_FIRST + 4) -#define FP_RETURN ((TARGET_SOFT_FLOAT) ? GP_RETURN : (FP_REG_FIRST + 0)) +#define FP_RETURN ((TARGET_SOFT_FLOAT_ABI) ? GP_RETURN : (FP_REG_FIRST + 0)) #define MAX_ARGS_IN_REGISTERS 8 @@ -1154,6 +1154,6 @@ struct GTY (()) machine_function /* The largest type that can be passed in floating-point registers. */ /* TODO: according to mabi. */ #define UNITS_PER_FP_ARG \ - (TARGET_HARD_FLOAT ? (TARGET_DOUBLE_FLOAT ? 8 : 4) : 0) + (TARGET_HARD_FLOAT_ABI ? (TARGET_DOUBLE_FLOAT_ABI ? 8 : 4) : 0) #define FUNCTION_VALUE_REGNO_P(N) ((N) == GP_RETURN || (N) == FP_RETURN) diff --git a/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-1.c b/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-1.c new file mode 100644 index 00000000000..ab1c357d98c --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-1.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options "-mabi=lp64s -march=loongarch64 -O2" } */ +/* { dg-final { scan-assembler-not "frecip\\.d" } } */ + +/* With the "default" -march=loongarch64, -mabi=lp64s implies -mfpu=0 so + we won't puzzle people. */ + +double +t (double x) +{ + return 1.0 / x; +} diff --git a/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-10.c b/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-10.c new file mode 100644 index 00000000000..49d2f4ec267 --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-10.c @@ -0,0 +1,7 @@ +/* { dg-do compile } */ +/* { dg-options "-mabi=lp64s -march=la464 -O2" } */ +/* { dg-final { scan-assembler "frecip\\.s" } } */ +/* { dg-final { scan-assembler "movgr2fr\\.w" } } */ +/* { dg-final { scan-assembler "movfr2gr\\.s" } } */ + +#include "flt-abi-isa-6.c" diff --git a/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-2.c b/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-2.c new file mode 100644 index 00000000000..d248cc546f1 --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-2.c @@ -0,0 +1,11 @@ +/* { dg-do compile } */ +/* { dg-options "-mabi=lp64s -march=loongarch64 -O2 -mfpu=64" } */ +/* { dg-final { scan-assembler "frecip\\.d" } } */ +/* { dg-final { scan-assembler "movgr2fr\\.d" } } */ +/* { dg-final { scan-assembler "movfr2gr\\.d" } } */ + +/* With -mabi=lp64s and -mfpu=64, we can use the FPU to calculate the + answer but we need to move the argument from a0 to a FPR, then move + the answer from a FPR back to a0. */ + +#include "flt-abi-isa-1.c" diff --git a/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-3.c b/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-3.c new file mode 100644 index 00000000000..e31a1d1fbc4 --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-3.c @@ -0,0 +1,11 @@ +/* { dg-do compile } */ +/* { dg-options "-mabi=lp64s -march=la464 -O2" } */ +/* { dg-final { scan-assembler "frecip\\.d" } } */ +/* { dg-final { scan-assembler "movgr2fr\\.d" } } */ +/* { dg-final { scan-assembler "movfr2gr\\.d" } } */ + +/* We know LA464 has a 64-bit FPU, so we can use it to calculate the + answer but we need to move the argument from a0 to a FPR, then move + the answer from a FPR back to a0. */ + +#include "flt-abi-isa-1.c" diff --git a/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-4.c b/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-4.c new file mode 100644 index 00000000000..398e6b56ab5 --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-4.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options "-mabi=lp64f -march=loongarch64 -O2 -mfpu=64" } */ +/* { dg-final { scan-assembler "frecip\\.d" } } */ +/* { dg-final { scan-assembler "movgr2fr\\.d" } } */ +/* { dg-final { scan-assembler "movfr2gr\\.d" } } */ + +/* With -mabi=lp64f and -mfpu=64, we can use the FPU to calculate the + answer but we need to move the argument from a0 to a FPR, then move + the answer from a FPR back to a0 as the LP64F ABI mandates passing + double values via GPR (like LP64S). */ + +#include "flt-abi-isa-1.c" diff --git a/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-5.c b/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-5.c new file mode 100644 index 00000000000..d7db62de344 --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-5.c @@ -0,0 +1,7 @@ +/* { dg-do compile } */ +/* { dg-options "-mabi=lp64s -march=la464 -O2 -mfpu=none" } */ +/* { dg-final { scan-assembler-not "frecip\\.d" } } */ + +/* Explicitly disable FPU on LA464. */ + +#include "flt-abi-isa-1.c" diff --git a/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-6.c b/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-6.c new file mode 100644 index 00000000000..9e204c6cd7c --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-6.c @@ -0,0 +1,11 @@ +/* { dg-do compile } */ +/* { dg-options "-mabi=lp64f -march=loongarch64 -O2" } */ +/* { dg-final { scan-assembler "frecip\\.s" } } */ +/* { dg-final { scan-assembler-not "movgr2fr" } } */ +/* { dg-final { scan-assembler-not "movfr2gr" } } */ + +float +t (float x) +{ + return 1.0 / x; +} diff --git a/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-7.c b/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-7.c new file mode 100644 index 00000000000..dc9f2a322bf --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-7.c @@ -0,0 +1,5 @@ +/* { dg-do compile } */ +/* { dg-options "-mabi=lp64s -march=loongarch64 -O2" } */ +/* { dg-final { scan-assembler-not "frecip\\.s" } } */ + +#include "flt-abi-isa-6.c" diff --git a/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-8.c b/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-8.c new file mode 100644 index 00000000000..001b034be9a --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-8.c @@ -0,0 +1,7 @@ +/* { dg-do compile } */ +/* { dg-options "-mabi=lp64s -march=loongarch64 -O2 -mfpu=32" } */ +/* { dg-final { scan-assembler "frecip\\.s" } } */ +/* { dg-final { scan-assembler "movgr2fr\\.w" } } */ +/* { dg-final { scan-assembler "movfr2gr\\.s" } } */ + +#include "flt-abi-isa-6.c" diff --git a/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-9.c b/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-9.c new file mode 100644 index 00000000000..5762294207e --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-9.c @@ -0,0 +1,7 @@ +/* { dg-do compile } */ +/* { dg-options "-mabi=lp64s -march=loongarch64 -O2 -mfpu=64" } */ +/* { dg-final { scan-assembler "frecip\\.s" } } */ +/* { dg-final { scan-assembler "movgr2fr\\.w" } } */ +/* { dg-final { scan-assembler "movfr2gr\\.s" } } */ + +#include "flt-abi-isa-6.c"