Message ID | 20220523181209.2208136-1-vineetg@rivosinc.com |
---|---|
State | Committed |
Commit | b646d7d279ae0c0d35564542d09866bf3e8afac0 |
Headers |
From: Vineet Gupta <vineetg@rivosinc.com>
To: gcc-patches@gcc.gnu.org
Cc: Andrew Waterman <andrew@sifive.com>, kito.cheng@gmail.com, Philipp Tomsich <philipp.tomsich@vrull.eu>, gnu-toolchain@rivosinc.com
Subject: [PATCH] [PR/target 105666] RISC-V: Inhibit FP <--> int register moves via tune param
Date: Mon, 23 May 2022 11:12:09 -0700
Message-Id: <20220523181209.2208136-1-vineetg@rivosinc.com>
Series |
[PR/target,105666] RISC-V: Inhibit FP <--> int register moves via tune param
|
|
Commit Message
Vineet Gupta
May 23, 2022, 6:12 p.m. UTC
Under extreme register pressure, the compiler can use FP <--> int
moves as a cheap alternative to spilling to memory.
This was seen with SPEC2017 FP benchmark 507.cactu:
ML_BSSN_Advect.cc:ML_BSSN_Advect_Body()
| fmv.d.x fa5,s9 # PDupwindNthSymm2Xt1, PDupwindNthSymm2Xt1
| .LVL325:
| ld s9,184(sp) # _12469, %sfp
| ...
| .LVL339:
| fmv.x.d s4,fa5 # PDupwindNthSymm2Xt1, PDupwindNthSymm2Xt1
|
The FMV instructions can be costlier than a stack spill on certain
micro-architectures, so this needs to be a per-CPU tunable
(the default being to inhibit such moves on all existing RISC-V CPUs).
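The cost logic the patch introduces can be sketched as a small standalone model. This is an illustrative simplification, not the actual GCC hook: the enum and the `needs_secondary_memory` flag are hypothetical stand-ins for GCC's `reg_class_t` and `riscv_secondary_memory_needed`, and `FMV_COST` mirrors the default of 8 the patch picks for all existing tunings.

```c
#include <assert.h>

/* Hypothetical stand-ins for GCC's register classes.  */
enum reg_class { GR_REGS, FP_REGS };

/* Default fmv_cost the patch assigns to every existing RISC-V tuning:
   equal to the secondary-memory (spill) cost, so the allocator no
   longer sees an fmv.d.x/fmv.x.d round-trip as cheaper than a spill.  */
#define FMV_COST 8

/* Simplified model of the patched riscv_register_move_cost: a
   cross-class int<->fp move costs FMV_COST; otherwise the cost is 8
   when a move must go through memory, 2 for a plain register move.  */
static int
move_cost (enum reg_class from, enum reg_class to,
           int needs_secondary_memory)
{
  if ((from == FP_REGS && to == GR_REGS)
      || (from == GR_REGS && to == FP_REGS))
    return FMV_COST;
  return needs_secondary_memory ? 8 : 2;
}
```

With these numbers an int<->fp move is never strictly cheaper than a spill, which is exactly the behavior the patch wants by default; a micro-architecture with fast FMV could lower its `fmv_cost` in its own tune struct.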
A testsuite run with the new test reports 10 failures without the fix,
corresponding to the build variations of pr105666.c:
| === gcc Summary ===
|
| # of expected passes 123318 (+10)
| # of unexpected failures 34 (-10)
| # of unexpected successes 4
| # of expected failures 780
| # of unresolved testcases 4
| # of unsupported tests 2796
gcc/ChangeLog:
* config/riscv/riscv.cc: (struct riscv_tune_param): Add
fmv_cost.
(rocket_tune_info): Add default fmv_cost 8.
(sifive_7_tune_info): Ditto.
(thead_c906_tune_info): Ditto.
(optimize_size_tune_info): Ditto.
(riscv_register_move_cost): Use fmv_cost for int<->fp moves.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/pr105666.c: New test.
Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
---
gcc/config/riscv/riscv.cc | 9 ++++
gcc/testsuite/gcc.target/riscv/pr105666.c | 55 +++++++++++++++++++++++
2 files changed, 64 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/pr105666.c
Comments
Good catch!

On Mon, 23 May 2022 at 20:12, Vineet Gupta <vineetg@rivosinc.com> wrote:
> [snip -- quoted patch, identical to the commit message and diff above]
Committed, thanks!

On Tue, May 24, 2022 at 3:40 AM Philipp Tomsich <philipp.tomsich@vrull.eu> wrote:
>
> Good catch!
>
> [snip -- quoted patch elided]
On 5/24/22 00:59, Kito Cheng wrote:
> Committed, thanks!

Thx for the quick action Kito,
Can this be backported to gcc 12 as well ?

Thx,
-Vineet

> [snip -- quoted patch elided]
I just hesitated for a few days about backporting this, but I think
it's OK to back port because:

1. Simple enough
2. Good for general RISC-V core

Committed with your latest testsuite fix. Thanks!

On Wed, May 25, 2022 at 3:38 AM Vineet Gupta <vineetg@rivosinc.com> wrote:
>
> On 5/24/22 00:59, Kito Cheng wrote:
> > Committed, thanks!
>
> Thx for the quick action Kito,
> Can this be backported to gcc 12 as well ?
>
> [snip -- quoted patch elided]
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index ee756aab6940..f3ac0d8865f0 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -220,6 +220,7 @@ struct riscv_tune_param
   unsigned short issue_rate;
   unsigned short branch_cost;
   unsigned short memory_cost;
+  unsigned short fmv_cost;
   bool slow_unaligned_access;
 };
 
@@ -285,6 +286,7 @@ static const struct riscv_tune_param rocket_tune_info = {
   1,						/* issue_rate */
   3,						/* branch_cost */
   5,						/* memory_cost */
+  8,						/* fmv_cost */
   true,					/* slow_unaligned_access */
 };
 
@@ -298,6 +300,7 @@ static const struct riscv_tune_param sifive_7_tune_info = {
   2,						/* issue_rate */
   4,						/* branch_cost */
   3,						/* memory_cost */
+  8,						/* fmv_cost */
   true,					/* slow_unaligned_access */
 };
 
@@ -311,6 +314,7 @@ static const struct riscv_tune_param thead_c906_tune_info = {
   1,						/* issue_rate */
   3,						/* branch_cost */
   5,						/* memory_cost */
+  8,						/* fmv_cost */
   false,					/* slow_unaligned_access */
 };
 
@@ -324,6 +328,7 @@ static const struct riscv_tune_param optimize_size_tune_info = {
   1,						/* issue_rate */
   1,						/* branch_cost */
   2,						/* memory_cost */
+  8,						/* fmv_cost */
   false,					/* slow_unaligned_access */
 };
 
@@ -4737,6 +4742,10 @@ static int
 riscv_register_move_cost (machine_mode mode,
 			  reg_class_t from, reg_class_t to)
 {
+  if ((from == FP_REGS && to == GR_REGS) ||
+      (from == GR_REGS && to == FP_REGS))
+    return tune_param->fmv_cost;
+
   return riscv_secondary_memory_needed (mode, from, to) ? 8 : 2;
 }
 
diff --git a/gcc/testsuite/gcc.target/riscv/pr105666.c b/gcc/testsuite/gcc.target/riscv/pr105666.c
new file mode 100644
index 000000000000..904f3bc0763f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr105666.c
@@ -0,0 +1,55 @@
+/* Shamelessly plugged off gcc/testsuite/gcc.c-torture/execute/pr28982a.c.
+
+   The idea is to induce high register pressure for both int/fp registers
+   so that they spill.  By default FMV instructions would be used to stash
+   int reg to a fp reg (and vice-versa) but that could be costlier than
+   spilling to stack.  */
+
+/* { dg-do compile } */
+/* { dg-options "-march=rv64g -ffast-math" } */
+
+#define NITER 4
+#define NVARS 20
+#define MULTI(X) \
+  X( 0), X( 1), X( 2), X( 3), X( 4), X( 5), X( 6), X( 7), X( 8), X( 9), \
+  X(10), X(11), X(12), X(13), X(14), X(15), X(16), X(17), X(18), X(19)
+
+#define DECLAREI(INDEX) inc##INDEX = incs[INDEX]
+#define DECLAREF(INDEX) *ptr##INDEX = ptrs[INDEX], result##INDEX = 5
+#define LOOP(INDEX) result##INDEX += result##INDEX * (*ptr##INDEX), ptr##INDEX += inc##INDEX
+#define COPYOUT(INDEX) results[INDEX] = result##INDEX
+
+double *ptrs[NVARS];
+double results[NVARS];
+int incs[NVARS];
+
+void __attribute__((noinline))
+foo (int n)
+{
+  int MULTI (DECLAREI);
+  double MULTI (DECLAREF);
+  while (n--)
+    MULTI (LOOP);
+  MULTI (COPYOUT);
+}
+
+double input[NITER * NVARS];
+
+int
+main (void)
+{
+  int i;
+
+  for (i = 0; i < NVARS; i++)
+    ptrs[i] = input + i, incs[i] = i;
+  for (i = 0; i < NITER * NVARS; i++)
+    input[i] = i;
+  foo (NITER);
+  for (i = 0; i < NVARS; i++)
+    if (results[i] != i * NITER * (NITER + 1) / 2)
+      return 1;
+  return 0;
+}
+
+/* { dg-final { scan-assembler-not "\tfmv\\.d\\.x\t" } } */
+/* { dg-final { scan-assembler-not "\tfmv\\.x\\.d\t" } } */
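The pressure-inducing trick in pr105666.c can be seen in isolation with a reduced sketch. This is not the committed test: NVARS is cut from 20 to 4, the dg directives are dropped, and an `init` helper with all-zero inputs is added so the arithmetic is trivially checkable (each `result += result * 0` leaves the value at 5). The mechanism is the same: the MULTI macro stamps out one live local per index, so with the full 20 int and 20 fp locals the register files overflow and spills (or FMV moves) must happen.

```c
#include <assert.h>

/* Reduced sketch of the pr105666.c pattern (hypothetical, NVARS
   shrunk to 4 for brevity).  MULTI(X) expands X once per index,
   declaring one live local per index to drive up register pressure.  */
#define NVARS 4
#define MULTI(X) X(0), X(1), X(2), X(3)

#define DECLAREI(I) inc##I = incs[I]
#define DECLAREF(I) *ptr##I = ptrs[I], result##I = 5
#define LOOP(I) result##I += result##I * (*ptr##I), ptr##I += inc##I
#define COPYOUT(I) results[I] = result##I

double *ptrs[NVARS];
double results[NVARS];
int incs[NVARS];
double input[8];                 /* zero-initialized globals */

void
foo (int n)
{
  int MULTI (DECLAREI);          /* int inc0 = incs[0], inc1 = ..., ... */
  double MULTI (DECLAREF);       /* double *ptr0 = ptrs[0], result0 = 5, ... */
  while (n--)
    MULTI (LOOP);                /* every resultN and ptrN stays live here */
  MULTI (COPYOUT);
}

/* Point each pointer into the zero-filled input so results stay at 5.  */
void
init (void)
{
  for (int i = 0; i < NVARS; i++)
    {
      ptrs[i] = input + i;
      incs[i] = 1;
    }
}
```

In the real test all 20 `resultN` and `ptrN` values are simultaneously live across the loop, which is what forces the allocator to choose between stack spills and int<->fp moves; the `scan-assembler-not` directives then verify that with the new `fmv_cost` no `fmv.d.x`/`fmv.x.d` is emitted.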