RISC-V: Split unordered FP comparisons into individual RTL insns
Commit Message
We have unordered FP comparisons implemented as RTL insns that produce
multiple machine instructions. Such RTL insns are hard to match with a
processor pipeline description and additionally there is a redundant
SNEZ instruction produced on the result of these comparisons even though
the FLT.fmt and FLE.fmt machine instructions already produce either 0 or
1, e.g.:
long
flt (double x, double y)
{
return __builtin_isless (x, y);
}
with `-O2 -fno-finite-math-only -fno-signaling-nans' gets compiled to:
.globl flt
.type flt, @function
flt:
frflags a5
flt.d a0,fa0,fa1
fsflags a5
snez a0,a0
ret
.size flt, .-flt
because the middle end can't see through the UNSPEC operation unordered
FP comparisons have been defined in terms of.
These instructions are only produced via an expander already, so change
the expander to emit individual RTL insns for each machine instruction
in the ultimate ultimate sequence produced rather than deferring to a
single RTL insn producing the whole sequence at once.
gcc/
* config/riscv/riscv.md (UNSPECV_FSNVSNAN): New constant.
(QUIET_PATTERN): New int attribute.
(f<quiet_pattern>_quiet<ANYF:mode><X:mode>4): Emit the intended
RTL insns entirely within the preparation statements.
(*f<quiet_pattern>_quiet<ANYF:mode><X:mode>4_default)
(*f<quiet_pattern>_quiet<ANYF:mode><X:mode>4_snan): Remove
insns.
(*riscv_fsnvsnan<mode>2): New insn.
gcc/testsuite/
* gcc.target/riscv/fle-ieee.c: New test.
* gcc.target/riscv/fle-snan.c: New test.
* gcc.target/riscv/fle.c: New test.
* gcc.target/riscv/flef-ieee.c: New test.
* gcc.target/riscv/flef-snan.c: New test.
* gcc.target/riscv/flef.c: New test.
* gcc.target/riscv/flt-ieee.c: New test.
* gcc.target/riscv/flt-snan.c: New test.
* gcc.target/riscv/flt.c: New test.
* gcc.target/riscv/fltf-ieee.c: New test.
* gcc.target/riscv/fltf-snan.c: New test.
* gcc.target/riscv/fltf.c: New test.
---
Hi,
I think it is a step in the right direction, however ultimately I think
we ought to actually tell GCC about the IEEE exception flags, so that the
compiler can track data dependencies and we do not have to resort to
UNSPECs which the compiler cannot see through. E.g. for a piece of code
like:
long
fltlt (double x, double y, double z)
{
return __builtin_isless (x, y) + __builtin_isless (x, z);
}
(using an addition here for clarity because for a logical operation even
more horror is produced) we get:
.globl fltlt
.type fltlt, @function
fltlt:
frflags a5 # 8 [c=4 l=4] riscv_frflags
flt.d a0,fa0,fa1 # 9 [c=4 l=4] *cstoredfdi4
fsflags a5 # 10 [c=0 l=4] riscv_fsflags
frflags a4 # 16 [c=4 l=4] riscv_frflags
flt.d a5,fa0,fa2 # 17 [c=4 l=4] *cstoredfdi4
fsflags a4 # 18 [c=0 l=4] riscv_fsflags
addw a0,a0,a5 # 30 [c=8 l=4] *addsi3_extended/0
ret # 40 [c=0 l=4] simple_return
.size fltlt, .-fltlt
where the middle FSFLAGS/FRFLAGS pair makes no sense of course and is a
waste of both space and cycles.
I'm yet running some benchmarking to see if the use of UNSPEC_VOLATILEs
makes any noticeable performance difference, but I suspect it does not as
the compiler could not do much about the original multiple-instruction
single RTL insns anyway.
No regressions with the GCC (with and w/o `-fsignaling-nans') and glibc
testsuites (as per commit 1fcbfb00fc67 ("RISC-V: Fix -fsignaling-nans for
glibc testsuite.")). OK to apply?
Maciej
---
gcc/config/riscv/riscv.md | 67 +++++++++++++++--------------
gcc/testsuite/gcc.target/riscv/fle-ieee.c | 12 +++++
gcc/testsuite/gcc.target/riscv/fle-snan.c | 12 +++++
gcc/testsuite/gcc.target/riscv/fle.c | 12 +++++
gcc/testsuite/gcc.target/riscv/flef-ieee.c | 12 +++++
gcc/testsuite/gcc.target/riscv/flef-snan.c | 12 +++++
gcc/testsuite/gcc.target/riscv/flef.c | 12 +++++
gcc/testsuite/gcc.target/riscv/flt-ieee.c | 12 +++++
gcc/testsuite/gcc.target/riscv/flt-snan.c | 12 +++++
gcc/testsuite/gcc.target/riscv/flt.c | 12 +++++
gcc/testsuite/gcc.target/riscv/fltf-ieee.c | 12 +++++
gcc/testsuite/gcc.target/riscv/fltf-snan.c | 12 +++++
gcc/testsuite/gcc.target/riscv/fltf.c | 12 +++++
13 files changed, 179 insertions(+), 32 deletions(-)
gcc-riscv-fcmp-split.diff
Comments
On Thu, 9 Jun 2022, Maciej W. Rozycki wrote:
> I'm yet running some benchmarking to see if the use of UNSPEC_VOLATILEs
> makes any noticeable performance difference, but I suspect it does not as
> the compiler could not do much about the original multiple-instruction
> single RTL insns anyway.
This has now finally completed. I used SPECfp 2006 built at `-O3' and
statically linked, which needs ~33 hours per run with the HiFive Unmatched
board at its standard 1196MHz clock rate. Here are the results merged by
hand from original reports:
Base Base Base Base Est Base Est Base Est Base
Benchmarks Ref. Run Time Run Time Run Time Ratio Ratio Ratio
(base) (length) (split) (base) (length) (split)
------------- ------ ------- ------- ------- -------- -------- --------
410.bwaves 13590 10353 10396 10370 1.31 1.31 1.31
416.gamess 19580 9080 9410 9284 2.16 2.08 2.11
433.milc 9180 5465 5475 5610 1.68 1.68 1.64
434.zeusmp 9100 5773 5767 5761 1.58 1.58 1.58
435.gromacs 7140 3605 3561 3545 1.98 2.00 2.01
436.cactusADM 11950 7779 7658 7680 1.54 1.56 1.56
437.leslie3d 9400 10280 10697 10274 0.914 0.879 0.915
444.namd 8020 3141 3120 3129 2.55 2.57 2.56
447.dealII 11440 3459 3490 3495 3.31 3.28 3.27
450.soplex 8340 4698 4899 4781 1.78 1.70 1.74
453.povray 5320 1953 1922 1916 2.72 2.77 2.78
454.calculix 8250 4844 4857 4821 1.70 1.70 1.71
459.GemsFDTD 10610 8703 8957 9028 1.22 1.18 1.18
465.tonto 9840 4585 4539 4620 2.15 2.17 2.13
470.lbm 13740 10172 10945 10789 1.35 1.26 1.27
481.wrf 11170 8516 8646 8584 1.31 1.29 1.30
482.sphinx3 19490 9240 9267 9280 2.11 2.10 2.10
==============================================================================
The execution time reference (second column) is for a Sun Ultra Enterprise
2 system from 1997, based on a 296MHz UltraSPARC II CPU, times are given
in seconds (lower is better) and the ratios calculated are in relation to
the reference (higher is better).
In the table above "base" results are with upstream master as at commit
7b98910406b5 ("c++: ICE with template NEW_EXPR [PR105803]". Then "length"
results are with commit 72b185189f91 ("RISC-V: Reset the length to the
default of 4 for FP comparisons") applied on top, as it does make changes
to code produced even at `-O3' (where size matters less than speed), e.g.:
46b2c: 8d01b787 fld fa5,-1840(gp) # 7760a8 <__SDATA_BEGIN__+0xd0>
- 46b30: 66f4b027 fsd fa5,1632(s1)
- 46b34: a029 j 46b3e <gciinp_+0x124>
- 46b36: 8c01b787 fld fa5,-1856(gp) # 776098 <__SDATA_BEGIN__+0xc0>
- 46b3a: 66f4b027 fsd fa5,1632(s1)
- 46b3e: 00ab67b7 lui a5,0xab6
- 46b42: 0a07b707 fld fa4,160(a5) # ab60a0 <runopt_>
- 46b46: 8d81b787 fld fa5,-1832(gp) # 7760b0 <__SDATA_BEGIN__+0xd8>
- 46b4a: a2f727d3 feq.d a5,fa4,fa5
- 46b4e: 18079fe3 bnez a5,474ec <gciinp_+0xad2>
- 46b52: 00afd7b7 lui a5,0xafd
- 46b56: 4607a703 lw a4,1120(a5) # afd460 <symtry_+0x47340>
- 46b5a: 4785 li a5,1
- 46b5c: 18f708e3 beq a4,a5,474ec <gciinp_+0xad2>
- 46b60: 00aaeab7 lui s5,0xaae
- 46b64: d70a8a93 addi s5,s5,-656 # aadd70 <infoa_>
- 46b68: 008aa783 lw a5,8(s5)
- 46b6c: 8301b707 fld fa4,-2000(gp) # 776008 <__SDATA_BEGIN__+0x30>
- 46b70: 37fd addiw a5,a5,-1
+ 46b30: 00ab67b7 lui a5,0xab6
+ 46b34: 0a07b707 fld fa4,160(a5) # ab60a0 <runopt_>
+ 46b38: 66f4b027 fsd fa5,1632(s1)
+ 46b3c: 8d81b787 fld fa5,-1832(gp) # 7760b0 <__SDATA_BEGIN__+0xd8>
+ 46b40: a2f727d3 feq.d a5,fa4,fa5
+ 46b44: c39d beqz a5,46b6a <gciinp_+0x150>
+ 46b46: 8901b787 fld fa5,-1904(gp) # 776068 <__SDATA_BEGIN__+0x90>
+ 46b4a: 66f4b027 fsd fa5,1632(s1)
+ 46b4e: a02d j 46b78 <gciinp_+0x15e>
+ 46b50: 8c01b787 fld fa5,-1856(gp) # 776098 <__SDATA_BEGIN__+0xc0>
+ 46b54: 66f4b027 fsd fa5,1632(s1)
+ 46b58: 00ab67b7 lui a5,0xab6
+ 46b5c: 0a07b707 fld fa4,160(a5) # ab60a0 <runopt_>
+ 46b60: 8d81b787 fld fa5,-1832(gp) # 7760b0 <__SDATA_BEGIN__+0xd8>
+ 46b64: a2f727d3 feq.d a5,fa4,fa5
+ 46b68: fff9 bnez a5,46b46 <gciinp_+0x12c>
+ 46b6a: 00afd7b7 lui a5,0xafd
+ 46b6e: 4607a703 lw a4,1120(a5) # afd460 <symtry_+0x47340>
+ 46b72: 4785 li a5,1
+ 46b74: fcf709e3 beq a4,a5,46b46 <gciinp_+0x12c>
+ 46b78: 00aaeab7 lui s5,0xaae
+ 46b7c: d70a8a93 addi s5,s5,-656 # aadd70 <infoa_>
+ 46b80: 008aa783 lw a5,8(s5)
+ 46b84: 8301b707 fld fa4,-2000(gp) # 776008 <__SDATA_BEGIN__+0x30>
+ 46b88: 37fd addiw a5,a5,-1
And finally "split" is with this patch also applied, changing code in
places as well, e.g.:
@@ -4873598,13 +4873598,13 @@
5f5744: 87bf70ef jal ra,5ecfbe <_gfortrani_internal_error>
5f5748: 8281b407 fld fs0,-2008(gp) # 776000 <__SDATA_BEGIN__+0x28>
5f574c: 221c fld fa5,0(a2)
- 5f574e: 0079a7b7 lui a5,0x79a
- 5f5752: ac22 fsd fs0,24(sp)
- 5f5754: a83e fsd fa5,16(sp)
- 5f5756: 27c2 fld fa5,16(sp)
- 5f5758: 4907b707 fld fa4,1168(a5) # 79a490 <__global_pointer$+0x23cb8>
- 5f575c: 22f7a7d3 fabs.d fa5,fa5
- 5f5760: 00102773 frflags a4
+ 5f574e: ac22 fsd fs0,24(sp)
+ 5f5750: a83e fsd fa5,16(sp)
+ 5f5752: 27c2 fld fa5,16(sp)
+ 5f5754: 22f7a7d3 fabs.d fa5,fa5
+ 5f5758: 00102773 frflags a4
+ 5f575c: 0079a7b7 lui a5,0x79a
+ 5f5760: 4907b707 fld fa4,1168(a5) # 79a490 <__global_pointer$+0x23cb8>
5f5764: a2e787d3 fle.d a5,fa5,fa4
5f5768: 00171073 fsflags a4
5f576c: 2c078363 beqz a5,5f5a32 <determine_en_precision+0x328>
or:
@@ -4909410,9 +4909410,9 @@
60eb8a: a2f696d3 flt.d a3,fa3,fa5
60eb8e: 00161073 fsflags a2
60eb92: ee81 bnez a3,60ebaa <__hypot+0x58>
- 60eb94: f20707d3 fmv.d.x fa5,a4
- 60eb98: 22f7a7d3 fabs.d fa5,fa5
- 60eb9c: 00102673 frflags a2
+ 60eb94: 00102673 frflags a2
+ 60eb98: f20707d3 fmv.d.x fa5,a4
+ 60eb9c: 22f7a7d3 fabs.d fa5,fa5
60eba0: a2f696d3 flt.d a3,fa3,fa5
60eba4: 00161073 fsflags a2
60eba8: c29d beqz a3,60ebce <__hypot+0x7c>
(so no arithmetic FP instructions appear to be scheduled between FSFLAGS
and FRFLAGS, though it's not clear to me how the compiler knows it is not
allowed do it) or finally:
- 66204: 52cd754b fnmsub.d fa0,fs10,fa2,fa0
- 66208: 40157553 fcvt.s.d fa0,fa0
- 6620c: a0e517d3 flt.s a5,fa0,fa4
- 66210: 58079263 bnez a5,66794 <do_cg+0xd20>
- 66214: 00102773 frflags a4
- 66218: a0e517d3 flt.s a5,fa0,fa4
- 6621c: 00171073 fsflags a4
- 66220: 220793e3 bnez a5,66c46 <do_cg+0x11d2>
- 66224: 580576d3 fsqrt.s fa3,fa0
+ 6620c: 52cd754b fnmsub.d fa0,fs10,fa2,fa0
+ 66210: 40157553 fcvt.s.d fa0,fa0
+ 66214: a0e517d3 flt.s a5,fa0,fa4
+ 66218: 58079063 bnez a5,66798 <do_cg+0xd1c>
+ 6621c: 00102773 frflags a4
+ 66220: 00171073 fsflags a4
+ 66224: 220793e3 bnez a5,66c4a <do_cg+0x11ce>
+ 66228: 580576d3 fsqrt.s fa3,fa0
(at least removing a redundant FLT.S instruction, although this doesn't
seem optimal anyway as there appears no way for the second BNEZ branch to
be ever taken, but I gather that's an unfortunate consequence of the
volatility of `riscv_frflags'/`riscv_fsflags' RTL insns) and I was able to
spot a place where an FMV.D instruction has been removed too, indicating a
better register allocation.
Results quoted above seem to suggest that in some cases a performance
regression has resulted from the change, but that may not necessarily be
the case given that the benchmarks have been run on a live even if lightly
loaded Linux system. Obtaining standard three samples would require ~4.5
days per SPECfp 2006 iteration or almost a fortnight total.
Therefore I chose to rerun only one of the worst offenders and the
results are as follows:
Estimated
Base Base Base
Benchmarks Ref. Run Time Ratio
-------------- ------ --------- ---------
416.gamess 19580 9138 2.14 S
416.gamess 19580 9498 2.06 S
416.gamess 19580 9478 2.07 *
corresponding to the "split" result earlier on. So the variation between
runs is similar to the supposed loss of performance and therefore I think
we do not need to be concerned. If there's anything that we're missing,
it's the tracking of IEEE exception flags, as I previously mentioned.
I did not run benchmarking for `-fsignaling-nans'. Relative figures are
expected to be similar as the only difference is the presence of a FEQ.fmt
instruction following FSFLAGS. I've spotted this anomaly however:
- 3c670: 5a057553 fsqrt.d fa0,fa0
- 3c674: f20006d3 fmv.d.x fa3,zero
- 3c678: 8d01b787 fld fa5,-1840(gp) # c5678 <__SDATA_BEGIN__+0xd0>
- 3c67c: 0ad57553 fsub.d fa0,fa0,fa3
- 3c680: a2a797d3 flt.d a5,fa5,fa0
- 3c684: e3cd bnez a5,3c726 <_ZN9ResultSet5checkEv+0xe6>
- 3c686: 8d81b787 fld fa5,-1832(gp) # c5680 <__SDATA_BEGIN__+0xd8>
- 3c68a: a2f517d3 flt.d a5,fa0,fa5
- 3c68e: efc1 bnez a5,3c726 <_ZN9ResultSet5checkEv+0xe6>
- 3c690: 3578 fld fa4,232(a0)
- 3c692: 317c fld fa5,224(a0)
- 3c694: 3968 fld fa0,240(a0)
- 3c696: 12e77753 fmul.d fa4,fa4,fa4
- 3c69a: 72f7f7c3 fmadd.d fa5,fa5,fa5,fa4
- 3c69e: 7aa57543 fmadd.d fa0,fa0,fa0,fa5
- 3c6a2: 00102773 frflags a4
- 3c6a6: a2d517d3 flt.d a5,fa0,fa3
- 3c6aa: 00171073 fsflags a4
- 3c6ae: a2d52053 feq.d zero,fa0,fa3
- 3c6b2: efc9 bnez a5,3c74c <_ZN9ResultSet5checkEv+0x10c>
- 3c6b4: 5a057553 fsqrt.d fa0,fa0
+ 3c66e: 5a057553 fsqrt.d fa0,fa0
+ 3c672: f20006d3 fmv.d.x fa3,zero
+ 3c676: 8d01b787 fld fa5,-1840(gp) # c5678 <__SDATA_BEGIN__+0xd0>
+ 3c67a: 0ad57553 fsub.d fa0,fa0,fa3
+ 3c67e: a2a797d3 flt.d a5,fa5,fa0
+ 3c682: e7cd bnez a5,3c72c <_ZN9ResultSet5checkEv+0xee>
+ 3c684: 8d81b787 fld fa5,-1832(gp) # c5680 <__SDATA_BEGIN__+0xd8>
+ 3c688: a2f517d3 flt.d a5,fa0,fa5
+ 3c68c: e3c5 bnez a5,3c72c <_ZN9ResultSet5checkEv+0xee>
+ 3c68e: 3578 fld fa4,232(a0)
+ 3c690: 317c fld fa5,224(a0)
+ 3c692: 3968 fld fa0,240(a0)
+ 3c694: 12e77753 fmul.d fa4,fa4,fa4
+ 3c698: 72f7f7c3 fmadd.d fa5,fa5,fa5,fa4
+ 3c69c: 7aa57543 fmadd.d fa0,fa0,fa0,fa5
+ 3c6a0: 00102773 frflags a4
+ 3c6a4: a2d517d3 flt.d a5,fa0,fa3
+ 3c6a8: 00171073 fsflags a4
+ 3c6ac: f20007d3 fmv.d.x fa5,zero
+ 3c6b0: a2f52053 feq.d zero,fa0,fa5
+ 3c6b4: efd9 bnez a5,3c752 <_ZN9ResultSet5checkEv+0x114>
+ 3c6b6: 5a057553 fsqrt.d fa0,fa0
where the compiler for some reason cannot realise it already has the value
of 0.0 available in fa3 and instead uses an extra move to fa5 for the
final FEQ.D.
> No regressions with the GCC (with and w/o `-fsignaling-nans') and glibc
> testsuites (as per commit 1fcbfb00fc67 ("RISC-V: Fix -fsignaling-nans for
> glibc testsuite.")). OK to apply?
Any comments on the change, anyone?
Maciej
Hi Maciej:
Thanks for detail analysis and performance number report, I am concern
about this patch might let compiler schedule the fsflags/frflags with
other floating point instructions, and the major issue is we didn't
model fflags right in GCC as you mentioned in the first email.
So I think we should model this right before we split that, I guess
that would be a bunch of work:
1. Add fflags to the hard register list.
2. Add (clobber (reg fflags)) or (set (reg fflags) (fpop
(operands...))) to those floating point operations which might change
fflags
3. Rewrite riscv_frflags and riscv_fsflags pattern by right RTL
pattern: (set (reg) (reg fflags)) and (set (reg fflags) (reg)).
4. Then split *f<quiet_pattern>_quiet<ANYF:mode><X:mode>4_default and
*f<quiet_pattern>_quiet<ANYF:mode><X:mode>4_snan pattern.
However I am not sure about the code gen impact of 2, especially the
impact to the combine pass, not sure if you are interested to give a
try?
And, I did some hack for part of this approach (1+3+4) got following
result for "__builtin_isless (x, y) + __builtin_isless (x, z)":
fltlt:
frflags a4 # 8 [c=4 l=4] riscv_frflags
flt.d a5,fa0,fa1 # 14 [c=4 l=4] *cstoredfdi4
flt.d a0,fa0,fa2 # 17 [c=4 l=4] *cstoredfdi4
fsflags a4 # 18 [c=4 l=4] riscv_fsflags
add a0,a0,a5 # 30 [c=4 l=4] adddi3/0
ret # 40 [c=0 l=4] simple_return
Verbose version:
fltlt:
#(insn 8 5 9 (set (reg:SI 14 a4 [88])
# (reg:SI 66 fflags)) "x.c":5:10 258 {riscv_frflags}
# (expr_list:REG_DEAD (reg:SI 66 fflags)
# (nil)))
frflags a4 # 8 [c=4 l=4] riscv_frflags
#(insn 14 11 15 (parallel [
# (set (reg:DI 15 a5 [90])
# (lt:DI (reg/v:DF 42 fa0 [orig:81 x ] [81])
# (reg:DF 43 fa1 [101])))
# (clobber:SI (reg:SI 66 fflags))
# ]) "x.c":5:10 197 {*cstoredfdi4}
# (expr_list:REG_DEAD (reg:DF 43 fa1 [101])
# (expr_list:REG_UNUSED (reg:SI 66 fflags)
# (nil))))
flt.d a5,fa0,fa1 # 14 [c=4 l=4] *cstoredfdi4
#(insn 17 15 18 (parallel [
# (set (reg:DI 10 a0 [94])
# (lt:DI (reg/v:DF 42 fa0 [orig:81 x ] [81])
# (reg:DF 44 fa2 [102])))
# (clobber:SI (reg:SI 66 fflags))
# ]) "x.c":5:36 197 {*cstoredfdi4}
# (expr_list:REG_DEAD (reg:DF 44 fa2 [102])
# (expr_list:REG_DEAD (reg/v:DF 42 fa0 [orig:81 x ] [81])
# (expr_list:REG_UNUSED (reg:SI 66 fflags)
# (nil)))))
flt.d a0,fa0,fa2 # 17 [c=4 l=4] *cstoredfdi4
#(insn 18 17 19 (set (reg:SI 66 fflags)
# (reg:SI 14 a4 [88])) "x.c":5:36 259 {riscv_fsflags}
# (expr_list:REG_DEAD (reg:SI 14 a4 [88])
# (nil)))
fsflags a4 # 18 [c=4 l=4] riscv_fsflags
#(insn 30 25 31 (set (reg/i:DI 10 a0)
# (plus:DI (reg:DI 10 a0 [94])
# (reg:DI 15 a5 [90]))) "x.c":6:1 4 {adddi3}
# (expr_list:REG_DEAD (reg:DI 15 a5 [90])
# (nil)))
add a0,a0,a5 # 30 [c=4 l=4] adddi3/0
#(jump_insn 40 39 41 (simple_return) "x.c":6:1 244 {simple_return}
# (nil)
# -> simple_return)
ret # 40 [c=0 l=4] simple_return
----
But this hack add an extra use of fflags to prevent FFLAGS getting
CSEed, patch attached.
===================================================================
@@ -57,6 +57,7 @@
;; Floating-point unspecs.
UNSPECV_FRFLAGS
UNSPECV_FSFLAGS
+ UNSPECV_FSNVSNAN
;; Interrupt handler instructions.
UNSPECV_MRET
@@ -360,6 +361,7 @@
;; Iterator and attributes for quiet comparisons.
(define_int_iterator QUIET_COMPARISON [UNSPEC_FLT_QUIET UNSPEC_FLE_QUIET])
(define_int_attr quiet_pattern [(UNSPEC_FLT_QUIET "lt") (UNSPEC_FLE_QUIET "le")])
+(define_int_attr QUIET_PATTERN [(UNSPEC_FLT_QUIET "LT") (UNSPEC_FLE_QUIET "LE")])
;; This code iterator allows signed and unsigned widening multiplications
;; to use the same template.
@@ -2326,39 +2328,31 @@
(set_attr "mode" "<UNITMODE>")])
(define_expand "f<quiet_pattern>_quiet<ANYF:mode><X:mode>4"
- [(parallel [(set (match_operand:X 0 "register_operand")
- (unspec:X
- [(match_operand:ANYF 1 "register_operand")
- (match_operand:ANYF 2 "register_operand")]
- QUIET_COMPARISON))
- (clobber (match_scratch:X 3))])]
- "TARGET_HARD_FLOAT")
-
-(define_insn "*f<quiet_pattern>_quiet<ANYF:mode><X:mode>4_default"
- [(set (match_operand:X 0 "register_operand" "=r")
- (unspec:X
- [(match_operand:ANYF 1 "register_operand" " f")
- (match_operand:ANYF 2 "register_operand" " f")]
- QUIET_COMPARISON))
- (clobber (match_scratch:X 3 "=&r"))]
- "TARGET_HARD_FLOAT && ! HONOR_SNANS (<ANYF:MODE>mode)"
- "frflags\t%3\n\tf<quiet_pattern>.<fmt>\t%0,%1,%2\n\tfsflags\t%3"
- [(set_attr "type" "fcmp")
- (set_attr "mode" "<UNITMODE>")
- (set (attr "length") (const_int 12))])
+ [(set (match_operand:X 0 "register_operand")
+ (unspec:X [(match_operand:ANYF 1 "register_operand")
+ (match_operand:ANYF 2 "register_operand")]
+ QUIET_COMPARISON))]
+ "TARGET_HARD_FLOAT"
+{
+ rtx op0 = operands[0];
+ rtx op1 = operands[1];
+ rtx op2 = operands[2];
+ rtx tmp = gen_reg_rtx (SImode);
+ rtx cmp = gen_rtx_<QUIET_PATTERN> (<X:MODE>mode, op1, op2);
+ rtx frflags = gen_rtx_UNSPEC_VOLATILE (SImode, gen_rtvec (1, const0_rtx),
+ UNSPECV_FRFLAGS);
+ rtx fsflags = gen_rtx_UNSPEC_VOLATILE (SImode, gen_rtvec (1, tmp),
+ UNSPECV_FSFLAGS);
-(define_insn "*f<quiet_pattern>_quiet<ANYF:mode><X:mode>4_snan"
- [(set (match_operand:X 0 "register_operand" "=r")
- (unspec:X
- [(match_operand:ANYF 1 "register_operand" " f")
- (match_operand:ANYF 2 "register_operand" " f")]
- QUIET_COMPARISON))
- (clobber (match_scratch:X 3 "=&r"))]
- "TARGET_HARD_FLOAT && HONOR_SNANS (<ANYF:MODE>mode)"
- "frflags\t%3\n\tf<quiet_pattern>.<fmt>\t%0,%1,%2\n\tfsflags\t%3\n\tfeq.<fmt>\tzero,%1,%2"
- [(set_attr "type" "fcmp")
- (set_attr "mode" "<UNITMODE>")
- (set (attr "length") (const_int 16))])
+ emit_insn (gen_rtx_SET (tmp, frflags));
+ emit_insn (gen_rtx_SET (op0, cmp));
+ emit_insn (fsflags);
+ if (HONOR_SNANS (<ANYF:MODE>mode))
+ emit_insn (gen_rtx_UNSPEC_VOLATILE (<ANYF:MODE>mode,
+ gen_rtvec (2, op1, op2),
+ UNSPECV_FSNVSNAN));
+ DONE;
+})
(define_insn "*seq_zero_<X:mode><GPR:mode>"
[(set (match_operand:GPR 0 "register_operand" "=r")
@@ -2766,6 +2760,15 @@
"TARGET_HARD_FLOAT"
"fsflags\t%0")
+(define_insn "*riscv_fsnvsnan<mode>2"
+ [(unspec_volatile [(match_operand:ANYF 0 "register_operand")
+ (match_operand:ANYF 1 "register_operand")]
+ UNSPECV_FSNVSNAN)]
+ "TARGET_HARD_FLOAT"
+ "feq.<fmt>\tzero,%0,%1"
+ [(set_attr "type" "fcmp")
+ (set_attr "mode" "<UNITMODE>")])
+
(define_insn "riscv_mret"
[(return)
(unspec_volatile [(const_int 0)] UNSPECV_MRET)]
===================================================================
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-fno-finite-math-only -fno-signaling-nans" } */
+
+long
+fle (double x, double y)
+{
+ return __builtin_islessequal (x, y);
+}
+
+/* { dg-final { scan-assembler "\tfrflags\t(\[^\n\]*)\n\tfle\\.d\t\[^\n\]*\n\tfsflags\t\\1\n" } } */
+/* { dg-final { scan-assembler-not "snez" } } */
===================================================================
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-fno-finite-math-only -fsignaling-nans" } */
+
+long
+fle (double x, double y)
+{
+ return __builtin_islessequal (x, y);
+}
+
+/* { dg-final { scan-assembler "\tfrflags\t(\[^\n\]*)\n\tfle\\.d\t\[^,\]*,(\[^,\]*),(\[^,\]*)\n\tfsflags\t\\1\n\tfeq\\.d\tzero,\\2,\\3\n" } } */
+/* { dg-final { scan-assembler-not "snez" } } */
===================================================================
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-ffinite-math-only" } */
+
+long
+fle (double x, double y)
+{
+ return __builtin_islessequal (x, y);
+}
+
+/* { dg-final { scan-assembler "\tf(?:gt|le)\\.d\t\[^\n\]*\n" } } */
+/* { dg-final { scan-assembler-not "f\[rs\]flags" } } */
===================================================================
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-fno-finite-math-only -fno-signaling-nans" } */
+
+long
+flef (float x, float y)
+{
+ return __builtin_islessequal (x, y);
+}
+
+/* { dg-final { scan-assembler "\tfrflags\t(\[^\n\]*)\n\tfle\\.s\t\[^\n\]*\n\tfsflags\t\\1\n" } } */
+/* { dg-final { scan-assembler-not "snez" } } */
===================================================================
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-fno-finite-math-only -fsignaling-nans" } */
+
+long
+flef (float x, float y)
+{
+ return __builtin_islessequal (x, y);
+}
+
+/* { dg-final { scan-assembler "\tfrflags\t(\[^\n\]*)\n\tfle\\.s\t\[^,\]*,(\[^,\]*),(\[^,\]*)\n\tfsflags\t\\1\n\tfeq\\.s\tzero,\\2,\\3\n" } } */
+/* { dg-final { scan-assembler-not "snez" } } */
===================================================================
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-ffinite-math-only" } */
+
+long
+flef (float x, float y)
+{
+ return __builtin_islessequal (x, y);
+}
+
+/* { dg-final { scan-assembler "\tf(?:gt|le)\\.s\t\[^\n\]*\n" } } */
+/* { dg-final { scan-assembler-not "f\[rs\]flags" } } */
===================================================================
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-fno-finite-math-only -fno-signaling-nans" } */
+
+long
+flt (double x, double y)
+{
+ return __builtin_isless (x, y);
+}
+
+/* { dg-final { scan-assembler "\tfrflags\t(\[^\n\]*)\n\tflt\\.d\t\[^\n\]*\n\tfsflags\t\\1\n" } } */
+/* { dg-final { scan-assembler-not "snez" } } */
===================================================================
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-fno-finite-math-only -fsignaling-nans" } */
+
+long
+flt (double x, double y)
+{
+ return __builtin_isless (x, y);
+}
+
+/* { dg-final { scan-assembler "\tfrflags\t(\[^\n\]*)\n\tflt\\.d\t\[^,\]*,(\[^,\]*),(\[^,\]*)\n\tfsflags\t\\1\n\tfeq\\.d\tzero,\\2,\\3\n" } } */
+/* { dg-final { scan-assembler-not "snez" } } */
===================================================================
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-ffinite-math-only" } */
+
+long
+flt (double x, double y)
+{
+ return __builtin_isless (x, y);
+}
+
+/* { dg-final { scan-assembler "\tf(?:ge|lt)\\.d\t\[^\n\]*\n" } } */
+/* { dg-final { scan-assembler-not "f\[rs\]flags" } } */
===================================================================
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-fno-finite-math-only -fno-signaling-nans" } */
+
+long
+fltf (float x, float y)
+{
+ return __builtin_isless (x, y);
+}
+
+/* { dg-final { scan-assembler "\tfrflags\t(\[^\n\]*)\n\tflt\\.s\t\[^\n\]*\n\tfsflags\t\\1\n" } } */
+/* { dg-final { scan-assembler-not "snez" } } */
===================================================================
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-fno-finite-math-only -fsignaling-nans" } */
+
+long
+fltf (float x, float y)
+{
+ return __builtin_isless (x, y);
+}
+
+/* { dg-final { scan-assembler "\tfrflags\t(\[^\n\]*)\n\tflt\\.s\t\[^,\]*,(\[^,\]*),(\[^,\]*)\n\tfsflags\t\\1\n\tfeq\\.s\tzero,\\2,\\3\n" } } */
+/* { dg-final { scan-assembler-not "snez" } } */
===================================================================
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-ffinite-math-only" } */
+
+long
+fltf (float x, float y)
+{
+ return __builtin_isless (x, y);
+}
+
+/* { dg-final { scan-assembler "\tf(?:ge|lt)\\.s\t\[^\n\]*\n" } } */
+/* { dg-final { scan-assembler-not "f\[rs\]flags" } } */