RISC-V: Split unordered FP comparisons into individual RTL insns

Message ID alpine.DEB.2.20.2206082354490.10833@tpp.orcam.me.uk
State Deferred, archived

Commit Message

Maciej W. Rozycki June 9, 2022, 1:44 p.m. UTC
  We have unordered FP comparisons implemented as RTL insns that produce 
multiple machine instructions.  Such RTL insns are hard to match with a 
processor pipeline description and additionally there is a redundant 
SNEZ instruction produced on the result of these comparisons even though 
the FLT.fmt and FLE.fmt machine instructions already produce either 0 or 
1, e.g.:

long
flt (double x, double y)
{
  return __builtin_isless (x, y);
}

with `-O2 -fno-finite-math-only -fno-signaling-nans' gets compiled to:

	.globl	flt
	.type	flt, @function
flt:
	frflags	a5
	flt.d	a0,fa0,fa1
	fsflags	a5
	snez	a0,a0
	ret
	.size	flt, .-flt

because the middle end can't see through the UNSPEC operation unordered 
FP comparisons have been defined in terms of.

These instructions are only produced via an expander already, so change 
the expander to emit individual RTL insns for each machine instruction 
in the ultimate sequence produced rather than deferring to a 
single RTL insn producing the whole sequence at once.
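
As a rough illustration of what the expander now emits, the sequence can 
be modelled in plain C.  This is only a sketch: `fflags', `NV', 
`model_flt', `model_feq_discard' and `quiet_flt' are hypothetical names 
made up here, the flags register is an ordinary variable rather than the 
real CSR, and only the invalid-operation flag is modelled.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical software model of the accrued exception flags; only NV
   (invalid operation) is tracked.  */
enum { NV = 0x10 };
static unsigned fflags;

/* Nonzero if X is any NaN.  */
static int is_nan (double x)
{
  return x != x;
}

/* Nonzero if X is a signaling NaN (quiet bit, bit 51, clear).  */
static int is_snan (double x)
{
  uint64_t bits;
  memcpy (&bits, &x, sizeof bits);
  return is_nan (x) && !(bits & (UINT64_C (1) << 51));
}

/* Model of FLT.D: sets NV if either operand is any NaN.  */
static long model_flt (double x, double y)
{
  if (is_nan (x) || is_nan (y))
    fflags |= NV;
  return x < y;
}

/* Model of FEQ.D with the result discarded: sets NV only if either
   operand is a signaling NaN.  */
static void model_feq_discard (double x, double y)
{
  if (is_snan (x) || is_snan (y))
    fflags |= NV;
}

/* The steps emitted as individual insns, one statement per insn.  */
static long quiet_flt (double x, double y, int honor_snans)
{
  unsigned tmp = fflags;          /* frflags tmp */
  long res = model_flt (x, y);    /* flt.d   res,x,y */
  fflags = tmp;                   /* fsflags tmp */
  if (honor_snans)
    model_feq_discard (x, y);     /* feq.d   zero,x,y */
  return res;
}
```

In this model `quiet_flt (x, y, 0)' with a quiet NaN operand returns 0 and 
leaves `fflags' as it was, whereas `model_flt' on its own leaves NV set.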

	gcc/
	* config/riscv/riscv.md (UNSPECV_FSNVSNAN): New constant.
	(QUIET_PATTERN): New int attribute.
	(f<quiet_pattern>_quiet<ANYF:mode><X:mode>4): Emit the intended 
	RTL insns entirely within the preparation statements.
	(*f<quiet_pattern>_quiet<ANYF:mode><X:mode>4_default)
	(*f<quiet_pattern>_quiet<ANYF:mode><X:mode>4_snan): Remove 
	insns.
	(*riscv_fsnvsnan<mode>2): New insn.

	gcc/testsuite/
	* gcc.target/riscv/fle-ieee.c: New test.
	* gcc.target/riscv/fle-snan.c: New test.
	* gcc.target/riscv/fle.c: New test.
	* gcc.target/riscv/flef-ieee.c: New test.
	* gcc.target/riscv/flef-snan.c: New test.
	* gcc.target/riscv/flef.c: New test.
	* gcc.target/riscv/flt-ieee.c: New test.
	* gcc.target/riscv/flt-snan.c: New test.
	* gcc.target/riscv/flt.c: New test.
	* gcc.target/riscv/fltf-ieee.c: New test.
	* gcc.target/riscv/fltf-snan.c: New test.
	* gcc.target/riscv/fltf.c: New test.
---
Hi,

 I think it is a step in the right direction, however ultimately I think 
we ought to actually tell GCC about the IEEE exception flags, so that the 
compiler can track data dependencies and we do not have to resort to 
UNSPECs which the compiler cannot see through.  E.g. for a piece of code 
like:

long
fltlt (double x, double y, double z)
{
  return __builtin_isless (x, y) + __builtin_isless (x, z);
}

(using an addition here for clarity because for a logical operation even 
more horror is produced) we get:

	.globl	fltlt
	.type	fltlt, @function
fltlt:
	frflags	a5	# 8	[c=4 l=4]  riscv_frflags
	flt.d	a0,fa0,fa1	# 9	[c=4 l=4]  *cstoredfdi4
	fsflags	a5	# 10	[c=0 l=4]  riscv_fsflags
	frflags	a4	# 16	[c=4 l=4]  riscv_frflags
	flt.d	a5,fa0,fa2	# 17	[c=4 l=4]  *cstoredfdi4
	fsflags	a4	# 18	[c=0 l=4]  riscv_fsflags
	addw	a0,a0,a5	# 30	[c=8 l=4]  *addsi3_extended/0
	ret		# 40	[c=0 l=4]  simple_return
	.size	fltlt, .-fltlt

where the middle FSFLAGS/FRFLAGS pair makes no sense of course and is a 
waste of both space and cycles.
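
To see why the inner pair is redundant, here is a small C sketch.  It 
assumes the flags register can be treated as an ordinary variable and that 
FLT.D affects only the invalid-operation flag; `fflags', `flt', 
`fltlt_current' and `fltlt_merged' are names invented for the example.

```c
/* Hypothetical model: accrued flags as a plain variable, NV only.  */
enum { NV = 0x10 };
static unsigned fflags;

/* Model of FLT.D: sets NV if either operand is a NaN.  */
static long flt (double x, double y)
{
  if (x != x || y != y)
    fflags |= NV;
  return x < y;
}

/* The sequence as generated above: two save/restore pairs.  */
static long fltlt_current (double x, double y, double z)
{
  unsigned a5 = fflags;   /* frflags a5 */
  long r0 = flt (x, y);   /* flt.d   a0,fa0,fa1 */
  fflags = a5;            /* fsflags a5 */
  unsigned a4 = fflags;   /* frflags a4 */
  long r1 = flt (x, z);   /* flt.d   a5,fa0,fa2 */
  fflags = a4;            /* fsflags a4 */
  return r0 + r1;
}

/* The same computation with the inner fsflags/frflags pair dropped.  */
static long fltlt_merged (double x, double y, double z)
{
  unsigned saved = fflags;   /* frflags */
  long r0 = flt (x, y);      /* flt.d */
  long r1 = flt (x, z);      /* flt.d */
  fflags = saved;            /* fsflags */
  return r0 + r1;
}
```

Both functions return the same value and leave `fflags' unchanged for any 
inputs, because the value written back by the inner FSFLAGS is exactly the 
value the following FRFLAGS reads again.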

 I'm still running some benchmarking to see if the use of UNSPEC_VOLATILEs 
makes any noticeable performance difference, but I suspect it does not as 
the compiler could not do much about the original multiple-instruction 
single RTL insns anyway.

 No regressions with the GCC (with and w/o `-fsignaling-nans') and glibc 
testsuites (as per commit 1fcbfb00fc67 ("RISC-V: Fix -fsignaling-nans for 
glibc testsuite.")).  OK to apply?

  Maciej
---
 gcc/config/riscv/riscv.md                  |   67 +++++++++++++++--------------
 gcc/testsuite/gcc.target/riscv/fle-ieee.c  |   12 +++++
 gcc/testsuite/gcc.target/riscv/fle-snan.c  |   12 +++++
 gcc/testsuite/gcc.target/riscv/fle.c       |   12 +++++
 gcc/testsuite/gcc.target/riscv/flef-ieee.c |   12 +++++
 gcc/testsuite/gcc.target/riscv/flef-snan.c |   12 +++++
 gcc/testsuite/gcc.target/riscv/flef.c      |   12 +++++
 gcc/testsuite/gcc.target/riscv/flt-ieee.c  |   12 +++++
 gcc/testsuite/gcc.target/riscv/flt-snan.c  |   12 +++++
 gcc/testsuite/gcc.target/riscv/flt.c       |   12 +++++
 gcc/testsuite/gcc.target/riscv/fltf-ieee.c |   12 +++++
 gcc/testsuite/gcc.target/riscv/fltf-snan.c |   12 +++++
 gcc/testsuite/gcc.target/riscv/fltf.c      |   12 +++++
 13 files changed, 179 insertions(+), 32 deletions(-)

gcc-riscv-fcmp-split.diff
  

Comments

Maciej W. Rozycki June 23, 2022, 1:39 p.m. UTC | #1
On Thu, 9 Jun 2022, Maciej W. Rozycki wrote:

>  I'm still running some benchmarking to see if the use of UNSPEC_VOLATILEs 
> makes any noticeable performance difference, but I suspect it does not as 
> the compiler could not do much about the original multiple-instruction 
> single RTL insns anyway.

 This has now finally completed.  I used SPECfp 2006 built at `-O3' and 
statically linked, which needs ~33 hours per run with the HiFive Unmatched 
board at its standard 1196MHz clock rate.  Here are the results merged by 
hand from original reports:

               Base     Base     Base     Base   Est Base  Est Base  Est Base   
Benchmarks     Ref.   Run Time Run Time Run Time   Ratio     Ratio     Ratio  
                       (base)  (length) (split)   (base)   (length)   (split)  
------------- ------  -------  -------  -------  --------  --------  -------- 
410.bwaves     13590    10353    10396    10370    1.31      1.31      1.31   
416.gamess     19580     9080     9410     9284    2.16      2.08      2.11   
433.milc        9180     5465     5475     5610    1.68      1.68      1.64   
434.zeusmp      9100     5773     5767     5761    1.58      1.58      1.58   
435.gromacs     7140     3605     3561     3545    1.98      2.00      2.01   
436.cactusADM  11950     7779     7658     7680    1.54      1.56      1.56   
437.leslie3d    9400    10280    10697    10274    0.914     0.879     0.915  
444.namd        8020     3141     3120     3129    2.55      2.57      2.56   
447.dealII     11440     3459     3490     3495    3.31      3.28      3.27   
450.soplex      8340     4698     4899     4781    1.78      1.70      1.74   
453.povray      5320     1953     1922     1916    2.72      2.77      2.78   
454.calculix    8250     4844     4857     4821    1.70      1.70      1.71   
459.GemsFDTD   10610     8703     8957     9028    1.22      1.18      1.18   
465.tonto       9840     4585     4539     4620    2.15      2.17      2.13   
470.lbm        13740    10172    10945    10789    1.35      1.26      1.27   
481.wrf        11170     8516     8646     8584    1.31      1.29      1.30   
482.sphinx3    19490     9240     9267     9280    2.11      2.10      2.10   
==============================================================================

The execution time reference (second column) is for a Sun Ultra Enterprise 
2 system from 1997, based on a 296MHz UltraSPARC II CPU, times are given 
in seconds (lower is better) and the ratios calculated are in relation to 
the reference (higher is better).
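
For clarity, the ratio columns are just the reference time divided by the 
measured run time.  A minimal C sketch of that arithmetic follows; the 
function name `spec_ratio' is made up for the example.

```c
/* Ratio as used in the table: reference time divided by measured run
   time, both in seconds; higher is better.  */
static double spec_ratio (double ref_seconds, double run_seconds)
{
  return ref_seconds / run_seconds;
}
```

For instance `spec_ratio (13590, 10353)' comes out at roughly 1.31, which 
is the 410.bwaves "base" entry.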

In the table above "base" results are with upstream master as at commit 
7b98910406b5 ("c++: ICE with template NEW_EXPR [PR105803]").  Then "length" 
results are with commit 72b185189f91 ("RISC-V: Reset the length to the 
default of 4 for FP comparisons") applied on top, as it does make changes 
to code produced even at `-O3' (where size matters less than speed), e.g.:

    46b2c:	8d01b787          	fld	fa5,-1840(gp) # 7760a8 <__SDATA_BEGIN__+0xd0>
-   46b30:	66f4b027          	fsd	fa5,1632(s1)
-   46b34:	a029                	j	46b3e <gciinp_+0x124>
-   46b36:	8c01b787          	fld	fa5,-1856(gp) # 776098 <__SDATA_BEGIN__+0xc0>
-   46b3a:	66f4b027          	fsd	fa5,1632(s1)
-   46b3e:	00ab67b7          	lui	a5,0xab6
-   46b42:	0a07b707          	fld	fa4,160(a5) # ab60a0 <runopt_>
-   46b46:	8d81b787          	fld	fa5,-1832(gp) # 7760b0 <__SDATA_BEGIN__+0xd8>
-   46b4a:	a2f727d3          	feq.d	a5,fa4,fa5
-   46b4e:	18079fe3          	bnez	a5,474ec <gciinp_+0xad2>
-   46b52:	00afd7b7          	lui	a5,0xafd
-   46b56:	4607a703          	lw	a4,1120(a5) # afd460 <symtry_+0x47340>
-   46b5a:	4785                	li	a5,1
-   46b5c:	18f708e3          	beq	a4,a5,474ec <gciinp_+0xad2>
-   46b60:	00aaeab7          	lui	s5,0xaae
-   46b64:	d70a8a93          	addi	s5,s5,-656 # aadd70 <infoa_>
-   46b68:	008aa783          	lw	a5,8(s5)
-   46b6c:	8301b707          	fld	fa4,-2000(gp) # 776008 <__SDATA_BEGIN__+0x30>
-   46b70:	37fd                	addiw	a5,a5,-1
+   46b30:	00ab67b7          	lui	a5,0xab6
+   46b34:	0a07b707          	fld	fa4,160(a5) # ab60a0 <runopt_>
+   46b38:	66f4b027          	fsd	fa5,1632(s1)
+   46b3c:	8d81b787          	fld	fa5,-1832(gp) # 7760b0 <__SDATA_BEGIN__+0xd8>
+   46b40:	a2f727d3          	feq.d	a5,fa4,fa5
+   46b44:	c39d                	beqz	a5,46b6a <gciinp_+0x150>
+   46b46:	8901b787          	fld	fa5,-1904(gp) # 776068 <__SDATA_BEGIN__+0x90>
+   46b4a:	66f4b027          	fsd	fa5,1632(s1)
+   46b4e:	a02d                	j	46b78 <gciinp_+0x15e>
+   46b50:	8c01b787          	fld	fa5,-1856(gp) # 776098 <__SDATA_BEGIN__+0xc0>
+   46b54:	66f4b027          	fsd	fa5,1632(s1)
+   46b58:	00ab67b7          	lui	a5,0xab6
+   46b5c:	0a07b707          	fld	fa4,160(a5) # ab60a0 <runopt_>
+   46b60:	8d81b787          	fld	fa5,-1832(gp) # 7760b0 <__SDATA_BEGIN__+0xd8>
+   46b64:	a2f727d3          	feq.d	a5,fa4,fa5
+   46b68:	fff9                	bnez	a5,46b46 <gciinp_+0x12c>
+   46b6a:	00afd7b7          	lui	a5,0xafd
+   46b6e:	4607a703          	lw	a4,1120(a5) # afd460 <symtry_+0x47340>
+   46b72:	4785                	li	a5,1
+   46b74:	fcf709e3          	beq	a4,a5,46b46 <gciinp_+0x12c>
+   46b78:	00aaeab7          	lui	s5,0xaae
+   46b7c:	d70a8a93          	addi	s5,s5,-656 # aadd70 <infoa_>
+   46b80:	008aa783          	lw	a5,8(s5)
+   46b84:	8301b707          	fld	fa4,-2000(gp) # 776008 <__SDATA_BEGIN__+0x30>
+   46b88:	37fd                	addiw	a5,a5,-1

And finally "split" is with this patch also applied, changing code in 
places as well, e.g.:

@@ -4873598,13 +4873598,13 @@
   5f5744:	87bf70ef          	jal	ra,5ecfbe <_gfortrani_internal_error>
   5f5748:	8281b407          	fld	fs0,-2008(gp) # 776000 <__SDATA_BEGIN__+0x28>
   5f574c:	221c                	fld	fa5,0(a2)
-  5f574e:	0079a7b7          	lui	a5,0x79a
-  5f5752:	ac22                	fsd	fs0,24(sp)
-  5f5754:	a83e                	fsd	fa5,16(sp)
-  5f5756:	27c2                	fld	fa5,16(sp)
-  5f5758:	4907b707          	fld	fa4,1168(a5) # 79a490 <__global_pointer$+0x23cb8>
-  5f575c:	22f7a7d3          	fabs.d	fa5,fa5
-  5f5760:	00102773          	frflags	a4
+  5f574e:	ac22                	fsd	fs0,24(sp)
+  5f5750:	a83e                	fsd	fa5,16(sp)
+  5f5752:	27c2                	fld	fa5,16(sp)
+  5f5754:	22f7a7d3          	fabs.d	fa5,fa5
+  5f5758:	00102773          	frflags	a4
+  5f575c:	0079a7b7          	lui	a5,0x79a
+  5f5760:	4907b707          	fld	fa4,1168(a5) # 79a490 <__global_pointer$+0x23cb8>
   5f5764:	a2e787d3          	fle.d	a5,fa5,fa4
   5f5768:	00171073          	fsflags	a4
   5f576c:	2c078363          	beqz	a5,5f5a32 <determine_en_precision+0x328>

or:

@@ -4909410,9 +4909410,9 @@
   60eb8a:	a2f696d3          	flt.d	a3,fa3,fa5
   60eb8e:	00161073          	fsflags	a2
   60eb92:	ee81                	bnez	a3,60ebaa <__hypot+0x58>
-  60eb94:	f20707d3          	fmv.d.x	fa5,a4
-  60eb98:	22f7a7d3          	fabs.d	fa5,fa5
-  60eb9c:	00102673          	frflags	a2
+  60eb94:	00102673          	frflags	a2
+  60eb98:	f20707d3          	fmv.d.x	fa5,a4
+  60eb9c:	22f7a7d3          	fabs.d	fa5,fa5
   60eba0:	a2f696d3          	flt.d	a3,fa3,fa5
   60eba4:	00161073          	fsflags	a2
   60eba8:	c29d                	beqz	a3,60ebce <__hypot+0x7c>

(so no arithmetic FP instructions appear to be scheduled between FSFLAGS 
and FRFLAGS, though it's not clear to me how the compiler knows it is not 
allowed to do it) or finally:

-   66204:	52cd754b          	fnmsub.d	fa0,fs10,fa2,fa0
-   66208:	40157553          	fcvt.s.d	fa0,fa0
-   6620c:	a0e517d3          	flt.s	a5,fa0,fa4
-   66210:	58079263          	bnez	a5,66794 <do_cg+0xd20>
-   66214:	00102773          	frflags	a4
-   66218:	a0e517d3          	flt.s	a5,fa0,fa4
-   6621c:	00171073          	fsflags	a4
-   66220:	220793e3          	bnez	a5,66c46 <do_cg+0x11d2>
-   66224:	580576d3          	fsqrt.s	fa3,fa0
+   6620c:	52cd754b          	fnmsub.d	fa0,fs10,fa2,fa0
+   66210:	40157553          	fcvt.s.d	fa0,fa0
+   66214:	a0e517d3          	flt.s	a5,fa0,fa4
+   66218:	58079063          	bnez	a5,66798 <do_cg+0xd1c>
+   6621c:	00102773          	frflags	a4
+   66220:	00171073          	fsflags	a4
+   66224:	220793e3          	bnez	a5,66c4a <do_cg+0x11ce>
+   66228:	580576d3          	fsqrt.s	fa3,fa0

(at least removing a redundant FLT.S instruction, although this doesn't 
seem optimal anyway as there appears no way for the second BNEZ branch to 
be ever taken, but I gather that's an unfortunate consequence of the 
volatility of `riscv_frflags'/`riscv_fsflags' RTL insns) and I was able to 
spot a place where an FMV.D instruction has been removed too, indicating a 
better register allocation.

 Results quoted above seem to suggest that in some cases a performance 
regression has resulted from the change, but that may not necessarily be 
the case given that the benchmarks have been run on a live, even if 
lightly loaded, Linux system.  Obtaining the standard three samples would 
require ~4.5 days per SPECfp 2006 iteration or almost a fortnight total.

 Therefore I chose to rerun only one of the worst offenders and the 
results are as follows:

                                  Estimated  
                Base     Base       Base     
Benchmarks      Ref.   Run Time     Ratio    
-------------- ------  ---------  ---------  
416.gamess      19580       9138       2.14 S
416.gamess      19580       9498       2.06 S
416.gamess      19580       9478       2.07 *

corresponding to the "split" result earlier on.  So the variation between 
runs is similar to the supposed loss of performance and therefore I think 
we do not need to be concerned.  If there's anything that we're missing, 
it's the tracking of IEEE exception flags, as I previously mentioned.
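
The comparison behind that conclusion can be spelled out with a few lines 
of C.  The helper names `rel_diff' and `rel_spread' are hypothetical; the 
run times are the 416.gamess figures from the two tables above.

```c
/* Relative difference of B against A, as a fraction of A.  */
static double rel_diff (double a, double b)
{
  return (b - a) / a;
}

/* Relative spread (max - min) / min of N run times.  */
static double rel_spread (const double *t, int n)
{
  double lo = t[0], hi = t[0];
  for (int i = 1; i < n; i++)
    {
      if (t[i] < lo)
        lo = t[i];
      if (t[i] > hi)
        hi = t[i];
    }
  return (hi - lo) / lo;
}

/* The three 416.gamess reruns of the "split" configuration, seconds.  */
static const double gamess_split_runs[] = { 9138, 9498, 9478 };
```

Here `rel_spread (gamess_split_runs, 3)' is about 0.039, i.e. a spread of 
roughly 3.9% between reruns of the same binary, while 
`rel_diff (9080, 9284)' for "base" against "split" in the first table is 
about 0.022, i.e. roughly 2.2%.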

 I did not run benchmarking for `-fsignaling-nans'.  Relative figures are 
expected to be similar as the only difference is the presence of a FEQ.fmt 
instruction following FSFLAGS.  I've spotted this anomaly however:

-   3c670:	5a057553          	fsqrt.d	fa0,fa0
-   3c674:	f20006d3          	fmv.d.x	fa3,zero
-   3c678:	8d01b787          	fld	fa5,-1840(gp) # c5678 <__SDATA_BEGIN__+0xd0>
-   3c67c:	0ad57553          	fsub.d	fa0,fa0,fa3
-   3c680:	a2a797d3          	flt.d	a5,fa5,fa0
-   3c684:	e3cd                	bnez	a5,3c726 <_ZN9ResultSet5checkEv+0xe6>
-   3c686:	8d81b787          	fld	fa5,-1832(gp) # c5680 <__SDATA_BEGIN__+0xd8>
-   3c68a:	a2f517d3          	flt.d	a5,fa0,fa5
-   3c68e:	efc1                	bnez	a5,3c726 <_ZN9ResultSet5checkEv+0xe6>
-   3c690:	3578                	fld	fa4,232(a0)
-   3c692:	317c                	fld	fa5,224(a0)
-   3c694:	3968                	fld	fa0,240(a0)
-   3c696:	12e77753          	fmul.d	fa4,fa4,fa4
-   3c69a:	72f7f7c3          	fmadd.d	fa5,fa5,fa5,fa4
-   3c69e:	7aa57543          	fmadd.d	fa0,fa0,fa0,fa5
-   3c6a2:	00102773          	frflags	a4
-   3c6a6:	a2d517d3          	flt.d	a5,fa0,fa3
-   3c6aa:	00171073          	fsflags	a4
-   3c6ae:	a2d52053          	feq.d	zero,fa0,fa3
-   3c6b2:	efc9                	bnez	a5,3c74c <_ZN9ResultSet5checkEv+0x10c>
-   3c6b4:	5a057553          	fsqrt.d	fa0,fa0
+   3c66e:	5a057553          	fsqrt.d	fa0,fa0
+   3c672:	f20006d3          	fmv.d.x	fa3,zero
+   3c676:	8d01b787          	fld	fa5,-1840(gp) # c5678 <__SDATA_BEGIN__+0xd0>
+   3c67a:	0ad57553          	fsub.d	fa0,fa0,fa3
+   3c67e:	a2a797d3          	flt.d	a5,fa5,fa0
+   3c682:	e7cd                	bnez	a5,3c72c <_ZN9ResultSet5checkEv+0xee>
+   3c684:	8d81b787          	fld	fa5,-1832(gp) # c5680 <__SDATA_BEGIN__+0xd8>
+   3c688:	a2f517d3          	flt.d	a5,fa0,fa5
+   3c68c:	e3c5                	bnez	a5,3c72c <_ZN9ResultSet5checkEv+0xee>
+   3c68e:	3578                	fld	fa4,232(a0)
+   3c690:	317c                	fld	fa5,224(a0)
+   3c692:	3968                	fld	fa0,240(a0)
+   3c694:	12e77753          	fmul.d	fa4,fa4,fa4
+   3c698:	72f7f7c3          	fmadd.d	fa5,fa5,fa5,fa4
+   3c69c:	7aa57543          	fmadd.d	fa0,fa0,fa0,fa5
+   3c6a0:	00102773          	frflags	a4
+   3c6a4:	a2d517d3          	flt.d	a5,fa0,fa3
+   3c6a8:	00171073          	fsflags	a4
+   3c6ac:	f20007d3          	fmv.d.x	fa5,zero
+   3c6b0:	a2f52053          	feq.d	zero,fa0,fa5
+   3c6b4:	efd9                	bnez	a5,3c752 <_ZN9ResultSet5checkEv+0x114>
+   3c6b6:	5a057553          	fsqrt.d	fa0,fa0

where the compiler for some reason cannot realise it already has the value 
of 0.0 available in fa3 and instead uses an extra move to fa5 for the 
final FEQ.D.

>  No regressions with the GCC (with and w/o `-fsignaling-nans') and glibc 
> testsuites (as per commit 1fcbfb00fc67 ("RISC-V: Fix -fsignaling-nans for 
> glibc testsuite.")).  OK to apply?

 Any comments on the change, anyone?

  Maciej
  
Kito Cheng June 23, 2022, 4:44 p.m. UTC | #2
Hi Maciej:

Thanks for the detailed analysis and the performance numbers.  I am
concerned that this patch might let the compiler schedule fsflags/frflags
across other floating-point instructions; the major issue is that we do
not model fflags correctly in GCC, as you mentioned in the first email.

So I think we should model this correctly before we do the split.  I
guess that would be a fair amount of work:
1. Add fflags to the hard register list.
2. Add (clobber (reg fflags)) or (set (reg fflags) (fpop
(operands...))) to those floating-point operations which might change
fflags.
3. Rewrite the riscv_frflags and riscv_fsflags patterns using proper RTL:
(set (reg) (reg fflags)) and (set (reg fflags) (reg)).
4. Then split the *f<quiet_pattern>_quiet<ANYF:mode><X:mode>4_default and
*f<quiet_pattern>_quiet<ANYF:mode><X:mode>4_snan patterns.

However, I am not sure about the code generation impact of step 2,
especially on the combine pass.  Would you be interested in giving it a
try?
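
The idea behind steps 1 to 3 is that once fflags is an ordinary register
in RTL, the usual def/use rules decide which insns may be reordered.  The
C sketch below is only a toy model of that rule, not GCC code;
`struct insn_model' and `may_swap' are hypothetical names, and a clobber
is simply treated as a write.

```c
#include <stdbool.h>

/* Toy description of how an insn touches the fflags register.  */
struct insn_model
{
  const char *name;
  bool reads_fflags;
  bool writes_fflags;   /* a clobber counts as a write */
};

/* Two adjacent insns may be swapped with respect to fflags only if
   there is no read-after-write, write-after-read or write-after-write
   dependency between them.  */
static bool may_swap (const struct insn_model *a, const struct insn_model *b)
{
  if (a->writes_fflags && (b->reads_fflags || b->writes_fflags))
    return false;
  if (a->reads_fflags && b->writes_fflags)
    return false;
  return true;
}

static const struct insn_model frflags = { "frflags", true, false };
static const struct insn_model fsflags = { "fsflags", false, true };
static const struct insn_model flt_d   = { "flt.d",   false, true };
static const struct insn_model add     = { "add",     false, false };
```

With these descriptions `may_swap (&frflags, &flt_d)' and
`may_swap (&flt_d, &fsflags)' are both false, so the comparison stays
between the save and the restore, while `may_swap (&add, &flt_d)' is true.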

Also, I hacked up part of this approach (1+3+4) and got the following
result for "__builtin_isless (x, y) + __builtin_isless (x, z)":

fltlt:
       frflags a4      # 8     [c=4 l=4]  riscv_frflags
       flt.d   a5,fa0,fa1      # 14    [c=4 l=4]  *cstoredfdi4
       flt.d   a0,fa0,fa2      # 17    [c=4 l=4]  *cstoredfdi4
       fsflags a4      # 18    [c=4 l=4]  riscv_fsflags
       add     a0,a0,a5        # 30    [c=4 l=4]  adddi3/0
       ret             # 40    [c=0 l=4]  simple_return

Verbose version:
fltlt:
#(insn 8 5 9 (set (reg:SI 14 a4 [88])
#        (reg:SI 66 fflags)) "x.c":5:10 258 {riscv_frflags}
#     (expr_list:REG_DEAD (reg:SI 66 fflags)
#        (nil)))
       frflags a4      # 8     [c=4 l=4]  riscv_frflags
#(insn 14 11 15 (parallel [
#            (set (reg:DI 15 a5 [90])
#                (lt:DI (reg/v:DF 42 fa0 [orig:81 x ] [81])
#                    (reg:DF 43 fa1 [101])))
#            (clobber:SI (reg:SI 66 fflags))
#        ]) "x.c":5:10 197 {*cstoredfdi4}
#     (expr_list:REG_DEAD (reg:DF 43 fa1 [101])
#        (expr_list:REG_UNUSED (reg:SI 66 fflags)
#            (nil))))
       flt.d   a5,fa0,fa1      # 14    [c=4 l=4]  *cstoredfdi4
#(insn 17 15 18 (parallel [
#            (set (reg:DI 10 a0 [94])
#                (lt:DI (reg/v:DF 42 fa0 [orig:81 x ] [81])
#                    (reg:DF 44 fa2 [102])))
#            (clobber:SI (reg:SI 66 fflags))
#        ]) "x.c":5:36 197 {*cstoredfdi4}
#     (expr_list:REG_DEAD (reg:DF 44 fa2 [102])
#        (expr_list:REG_DEAD (reg/v:DF 42 fa0 [orig:81 x ] [81])
#            (expr_list:REG_UNUSED (reg:SI 66 fflags)
#                (nil)))))
       flt.d   a0,fa0,fa2      # 17    [c=4 l=4]  *cstoredfdi4
#(insn 18 17 19 (set (reg:SI 66 fflags)
#        (reg:SI 14 a4 [88])) "x.c":5:36 259 {riscv_fsflags}
#     (expr_list:REG_DEAD (reg:SI 14 a4 [88])
#        (nil)))
       fsflags a4      # 18    [c=4 l=4]  riscv_fsflags
#(insn 30 25 31 (set (reg/i:DI 10 a0)
#        (plus:DI (reg:DI 10 a0 [94])
#            (reg:DI 15 a5 [90]))) "x.c":6:1 4 {adddi3}
#     (expr_list:REG_DEAD (reg:DI 15 a5 [90])
#        (nil)))
       add     a0,a0,a5        # 30    [c=4 l=4]  adddi3/0
#(jump_insn 40 39 41 (simple_return) "x.c":6:1 244 {simple_return}
#     (nil)
# -> simple_return)
       ret             # 40    [c=0 l=4]  simple_return
----

But this hack adds an extra use of fflags to prevent FFLAGS from getting
CSEed; patch attached.
  

Patch

Index: gcc/gcc/config/riscv/riscv.md
===================================================================
--- gcc.orig/gcc/config/riscv/riscv.md
+++ gcc/gcc/config/riscv/riscv.md
@@ -57,6 +57,7 @@ 
   ;; Floating-point unspecs.
   UNSPECV_FRFLAGS
   UNSPECV_FSFLAGS
+  UNSPECV_FSNVSNAN
 
   ;; Interrupt handler instructions.
   UNSPECV_MRET
@@ -360,6 +361,7 @@ 
 ;; Iterator and attributes for quiet comparisons.
 (define_int_iterator QUIET_COMPARISON [UNSPEC_FLT_QUIET UNSPEC_FLE_QUIET])
 (define_int_attr quiet_pattern [(UNSPEC_FLT_QUIET "lt") (UNSPEC_FLE_QUIET "le")])
+(define_int_attr QUIET_PATTERN [(UNSPEC_FLT_QUIET "LT") (UNSPEC_FLE_QUIET "LE")])
 
 ;; This code iterator allows signed and unsigned widening multiplications
 ;; to use the same template.
@@ -2326,39 +2328,31 @@ 
    (set_attr "mode" "<UNITMODE>")])
 
 (define_expand "f<quiet_pattern>_quiet<ANYF:mode><X:mode>4"
-   [(parallel [(set (match_operand:X      0 "register_operand")
-		    (unspec:X
-		     [(match_operand:ANYF 1 "register_operand")
-		      (match_operand:ANYF 2 "register_operand")]
-		     QUIET_COMPARISON))
-	       (clobber (match_scratch:X 3))])]
-  "TARGET_HARD_FLOAT")
-
-(define_insn "*f<quiet_pattern>_quiet<ANYF:mode><X:mode>4_default"
-   [(set (match_operand:X      0 "register_operand" "=r")
-	 (unspec:X
-	  [(match_operand:ANYF 1 "register_operand" " f")
-	   (match_operand:ANYF 2 "register_operand" " f")]
-	  QUIET_COMPARISON))
-    (clobber (match_scratch:X 3 "=&r"))]
-  "TARGET_HARD_FLOAT && ! HONOR_SNANS (<ANYF:MODE>mode)"
-  "frflags\t%3\n\tf<quiet_pattern>.<fmt>\t%0,%1,%2\n\tfsflags\t%3"
-  [(set_attr "type" "fcmp")
-   (set_attr "mode" "<UNITMODE>")
-   (set (attr "length") (const_int 12))])
+   [(set (match_operand:X               0 "register_operand")
+	 (unspec:X [(match_operand:ANYF 1 "register_operand")
+		    (match_operand:ANYF 2 "register_operand")]
+		   QUIET_COMPARISON))]
+  "TARGET_HARD_FLOAT"
+{
+  rtx op0 = operands[0];
+  rtx op1 = operands[1];
+  rtx op2 = operands[2];
+  rtx tmp = gen_reg_rtx (SImode);
+  rtx cmp = gen_rtx_<QUIET_PATTERN> (<X:MODE>mode, op1, op2);
+  rtx frflags = gen_rtx_UNSPEC_VOLATILE (SImode, gen_rtvec (1, const0_rtx),
+					 UNSPECV_FRFLAGS);
+  rtx fsflags = gen_rtx_UNSPEC_VOLATILE (SImode, gen_rtvec (1, tmp),
+					 UNSPECV_FSFLAGS);
 
-(define_insn "*f<quiet_pattern>_quiet<ANYF:mode><X:mode>4_snan"
-   [(set (match_operand:X      0 "register_operand" "=r")
-	 (unspec:X
-	  [(match_operand:ANYF 1 "register_operand" " f")
-	   (match_operand:ANYF 2 "register_operand" " f")]
-	  QUIET_COMPARISON))
-    (clobber (match_scratch:X 3 "=&r"))]
-  "TARGET_HARD_FLOAT && HONOR_SNANS (<ANYF:MODE>mode)"
-  "frflags\t%3\n\tf<quiet_pattern>.<fmt>\t%0,%1,%2\n\tfsflags\t%3\n\tfeq.<fmt>\tzero,%1,%2"
-  [(set_attr "type" "fcmp")
-   (set_attr "mode" "<UNITMODE>")
-   (set (attr "length") (const_int 16))])
+  emit_insn (gen_rtx_SET (tmp, frflags));
+  emit_insn (gen_rtx_SET (op0, cmp));
+  emit_insn (fsflags);
+  if (HONOR_SNANS (<ANYF:MODE>mode))
+    emit_insn (gen_rtx_UNSPEC_VOLATILE (<ANYF:MODE>mode,
+					gen_rtvec (2, op1, op2),
+					UNSPECV_FSNVSNAN));
+  DONE;
+})
 
 (define_insn "*seq_zero_<X:mode><GPR:mode>"
   [(set (match_operand:GPR       0 "register_operand" "=r")
@@ -2766,6 +2760,15 @@ 
   "TARGET_HARD_FLOAT"
   "fsflags\t%0")
 
+(define_insn "*riscv_fsnvsnan<mode>2"
+  [(unspec_volatile [(match_operand:ANYF 0 "register_operand")
+		     (match_operand:ANYF 1 "register_operand")]
+		    UNSPECV_FSNVSNAN)]
+  "TARGET_HARD_FLOAT"
+  "feq.<fmt>\tzero,%0,%1"
+  [(set_attr "type" "fcmp")
+   (set_attr "mode" "<UNITMODE>")])
+
 (define_insn "riscv_mret"
   [(return)
    (unspec_volatile [(const_int 0)] UNSPECV_MRET)]
Index: gcc/gcc/testsuite/gcc.target/riscv/fle-ieee.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/fle-ieee.c
@@ -0,0 +1,12 @@ 
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-fno-finite-math-only -fno-signaling-nans" } */
+
+long
+fle (double x, double y)
+{
+  return __builtin_islessequal (x, y);
+}
+
+/* { dg-final { scan-assembler "\tfrflags\t(\[^\n\]*)\n\tfle\\.d\t\[^\n\]*\n\tfsflags\t\\1\n" } } */
+/* { dg-final { scan-assembler-not "snez" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/fle-snan.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/fle-snan.c
@@ -0,0 +1,12 @@ 
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-fno-finite-math-only -fsignaling-nans" } */
+
+long
+fle (double x, double y)
+{
+  return __builtin_islessequal (x, y);
+}
+
+/* { dg-final { scan-assembler "\tfrflags\t(\[^\n\]*)\n\tfle\\.d\t\[^,\]*,(\[^,\]*),(\[^,\]*)\n\tfsflags\t\\1\n\tfeq\\.d\tzero,\\2,\\3\n" } } */
+/* { dg-final { scan-assembler-not "snez" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/fle.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/fle.c
@@ -0,0 +1,12 @@ 
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-ffinite-math-only" } */
+
+long
+fle (double x, double y)
+{
+  return __builtin_islessequal (x, y);
+}
+
+/* { dg-final { scan-assembler "\tf(?:gt|le)\\.d\t\[^\n\]*\n" } } */
+/* { dg-final { scan-assembler-not "f\[rs\]flags" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/flef-ieee.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/flef-ieee.c
@@ -0,0 +1,12 @@ 
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-fno-finite-math-only -fno-signaling-nans" } */
+
+long
+flef (float x, float y)
+{
+  return __builtin_islessequal (x, y);
+}
+
+/* { dg-final { scan-assembler "\tfrflags\t(\[^\n\]*)\n\tfle\\.s\t\[^\n\]*\n\tfsflags\t\\1\n" } } */
+/* { dg-final { scan-assembler-not "snez" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/flef-snan.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/flef-snan.c
@@ -0,0 +1,12 @@ 
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-fno-finite-math-only -fsignaling-nans" } */
+
+long
+flef (float x, float y)
+{
+  return __builtin_islessequal (x, y);
+}
+
+/* { dg-final { scan-assembler "\tfrflags\t(\[^\n\]*)\n\tfle\\.s\t\[^,\]*,(\[^,\]*),(\[^,\]*)\n\tfsflags\t\\1\n\tfeq\\.s\tzero,\\2,\\3\n" } } */
+/* { dg-final { scan-assembler-not "snez" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/flef.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/flef.c
@@ -0,0 +1,12 @@ 
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-ffinite-math-only" } */
+
+long
+flef (float x, float y)
+{
+  return __builtin_islessequal (x, y);
+}
+
+/* { dg-final { scan-assembler "\tf(?:gt|le)\\.s\t\[^\n\]*\n" } } */
+/* { dg-final { scan-assembler-not "f\[rs\]flags" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/flt-ieee.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/flt-ieee.c
@@ -0,0 +1,12 @@ 
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-fno-finite-math-only -fno-signaling-nans" } */
+
+long
+flt (double x, double y)
+{
+  return __builtin_isless (x, y);
+}
+
+/* { dg-final { scan-assembler "\tfrflags\t(\[^\n\]*)\n\tflt\\.d\t\[^\n\]*\n\tfsflags\t\\1\n" } } */
+/* { dg-final { scan-assembler-not "snez" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/flt-snan.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/flt-snan.c
@@ -0,0 +1,12 @@ 
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-fno-finite-math-only -fsignaling-nans" } */
+
+long
+flt (double x, double y)
+{
+  return __builtin_isless (x, y);
+}
+
+/* { dg-final { scan-assembler "\tfrflags\t(\[^\n\]*)\n\tflt\\.d\t\[^,\]*,(\[^,\]*),(\[^,\]*)\n\tfsflags\t\\1\n\tfeq\\.d\tzero,\\2,\\3\n" } } */
+/* { dg-final { scan-assembler-not "snez" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/flt.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/flt.c
@@ -0,0 +1,12 @@ 
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-ffinite-math-only" } */
+
+long
+flt (double x, double y)
+{
+  return __builtin_isless (x, y);
+}
+
+/* { dg-final { scan-assembler "\tf(?:ge|lt)\\.d\t\[^\n\]*\n" } } */
+/* { dg-final { scan-assembler-not "f\[rs\]flags" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/fltf-ieee.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/fltf-ieee.c
@@ -0,0 +1,12 @@ 
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-fno-finite-math-only -fno-signaling-nans" } */
+
+long
+fltf (float x, float y)
+{
+  return __builtin_isless (x, y);
+}
+
+/* { dg-final { scan-assembler "\tfrflags\t(\[^\n\]*)\n\tflt\\.s\t\[^\n\]*\n\tfsflags\t\\1\n" } } */
+/* { dg-final { scan-assembler-not "snez" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/fltf-snan.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/fltf-snan.c
@@ -0,0 +1,12 @@ 
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-fno-finite-math-only -fsignaling-nans" } */
+
+long
+fltf (float x, float y)
+{
+  return __builtin_isless (x, y);
+}
+
+/* { dg-final { scan-assembler "\tfrflags\t(\[^\n\]*)\n\tflt\\.s\t\[^,\]*,(\[^,\]*),(\[^,\]*)\n\tfsflags\t\\1\n\tfeq\\.s\tzero,\\2,\\3\n" } } */
+/* { dg-final { scan-assembler-not "snez" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/fltf.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/fltf.c
@@ -0,0 +1,12 @@ 
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-ffinite-math-only" } */
+
+long
+fltf (float x, float y)
+{
+  return __builtin_isless (x, y);
+}
+
+/* { dg-final { scan-assembler "\tf(?:ge|lt)\\.s\t\[^\n\]*\n" } } */
+/* { dg-final { scan-assembler-not "f\[rs\]flags" } } */