[RFC] RISC-V: The optimization ignored the side effects of the rounding mode, resulting in incorrect results.

Message ID 20250218111258.839-1-jinma@linux.alibaba.com
State Rejected
Delegated to: Jeff Law
Headers
Series [RFC] RISC-V: The optimization ignored the side effects of the rounding mode, resulting in incorrect results. |

Checks

Context Check Description
rivoscibot/toolchain-ci-rivos-lint warning Lint failed
rivoscibot/toolchain-ci-rivos-apply-patch success Patch applied
rivoscibot/toolchain-ci-rivos-build--newlib-rv64gcv-lp64d-multilib success Build passed
rivoscibot/toolchain-ci-rivos-build--linux-rv64gcv-lp64d-multilib success Build passed
rivoscibot/toolchain-ci-rivos-build--linux-rv64gc_zba_zbb_zbc_zbs-lp64d-multilib success Build passed
rivoscibot/toolchain-ci-rivos-test success Testing passed
linaro-tcwg-bot/tcwg_simplebootstrap_build--master-arm-bootstrap warning Skipped upon request
linaro-tcwg-bot/tcwg_gcc_build--master-arm warning Skipped upon request
linaro-tcwg-bot/tcwg_gcc_build--master-aarch64 warning Skipped upon request
linaro-tcwg-bot/tcwg_simplebootstrap_build--master-aarch64-bootstrap warning Skipped upon request

Commit Message

Jin Ma Feb. 18, 2025, 11:12 a.m. UTC
  We overlooked the side effects of the rounding mode in the pattern,
which can impact the result of float_extend and lead to incorrect
optimizations in the final program. This issue likely affects nearly
all similar patterns that involve rounding modes, and the tests in
this patch only highlight one example. It seems challenging to address,
and I only implemented a simple fix, which is not a good way to solve
the problem.

Any comments on this?

gcc/ChangeLog:

	* config/riscv/vector-iterators.md (UNSPEC_VRM): New.
	* config/riscv/vector.md: Use UNSPEC for float_extend.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/base/bug-11.c: New test.

Reported-by: CunJian Huang <huangcunjian.huang@alibaba-inc.com>
Signed-off-by: Jin Ma <jinma@linux.alibaba.com>
---
 gcc/config/riscv/vector-iterators.md          |  3 +++
 gcc/config/riscv/vector.md                    |  6 +++--
 .../gcc.target/riscv/rvv/base/bug-11.c        | 24 +++++++++++++++++++
 3 files changed, 31 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/bug-11.c
  

Comments

Kito Cheng Feb. 18, 2025, 1:44 p.m. UTC | #1
We already have a use of "(reg:SI FRM_REGNUM)" within the pattern, is
it not enough?
I believe the answer is not enough so you propose this patch, so could
you explain a few more about what happened?

(define_insn "@pred_single_widen_<plus_minus:optab><mode>_scalar"
 [(set (match_operand:VWEXTF 0 "register_operand"                "=vd,
vd, vr, vr")
       (if_then_else:VWEXTF
         (unspec:<VM>
           [(match_operand:<VM> 1 "vector_mask_operand"          " vm,
vm,Wc1,Wc1")
            (match_operand 5 "vector_length_operand"
"rvl,rvl,rvl,rvl")
            (match_operand 6 "const_int_operand"                 "  i,
 i,  i,  i")
            (match_operand 7 "const_int_operand"                 "  i,
 i,  i,  i")
            (match_operand 8 "const_int_operand"                 "  i,
 i,  i,  i")
            (match_operand 9 "const_int_operand"                 "  i,
 i,  i,  i")
            (reg:SI VL_REGNUM)
            (reg:SI VTYPE_REGNUM)
            (reg:SI FRM_REGNUM)] UNSPEC_VPREDICATE)      <-------------here
         (plus_minus:VWEXTF
           (match_operand:VWEXTF 3 "register_operand"            " vr,
vr, vr, vr")
           (float_extend:VWEXTF
             (vec_duplicate:<V_DOUBLE_TRUNC>
               (match_operand:<VSUBEL> 4 "register_operand"      "  f,
 f,  f,  f"))))
         (match_operand:VWEXTF 2 "vector_merge_operand"          " vu,
 0, vu,  0")))]

On Tue, Feb 18, 2025 at 7:14 PM Jin Ma <jinma@linux.alibaba.com> wrote:
>
> We overlooked the side effects of the rounding mode in the pattern,
> which can impact the result of float_extend and lead to incorrect
> optimizations in the final program. This issue likely affects nearly
> all similar patterns that involve rounding modes, and the tests in
> this patch only highlight one example. It seems challenging to address,
> and I only implemented a simple fix, which is not a good way to solve
> the problem.
>
> Any comments on this?
>
> gcc/ChangeLog:
>
>         * config/riscv/vector-iterators.md (UNSPEC_VRM): New.
>         * config/riscv/vector.md: Use UNSPEC for float_extend.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/riscv/rvv/base/bug-11.c: New test.
>
> Reported-by: CunJian Huang <huangcunjian.huang@alibaba-inc.com>
> Signed-off-by: Jin Ma <jinma@linux.alibaba.com>
> ---
>  gcc/config/riscv/vector-iterators.md          |  3 +++
>  gcc/config/riscv/vector.md                    |  6 +++--
>  .../gcc.target/riscv/rvv/base/bug-11.c        | 24 +++++++++++++++++++
>  3 files changed, 31 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/bug-11.c
>
> diff --git a/gcc/config/riscv/vector-iterators.md b/gcc/config/riscv/vector-iterators.md
> index c1bd7397441..bd592f736e2 100644
> --- a/gcc/config/riscv/vector-iterators.md
> +++ b/gcc/config/riscv/vector-iterators.md
> @@ -120,6 +120,9 @@ (define_c_enum "unspec" [
>
>    UNSPEC_SF_VFNRCLIP
>    UNSPEC_SF_VFNRCLIPU
> +
> +  ;; Side effects of rounding mode
> +  UNSPEC_VRM
>  ])
>
>  (define_c_enum "unspecv" [
> diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
> index 8ee43cf0ce1..e971dcdc973 100644
> --- a/gcc/config/riscv/vector.md
> +++ b/gcc/config/riscv/vector.md
> @@ -7135,8 +7135,10 @@ (define_insn "@pred_single_widen_<plus_minus:optab><mode>_scalar"
>           (plus_minus:VWEXTF
>             (match_operand:VWEXTF 3 "register_operand"            " vr, vr, vr, vr")
>             (float_extend:VWEXTF
> -             (vec_duplicate:<V_DOUBLE_TRUNC>
> -               (match_operand:<VSUBEL> 4 "register_operand"      "  f,  f,  f,  f"))))
> +             (unspec:VWEXTF
> +               [(vec_duplicate:<V_DOUBLE_TRUNC>
> +                 (match_operand:<VSUBEL> 4 "register_operand"      "  f,  f,  f,  f"))
> +                 (reg:SI FRM_REGNUM)] UNSPEC_VRM)))
>           (match_operand:VWEXTF 2 "vector_merge_operand"          " vu,  0, vu,  0")))]
>    "TARGET_VECTOR"
>    "vfw<insn>.wf\t%0,%3,%4%p1"
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/bug-11.c b/gcc/testsuite/gcc.target/riscv/rvv/base/bug-11.c
> new file mode 100644
> index 00000000000..52d940cb57a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/bug-11.c
> @@ -0,0 +1,24 @@
> +/* { dg-do run { target { riscv_v } } } */
> +/* { dg-options "-march=rv64gcv_zvfh -mabi=lp64d -O2" } */
> +
> +#include <riscv_vector.h>
> +
> +int main ()
> +{
> +  float data_store = 0;
> +  int8_t mask = 1;
> +  size_t vl = 1;
> +  float data_load = 0.0;
> +  _Float16 data_sub = 0.0;
> +  vint8mf8_t mask_value = __riscv_vle8_v_i8mf8 (&mask, vl);
> +  vbool64_t vmask = __riscv_vmseq_vx_i8mf8_b64 (mask_value, 1, vl);
> +  vfloat32mf2_t vd_load = __riscv_vfmv_v_f_f32mf2 (0, __riscv_vsetvlmax_e32mf2 ());
> +  vfloat32mf2_t vreg_memory = __riscv_vle32_v_f32mf2_tu (vd_load, &data_load, vl);
> +  vfloat32mf2_t vreg = __riscv_vfwsub_wf_f32mf2_rm_tum (vmask, vreg_memory, vreg_memory, data_sub, __RISCV_FRM_RDN, vl);
> +  __riscv_vse32_v_f32mf2 (&data_store, vreg, vl);
> +
> +  __builtin_printf ("%f\n", data_store);
> +  return 0;
> +}
> +
> +/* { dg-output "-0.000000\\s+\n" } */
> --
> 2.25.1
>
  
Jeff Law Feb. 18, 2025, 8:48 p.m. UTC | #2
On 2/18/25 4:12 AM, Jin Ma wrote:
> We overlooked the side effects of the rounding mode in the pattern,
> which can impact the result of float_extend and lead to incorrect
> optimizations in the final program. This issue likely affects nearly
> all similar patterns that involve rounding modes, and the tests in
> this patch only highlight one example. It seems challenging to address,
> and I only implemented a simple fix, which is not a good way to solve
> the problem.
> 
> Any comments on this?
> 
> gcc/ChangeLog:
> 
> 	* config/riscv/vector-iterators.md (UNSPEC_VRM): New.
> 	* config/riscv/vector.md: Use UNSPEC for float_extend.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.target/riscv/rvv/base/bug-11.c: New test.
So as Kito note, the insn you changed already has a reference to the FRM 
it needs -- kept in operands[9].  It seems like your patch, while fixing 
the bug, more likely does so by accident rather than by design.

What I see when I look at the dump files is a deeper issue.


In the .expand dump we have:

> (insn 17 16 18 2 (set (reg:HF 147)
>         (const_double:HF 0.0 [0x0.0p+0])) "j.c":14:24 -1
>      (nil))
> (insn 18 17 19 2 (set (reg/v:RVVMF2SF 141 [ vreg ])
>         (if_then_else:RVVMF2SF (unspec:RVVMF64BI [
>                     (reg/v:RVVMF64BI 138 [ vmask ])
>                     (const_int 1 [0x1])
>                     (const_int 0 [0])
>                     (const_int 2 [0x2])
>                     (const_int 0 [0])
>                     (const_int 2 [0x2])
>                     (reg:SI 66 vl)
>                     (reg:SI 67 vtype)
>                     (reg:SI 69 frm)
>                 ] UNSPEC_VPREDICATE)
>             (minus:RVVMF2SF (reg/v:RVVMF2SF 140 [ vreg_memory ])
>                 (float_extend:RVVMF2SF (vec_duplicate:RVVMF4HF (reg:HF 147))))
>             (reg/v:RVVMF2SF 140 [ vreg_memory ]))) "j.c":14:24 -1
>      (nil))       



Insn 18 does the subtraction with the adjusted rounding mode.  So far, 
so good.  Things look fine at the start of cse1.  But if we look at the 
end of cse1 we have:

> (insn 17 16 18 2 (set (reg:HF 147)
>         (const_double:HF 0.0 [0x0.0p+0])) "j.c":14:24 136 {*movhf_hardfloat}
>      (nil))
> (insn 18 17 19 2 (set (reg/v:RVVMF2SF 141 [ vreg ])
>         (reg/v:RVVMF2SF 140 [ vreg_memory ])) "j.c":14:24 2786 {*movrvvmf2sf_fract}
>      (expr_list:REG_DEAD (reg:HF 147)
>         (expr_list:REG_DEAD (reg/v:RVVMF2SF 140 [ vreg_memory ])
>             (expr_list:REG_DEAD (reg/v:RVVMF64BI 138 [ vmask ])
>                 (expr_list:REG_DEAD (reg:SI 69 frm)
>                     (nil))))))


Note how CSE replace the arithmetic with a simple copy.  At this point 
things are broken.

I don't see how CSE can make the right decision here; we don't expose 
rounding modes this early and thus CSE has no way to know it can't make 
that kind of replacement.

You patch kindof works, but it seems to me it's more accident than 
design and that we need to fix this in a more general manner.

The natural question is what do other targets do when the rounding mode 
gets changed.  I'm guessing its exposed as a unspec set before the RTL 
optimizers run.

jeff
  
Jin Ma Feb. 19, 2025, 2:30 a.m. UTC | #3
On Tue, 18 Feb 2025 13:48:02 -0700, Jeff Law wrote:
> 
> 
> On 2/18/25 4:12 AM, Jin Ma wrote:
> > We overlooked the side effects of the rounding mode in the pattern,
> > which can impact the result of float_extend and lead to incorrect
> > optimizations in the final program. This issue likely affects nearly
> > all similar patterns that involve rounding modes, and the tests in
> > this patch only highlight one example. It seems challenging to address,
> > and I only implemented a simple fix, which is not a good way to solve
> > the problem.
> > 
> > Any comments on this?
> > 
> > gcc/ChangeLog:
> > 
> > 	* config/riscv/vector-iterators.md (UNSPEC_VRM): New.
> > 	* config/riscv/vector.md: Use UNSPEC for float_extend.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > 	* gcc.target/riscv/rvv/base/bug-11.c: New test.
> So as Kito note, the insn you changed already has a reference to the FRM 
> it needs -- kept in operands[9].  It seems like your patch, while fixing 
> the bug, more likely does so by accident rather than by design.
> 
> What I see when I look at the dump files is a deeper issue.
> 
> 
> In the .expand dump we have:
> 
> > (insn 17 16 18 2 (set (reg:HF 147)
> >         (const_double:HF 0.0 [0x0.0p+0])) "j.c":14:24 -1
> >      (nil))
> > (insn 18 17 19 2 (set (reg/v:RVVMF2SF 141 [ vreg ])
> >         (if_then_else:RVVMF2SF (unspec:RVVMF64BI [
> >                     (reg/v:RVVMF64BI 138 [ vmask ])
> >                     (const_int 1 [0x1])
> >                     (const_int 0 [0])
> >                     (const_int 2 [0x2])
> >                     (const_int 0 [0])
> >                     (const_int 2 [0x2])
> >                     (reg:SI 66 vl)
> >                     (reg:SI 67 vtype)
> >                     (reg:SI 69 frm)
> >                 ] UNSPEC_VPREDICATE)
> >             (minus:RVVMF2SF (reg/v:RVVMF2SF 140 [ vreg_memory ])
> >                 (float_extend:RVVMF2SF (vec_duplicate:RVVMF4HF (reg:HF 147))))
> >             (reg/v:RVVMF2SF 140 [ vreg_memory ]))) "j.c":14:24 -1
> >      (nil))       
> 
> 
> 
> Insn 18 does the subtraction with the adjusted rounding mode.  So far, 
> so good.  Things look fine at the start of cse1.  But if we look at the 
> end of cse1 we have:
> 
> > (insn 17 16 18 2 (set (reg:HF 147)
> >         (const_double:HF 0.0 [0x0.0p+0])) "j.c":14:24 136 {*movhf_hardfloat}
> >      (nil))
> > (insn 18 17 19 2 (set (reg/v:RVVMF2SF 141 [ vreg ])
> >         (reg/v:RVVMF2SF 140 [ vreg_memory ])) "j.c":14:24 2786 {*movrvvmf2sf_fract}
> >      (expr_list:REG_DEAD (reg:HF 147)
> >         (expr_list:REG_DEAD (reg/v:RVVMF2SF 140 [ vreg_memory ])
> >             (expr_list:REG_DEAD (reg/v:RVVMF64BI 138 [ vmask ])
> >                 (expr_list:REG_DEAD (reg:SI 69 frm)
> >                     (nil))))))
> 
> 
> Note how CSE replace the arithmetic with a simple copy.  At this point 
> things are broken.
> 
> I don't see how CSE can make the right decision here; we don't expose 
> rounding modes this early and thus CSE has no way to know it can't make 
> that kind of replacement.
> 
> You patch kindof works, but it seems to me it's more accident than 
> design and that we need to fix this in a more general manner.
> 
> The natural question is what do other targets do when the rounding mode 
> gets changed.  I'm guessing its exposed as a unspec set before the RTL 
> optimizers run.

I apologize for not explaining things more clearly. I also discovered that
the issue is caused by CSE. I think that during the substitution process,
CSE recognized the syntax of if_then_else and concluded that the expressions
in the "then" and "else" branches are equivalent, resulting in both yielding
(reg/v:RVVMF2SF 140 [ vreg_memory ]):

(minus:RVVMF2SF (reg/v:RVVMF2SF 140 [ vreg_memory ]) 
    (float_extend:RVVMF2SF (vec_duplicate:RVVMF4HF (const_double:HF 0.0 [0x0.0p+0]))))

is considered equivalent to:

(reg/v:RVVMF2SF 140 [ vreg_memory ])

Clearly, there wasn’t a deeper consideration of the fact that float_extend requires
a rounding mode(frm). Therefore, I attempted to use UNSPEC in the pattern to inform
CSE that we have a rounding mode.

As I mentioned before, this may not be a good solution, as it risks missing other
optimization opportunities. As you pointed out, we need a more general approach
to fix it. Unfortunately, while I’m still trying to find a solution, I currently
don't have any other good ideas.

Best regards,
Jin Ma

> jeff
  
Jeff Law Feb. 19, 2025, 4:40 a.m. UTC | #4
On 2/18/25 7:30 PM, Jin Ma wrote:

> 
> I apologize for not explaining things more clearly. I also discovered that
> the issue is caused by CSE. I think that during the substitution process,
> CSE recognized the syntax of if_then_else and concluded that the expressions
> in the "then" and "else" branches are equivalent, resulting in both yielding
> (reg/v:RVVMF2SF 140 [ vreg_memory ]):
> 
> (minus:RVVMF2SF (reg/v:RVVMF2SF 140 [ vreg_memory ])
>      (float_extend:RVVMF2SF (vec_duplicate:RVVMF4HF (const_double:HF 0.0 [0x0.0p+0]))))
> 
> is considered equivalent to:
> 
> (reg/v:RVVMF2SF 140 [ vreg_memory ])
> 
> Clearly, there wasn’t a deeper consideration of the fact that float_extend requires
> a rounding mode(frm). Therefore, I attempted to use UNSPEC in the pattern to inform
> CSE that we have a rounding mode.
Right.  It worked, but there's a deeper issue here.

> 
> As I mentioned before, this may not be a good solution, as it risks missing other
> optimization opportunities. As you pointed out, we need a more general approach
> to fix it. Unfortunately, while I’m still trying to find a solution, I currently
> don't have any other good ideas.
Changing the rounding modes isn't common, but it's not unheard of.  My 
suspicion is that we need to expose the rounding mode assignment earlier 
(at RTL generation time).

That may not work well with the current optimization of FRM, but I think 
early exposure is the only viable path forward in my mind.  Depending on 
the depth of the problems it may not be something we can fix in the 
gcc-15 space.

You might experiment with emitting the FRM assignment in the 
insn_expander class in the risc-v backend.  This code:
>     /* Add rounding mode operand.  */
>     if (m_insn_flags & FRM_DYN_P)
>       add_rounding_mode_operand (FRM_DYN);
>     else if (m_insn_flags & FRM_RUP_P)
>       add_rounding_mode_operand (FRM_RUP);
>     else if (m_insn_flags & FRM_RDN_P)
>       add_rounding_mode_operand (FRM_RDN);
>     else if (m_insn_flags & FRM_RMM_P)
>       add_rounding_mode_operand (FRM_RMM);
>     else if (m_insn_flags & FRM_RNE_P)
>       add_rounding_mode_operand (FRM_RNE);
>     else if (m_insn_flags & VXRM_RNU_P)
>       add_rounding_mode_operand (VXRM_RNU);
>     else if (m_insn_flags & VXRM_RDN_P)
>       add_rounding_mode_operand (VXRM_RDN);

For anything other than FRM_DYN_P emit the appropriate insn to set FRM. 
This may generate poor code in the presence of explicit rounding modes, 
but I think something along these lines is ultimately going to be needed.

jeff
  
Robin Dapp Feb. 19, 2025, 8 a.m. UTC | #5
>> As I mentioned before, this may not be a good solution, as it risks missing other
>> optimization opportunities. As you pointed out, we need a more general approach
>> to fix it. Unfortunately, while I’m still trying to find a solution, I currently
>> don't have any other good ideas.
> Changing the rounding modes isn't common, but it's not unheard of.  My 
> suspicion is that we need to expose the rounding mode assignment earlier 
> (at RTL generation time).
>
> That may not work well with the current optimization of FRM, but I think 
> early exposure is the only viable path forward in my mind.  Depending on 
> the depth of the problems it may not be something we can fix in the 
> gcc-15 space.

With -frounding-math CSE doesn't do the replacement.  So we could argue that
a user should specify -frounding-math if they explicitly care about the
behavior.  But on the other hand it's surprising if the user deliberately used
a rounding-mode setting instruction which doesn't work as intended.

Even if we wrapped those instructions in unspecs, couldn't other parts of the
program, that are compiled with the default -fno-roundin-math still lead to
unexpected results?

I  don't see any other way than to "hide" the behavior from optimizers either
in order to prevent folding of such patterns.
  
Jeff Law Feb. 19, 2025, 1:53 p.m. UTC | #6
On 2/19/25 1:00 AM, Robin Dapp wrote:
>>> As I mentioned before, this may not be a good solution, as it risks missing other
>>> optimization opportunities. As you pointed out, we need a more general approach
>>> to fix it. Unfortunately, while I’m still trying to find a solution, I currently
>>> don't have any other good ideas.
>> Changing the rounding modes isn't common, but it's not unheard of.  My
>> suspicion is that we need to expose the rounding mode assignment earlier
>> (at RTL generation time).
>>
>> That may not work well with the current optimization of FRM, but I think
>> early exposure is the only viable path forward in my mind.  Depending on
>> the depth of the problems it may not be something we can fix in the
>> gcc-15 space.
> 
> With -frounding-math CSE doesn't do the replacement.  So we could argue that
> a user should specify -frounding-math if they explicitly care about the
> behavior.  But on the other hand it's surprising if the user deliberately used
> a rounding-mode setting instruction which doesn't work as intended.
> 
> Even if we wrapped those instructions in unspecs, couldn't other parts of the
> program, that are compiled with the default -fno-roundin-math still lead to
> unexpected results?
> 
> I  don't see any other way than to "hide" the behavior from optimizers either
> in order to prevent folding of such patterns.
I didn't even know  the option existed!  Clearly necessary if we're 
using these builtins with non-default rounding modes.

One thought would be to issue a warning when using one of these builtins 
with a non-default mode and -frounding-math disabled.

Another would be to implicitly turn the option on.  I don't particularly 
like this idea, but throwing it out there as a possibility.

jeff
  
Jin Ma Feb. 20, 2025, 2:31 a.m. UTC | #7
On Wed, 19 Feb 2025 21:53:32 +0800, Jeff Law wrote:
> 
> 
> On 2/19/25 1:00 AM, Robin Dapp wrote:
> >>> As I mentioned before, this may not be a good solution, as it risks missing other
> >>> optimization opportunities. As you pointed out, we need a more general approach
> >>> to fix it. Unfortunately, while I’m still trying to find a solution, I currently
> >>> don't have any other good ideas.
> >> Changing the rounding modes isn't common, but it's not unheard of.  My
> >> suspicion is that we need to expose the rounding mode assignment earlier
> >> (at RTL generation time).
> >>
> >> That may not work well with the current optimization of FRM, but I think
> >> early exposure is the only viable path forward in my mind.  Depending on
> >> the depth of the problems it may not be something we can fix in the
> >> gcc-15 space.
> > 
> > With -frounding-math CSE doesn't do the replacement.  So we could argue that
> > a user should specify -frounding-math if they explicitly care about the
> > behavior.  But on the other hand it's surprising if the user deliberately used
> > a rounding-mode setting instruction which doesn't work as intended.
> > 
> > Even if we wrapped those instructions in unspecs, couldn't other parts of the
> > program, that are compiled with the default -fno-roundin-math still lead to
> > unexpected results?
> > 
> > I  don't see any other way than to "hide" the behavior from optimizers either
> > in order to prevent folding of such patterns.
> I didn't even know  the option existed!  Clearly necessary if we're 
> using these builtins with non-default rounding modes.

I wasn't aware of the existence of this option either. These built-ins require it.
I suspect that it makes certain assumptions about the rounding modes in floating-point
calculations, such as in float_extend, which may prevent CSE optimizations. Could
this also lead to lost optimization opportunities in other areas that don't require
this option? I'm not sure.

I suspect that the best approach would be to define relevant
attributes (perhaps similar to -frounding-math) within specific related patterns/built-ins 
to inform optimizers we are using a rounding mode and to avoid over-optimization.

Best regards,
Jin Ma

> One thought would be to issue a warning when using one of these builtins 
> with a non-default mode and -frounding-math disabled.
> 
> Another would be to implicitly turn the option on.  I don't particularly 
> like this idea, but throwing it out there as a possibility.
> 
> jeff
> 
>
  
Jin Ma Feb. 20, 2025, 2:52 a.m. UTC | #8
On Tue, 18 Feb 2025 21:40:06 -0700, Jeff Law wrote:
> 
> 
> On 2/18/25 7:30 PM, Jin Ma wrote:
> 
> > 
> > I apologize for not explaining things more clearly. I also discovered that
> > the issue is caused by CSE. I think that during the substitution process,
> > CSE recognized the syntax of if_then_else and concluded that the expressions
> > in the "then" and "else" branches are equivalent, resulting in both yielding
> > (reg/v:RVVMF2SF 140 [ vreg_memory ]):
> > 
> > (minus:RVVMF2SF (reg/v:RVVMF2SF 140 [ vreg_memory ])
> >      (float_extend:RVVMF2SF (vec_duplicate:RVVMF4HF (const_double:HF 0.0 [0x0.0p+0]))))
> > 
> > is considered equivalent to:
> > 
> > (reg/v:RVVMF2SF 140 [ vreg_memory ])
> > 
> > Clearly, there wasn鈥檛 a deeper consideration of the fact that float_extend requires
> > a rounding mode(frm). Therefore, I attempted to use UNSPEC in the pattern to inform
> > CSE that we have a rounding mode.
> Right.  It worked, but there's a deeper issue here.
> 
> > 
> > As I mentioned before, this may not be a good solution, as it risks missing other
> > optimization opportunities. As you pointed out, we need a more general approach
> > to fix it. Unfortunately, while I鈥檓 still trying to find a solution, I currently
> > don't have any other good ideas.
> Changing the rounding modes isn't common, but it's not unheard of.  My 
> suspicion is that we need to expose the rounding mode assignment earlier 
> (at RTL generation time).
> 
> That may not work well with the current optimization of FRM, but I think 
> early exposure is the only viable path forward in my mind.  Depending on 
> the depth of the problems it may not be something we can fix in the 
> gcc-15 space.
> 
> You might experiment with emitting the FRM assignment in the 
> insn_expander class in the risc-v backend.  This code:
> >     /* Add rounding mode operand.  */
> >     if (m_insn_flags & FRM_DYN_P)
> >       add_rounding_mode_operand (FRM_DYN);
> >     else if (m_insn_flags & FRM_RUP_P)
> >       add_rounding_mode_operand (FRM_RUP);
> >     else if (m_insn_flags & FRM_RDN_P)
> >       add_rounding_mode_operand (FRM_RDN);
> >     else if (m_insn_flags & FRM_RMM_P)
> >       add_rounding_mode_operand (FRM_RMM);
> >     else if (m_insn_flags & FRM_RNE_P)
> >       add_rounding_mode_operand (FRM_RNE);
> >     else if (m_insn_flags & VXRM_RNU_P)
> >       add_rounding_mode_operand (VXRM_RNU);
> >     else if (m_insn_flags & VXRM_RDN_P)
> >       add_rounding_mode_operand (VXRM_RDN);
> 
> For anything other than FRM_DYN_P emit the appropriate insn to set FRM. 
> This may generate poor code in the presence of explicit rounding modes, 
> but I think something along these lines is ultimately going to be needed.

Are you suggesting that we should emit the rounding mode insn earlier or
incorporate the rounding mode into the pattern (in fact, there are already
operands[9]/reg FRM_REGNUM)? However, this doesn't seem to be effective
because the side effects of the rounding mode do not take effect in
float_extend, and CSE will always optimize away pred_single_widen_subrvvmf2sf_scalar,
just like before :)

Best regards,
Jin Ma

> jeff
  
Xi Ruoyao Feb. 20, 2025, 6:20 a.m. UTC | #9
On Thu, 2025-02-20 at 10:31 +0800, Jin Ma wrote:
> On Wed, 19 Feb 2025 21:53:32 +0800, Jeff Law wrote:
> > 
> > 
> > On 2/19/25 1:00 AM, Robin Dapp wrote:
> > > > > As I mentioned before, this may not be a good solution, as it risks missing other
> > > > > optimization opportunities. As you pointed out, we need a more general approach
> > > > > to fix it. Unfortunately, while I’m still trying to find a solution, I currently
> > > > > don't have any other good ideas.
> > > > Changing the rounding modes isn't common, but it's not unheard of.  My
> > > > suspicion is that we need to expose the rounding mode assignment earlier
> > > > (at RTL generation time).
> > > > 
> > > > That may not work well with the current optimization of FRM, but I think
> > > > early exposure is the only viable path forward in my mind.  Depending on
> > > > the depth of the problems it may not be something we can fix in the
> > > > gcc-15 space.
> > > 
> > > With -frounding-math CSE doesn't do the replacement.  So we could argue that
> > > a user should specify -frounding-math if they explicitly care about the
> > > behavior.  But on the other hand it's surprising if the user deliberately used
> > > a rounding-mode setting instruction which doesn't work as intended.
> > > 
> > > Even if we wrapped those instructions in unspecs, couldn't other parts of the
> > > program, that are compiled with the default -fno-roundin-math still lead to
> > > unexpected results?
> > > 
> > > I  don't see any other way than to "hide" the behavior from optimizers either
> > > in order to prevent folding of such patterns.
> > I didn't even know  the option existed!  Clearly necessary if we're 
> > using these builtins with non-default rounding modes.
> 
> I wasn't aware of the existence of this option either. These built-ins require it.
> I suspect that it makes certain assumptions about the rounding modes in floating-point
> calculations, such as in float_extend, which may prevent CSE optimizations. Could
> this also lead to lost optimization opportunities in other areas that don't require
> this option? I'm not sure.
> 
> I suspect that the best approach would be to define relevant
> attributes (perhaps similar to -frounding-math) within specific related patterns/built-ins 
> to inform optimizers we are using a rounding mode and to avoid over-optimization.

The "special pattern" is supposed to be #pragma STDC FENV_ACCESS that
we've not implemented.  See https://gcc.gnu.org/PR34678.
  
Jeff Law March 1, 2025, 4:40 p.m. UTC | #10
On 2/19/25 11:20 PM, Xi Ruoyao wrote:
> On Thu, 2025-02-20 at 10:31 +0800, Jin Ma wrote:
>> On Wed, 19 Feb 2025 21:53:32 +0800, Jeff Law wrote:
>>>
>>>
>>> On 2/19/25 1:00 AM, Robin Dapp wrote:
>>>>>> As I mentioned before, this may not be a good solution, as it risks missing other
>>>>>> optimization opportunities. As you pointed out, we need a more general approach
>>>>>> to fix it. Unfortunately, while I’m still trying to find a solution, I currently
>>>>>> don't have any other good ideas.
>>>>> Changing the rounding modes isn't common, but it's not unheard of.  My
>>>>> suspicion is that we need to expose the rounding mode assignment earlier
>>>>> (at RTL generation time).
>>>>>
>>>>> That may not work well with the current optimization of FRM, but I think
>>>>> early exposure is the only viable path forward in my mind.  Depending on
>>>>> the depth of the problems it may not be something we can fix in the
>>>>> gcc-15 space.
>>>>
>>>> With -frounding-math CSE doesn't do the replacement.  So we could argue that
>>>> a user should specify -frounding-math if they explicitly care about the
>>>> behavior.  But on the other hand it's surprising if the user deliberately used
>>>> a rounding-mode setting instruction which doesn't work as intended.
>>>>
>>>> Even if we wrapped those instructions in unspecs, couldn't other parts of the
>>>> program, that are compiled with the default -fno-roundin-math still lead to
>>>> unexpected results?
>>>>
>>>> I  don't see any other way than to "hide" the behavior from optimizers either
>>>> in order to prevent folding of such patterns.
>>> I didn't even know  the option existed!  Clearly necessary if we're
>>> using these builtins with non-default rounding modes.
>>
>> I wasn't aware of the existence of this option either. These built-ins require it.
>> I suspect that it makes certain assumptions about the rounding modes in floating-point
>> calculations, such as in float_extend, which may prevent CSE optimizations. Could
>> this also lead to lost optimization opportunities in other areas that don't require
>> this option? I'm not sure.
>>
>> I suspect that the best approach would be to define relevant
>> attributes (perhaps similar to -frounding-math) within specific related patterns/built-ins
>> to inform optimizers we are using a rounding mode and to avoid over-optimization.
> 
> The "special pattern" is supposed to be #pragma STDC FENV_ACCESS that
> we've not implemented.  See https://gcc.gnu.org/PR34678.
Not exactly.  That's a discussion of what it would take to make this all 
work relatively seamlessly, without the need for -frounding-math.

My comment about needing a special pattern was predicated on the 
(incorrect) belief that we already had some support for this baked into 
the optimizers for targets that allow adjusting the rounding mode via 
asms and builtins.  But I was wrong.

So the issue at hand is some of the RISC-V intrinsics depend on 
-frounding-math behavior and what to do about that.  We could:

1. Ignore it and tell users they need to explicitly set -frouding-math

2. Issue a diagnostic if a RISC-V intrinsic that needs rounding math is
used without -frounding-math being enabled.

3. If an intrinsic that needs rounding math is used, set -frounding-math


I tend to lean towards #3 given the state of the world right now.  But I 
haven't actually explored this at all to know if there are any 
unexpected gotchas.

Jeff
>
  
Jeff Law March 1, 2025, 4:41 p.m. UTC | #11
On 2/19/25 7:52 PM, Jin Ma wrote:

> 
> Are you suggesting that we should emit the rounding mode insn earlier or
> incorporate the rounding mode into the pattern (in fact, there are already
> operands[9]/reg FRM_REGNUM)? However, this doesn't seem to be effective
> because the side effects of the rounding mode do not take effect in
> float_extend, and CSE will always optimize away pred_single_widen_subrvvmf2sf_scalar,
> just like before :)
That was my original thought, but as others have pointed out, 
-frounding-math is probably a better path forward.

jeff
  

Patch

diff --git a/gcc/config/riscv/vector-iterators.md b/gcc/config/riscv/vector-iterators.md
index c1bd7397441..bd592f736e2 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -120,6 +120,9 @@  (define_c_enum "unspec" [
 
   UNSPEC_SF_VFNRCLIP
   UNSPEC_SF_VFNRCLIPU
+
+  ;; Side effects of rounding mode
+  UNSPEC_VRM
 ])
 
 (define_c_enum "unspecv" [
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 8ee43cf0ce1..e971dcdc973 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -7135,8 +7135,10 @@  (define_insn "@pred_single_widen_<plus_minus:optab><mode>_scalar"
 	  (plus_minus:VWEXTF
 	    (match_operand:VWEXTF 3 "register_operand"            " vr, vr, vr, vr")
 	    (float_extend:VWEXTF
-	      (vec_duplicate:<V_DOUBLE_TRUNC>
-		(match_operand:<VSUBEL> 4 "register_operand"      "  f,  f,  f,  f"))))
+	      (unspec:VWEXTF
+		[(vec_duplicate:<V_DOUBLE_TRUNC>
+		  (match_operand:<VSUBEL> 4 "register_operand"      "  f,  f,  f,  f"))
+		  (reg:SI FRM_REGNUM)] UNSPEC_VRM)))
 	  (match_operand:VWEXTF 2 "vector_merge_operand"          " vu,  0, vu,  0")))]
   "TARGET_VECTOR"
   "vfw<insn>.wf\t%0,%3,%4%p1"
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/bug-11.c b/gcc/testsuite/gcc.target/riscv/rvv/base/bug-11.c
new file mode 100644
index 00000000000..52d940cb57a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/bug-11.c
@@ -0,0 +1,24 @@ 
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-options "-march=rv64gcv_zvfh -mabi=lp64d -O2" } */
+
+#include <riscv_vector.h>
+
+int main ()
+{
+  float data_store = 0;
+  int8_t mask = 1;
+  size_t vl = 1;
+  float data_load = 0.0;
+  _Float16 data_sub = 0.0;
+  vint8mf8_t mask_value = __riscv_vle8_v_i8mf8 (&mask, vl);
+  vbool64_t vmask = __riscv_vmseq_vx_i8mf8_b64 (mask_value, 1, vl);
+  vfloat32mf2_t vd_load = __riscv_vfmv_v_f_f32mf2 (0, __riscv_vsetvlmax_e32mf2 ());
+  vfloat32mf2_t vreg_memory = __riscv_vle32_v_f32mf2_tu (vd_load, &data_load, vl);
+  vfloat32mf2_t vreg = __riscv_vfwsub_wf_f32mf2_rm_tum (vmask, vreg_memory, vreg_memory, data_sub, __RISCV_FRM_RDN, vl);
+  __riscv_vse32_v_f32mf2 (&data_store, vreg, vl);
+
+  __builtin_printf ("%f\n", data_store);
+  return 0;
+}
+
+/* { dg-output "-0.000000\\s+\n" } */