[RFC] RISC-V: The optimization ignored the side effects of the rounding mode, resulting in incorrect results.
Checks
Context |
Check |
Description |
rivoscibot/toolchain-ci-rivos-lint |
warning
|
Lint failed
|
rivoscibot/toolchain-ci-rivos-apply-patch |
success
|
Patch applied
|
rivoscibot/toolchain-ci-rivos-build--newlib-rv64gcv-lp64d-multilib |
success
|
Build passed
|
rivoscibot/toolchain-ci-rivos-build--linux-rv64gcv-lp64d-multilib |
success
|
Build passed
|
rivoscibot/toolchain-ci-rivos-build--linux-rv64gc_zba_zbb_zbc_zbs-lp64d-multilib |
success
|
Build passed
|
rivoscibot/toolchain-ci-rivos-test |
success
|
Testing passed
|
linaro-tcwg-bot/tcwg_simplebootstrap_build--master-arm-bootstrap |
warning
|
Skipped upon request
|
linaro-tcwg-bot/tcwg_gcc_build--master-arm |
warning
|
Skipped upon request
|
linaro-tcwg-bot/tcwg_gcc_build--master-aarch64 |
warning
|
Skipped upon request
|
linaro-tcwg-bot/tcwg_simplebootstrap_build--master-aarch64-bootstrap |
warning
|
Skipped upon request
|
Commit Message
We overlooked the side effects of the rounding mode in the pattern,
which can impact the result of float_extend and lead to incorrect
optimizations in the final program. This issue likely affects nearly
all similar patterns that involve rounding modes, and the tests in
this patch only highlight one example. It seems challenging to address,
and I only implemented a simple fix, which is not a good way to solve
the problem.
Any comments on this?
gcc/ChangeLog:
* config/riscv/vector-iterators.md (UNSPEC_VRM): New.
* config/riscv/vector.md: Use UNSPEC for float_extend.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/bug-11.c: New test.
Reported-by: CunJian Huang <huangcunjian.huang@alibaba-inc.com>
Signed-off-by: Jin Ma <jinma@linux.alibaba.com>
---
gcc/config/riscv/vector-iterators.md | 3 +++
gcc/config/riscv/vector.md | 6 +++--
.../gcc.target/riscv/rvv/base/bug-11.c | 24 +++++++++++++++++++
3 files changed, 31 insertions(+), 2 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/bug-11.c
Comments
We already have a use of "(reg:SI FRM_REGNUM)" within the pattern, is
it not enough?
I believe the answer is not enough so you propose this patch, so could
you explain a few more about what happened?
(define_insn "@pred_single_widen_<plus_minus:optab><mode>_scalar"
[(set (match_operand:VWEXTF 0 "register_operand" "=vd,
vd, vr, vr")
(if_then_else:VWEXTF
(unspec:<VM>
[(match_operand:<VM> 1 "vector_mask_operand" " vm,
vm,Wc1,Wc1")
(match_operand 5 "vector_length_operand"
"rvl,rvl,rvl,rvl")
(match_operand 6 "const_int_operand" " i,
i, i, i")
(match_operand 7 "const_int_operand" " i,
i, i, i")
(match_operand 8 "const_int_operand" " i,
i, i, i")
(match_operand 9 "const_int_operand" " i,
i, i, i")
(reg:SI VL_REGNUM)
(reg:SI VTYPE_REGNUM)
(reg:SI FRM_REGNUM)] UNSPEC_VPREDICATE) <-------------here
(plus_minus:VWEXTF
(match_operand:VWEXTF 3 "register_operand" " vr,
vr, vr, vr")
(float_extend:VWEXTF
(vec_duplicate:<V_DOUBLE_TRUNC>
(match_operand:<VSUBEL> 4 "register_operand" " f,
f, f, f"))))
(match_operand:VWEXTF 2 "vector_merge_operand" " vu,
0, vu, 0")))]
On Tue, Feb 18, 2025 at 7:14 PM Jin Ma <jinma@linux.alibaba.com> wrote:
>
> We overlooked the side effects of the rounding mode in the pattern,
> which can impact the result of float_extend and lead to incorrect
> optimizations in the final program. This issue likely affects nearly
> all similar patterns that involve rounding modes, and the tests in
> this patch only highlight one example. It seems challenging to address,
> and I only implemented a simple fix, which is not a good way to solve
> the problem.
>
> Any comments on this?
>
> gcc/ChangeLog:
>
> * config/riscv/vector-iterators.md (UNSPEC_VRM): New.
> * config/riscv/vector.md: Use UNSPEC for float_extend.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/bug-11.c: New test.
>
> Reported-by: CunJian Huang <huangcunjian.huang@alibaba-inc.com>
> Signed-off-by: Jin Ma <jinma@linux.alibaba.com>
> ---
> gcc/config/riscv/vector-iterators.md | 3 +++
> gcc/config/riscv/vector.md | 6 +++--
> .../gcc.target/riscv/rvv/base/bug-11.c | 24 +++++++++++++++++++
> 3 files changed, 31 insertions(+), 2 deletions(-)
> create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/bug-11.c
>
> diff --git a/gcc/config/riscv/vector-iterators.md b/gcc/config/riscv/vector-iterators.md
> index c1bd7397441..bd592f736e2 100644
> --- a/gcc/config/riscv/vector-iterators.md
> +++ b/gcc/config/riscv/vector-iterators.md
> @@ -120,6 +120,9 @@ (define_c_enum "unspec" [
>
> UNSPEC_SF_VFNRCLIP
> UNSPEC_SF_VFNRCLIPU
> +
> + ;; Side effects of rounding mode
> + UNSPEC_VRM
> ])
>
> (define_c_enum "unspecv" [
> diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
> index 8ee43cf0ce1..e971dcdc973 100644
> --- a/gcc/config/riscv/vector.md
> +++ b/gcc/config/riscv/vector.md
> @@ -7135,8 +7135,10 @@ (define_insn "@pred_single_widen_<plus_minus:optab><mode>_scalar"
> (plus_minus:VWEXTF
> (match_operand:VWEXTF 3 "register_operand" " vr, vr, vr, vr")
> (float_extend:VWEXTF
> - (vec_duplicate:<V_DOUBLE_TRUNC>
> - (match_operand:<VSUBEL> 4 "register_operand" " f, f, f, f"))))
> + (unspec:VWEXTF
> + [(vec_duplicate:<V_DOUBLE_TRUNC>
> + (match_operand:<VSUBEL> 4 "register_operand" " f, f, f, f"))
> + (reg:SI FRM_REGNUM)] UNSPEC_VRM)))
> (match_operand:VWEXTF 2 "vector_merge_operand" " vu, 0, vu, 0")))]
> "TARGET_VECTOR"
> "vfw<insn>.wf\t%0,%3,%4%p1"
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/bug-11.c b/gcc/testsuite/gcc.target/riscv/rvv/base/bug-11.c
> new file mode 100644
> index 00000000000..52d940cb57a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/bug-11.c
> @@ -0,0 +1,24 @@
> +/* { dg-do run { target { riscv_v } } } */
> +/* { dg-options "-march=rv64gcv_zvfh -mabi=lp64d -O2" } */
> +
> +#include <riscv_vector.h>
> +
> +int main ()
> +{
> + float data_store = 0;
> + int8_t mask = 1;
> + size_t vl = 1;
> + float data_load = 0.0;
> + _Float16 data_sub = 0.0;
> + vint8mf8_t mask_value = __riscv_vle8_v_i8mf8 (&mask, vl);
> + vbool64_t vmask = __riscv_vmseq_vx_i8mf8_b64 (mask_value, 1, vl);
> + vfloat32mf2_t vd_load = __riscv_vfmv_v_f_f32mf2 (0, __riscv_vsetvlmax_e32mf2 ());
> + vfloat32mf2_t vreg_memory = __riscv_vle32_v_f32mf2_tu (vd_load, &data_load, vl);
> + vfloat32mf2_t vreg = __riscv_vfwsub_wf_f32mf2_rm_tum (vmask, vreg_memory, vreg_memory, data_sub, __RISCV_FRM_RDN, vl);
> + __riscv_vse32_v_f32mf2 (&data_store, vreg, vl);
> +
> + __builtin_printf ("%f\n", data_store);
> + return 0;
> +}
> +
> +/* { dg-output "-0.000000\\s+\n" } */
> --
> 2.25.1
>
On 2/18/25 4:12 AM, Jin Ma wrote:
> We overlooked the side effects of the rounding mode in the pattern,
> which can impact the result of float_extend and lead to incorrect
> optimizations in the final program. This issue likely affects nearly
> all similar patterns that involve rounding modes, and the tests in
> this patch only highlight one example. It seems challenging to address,
> and I only implemented a simple fix, which is not a good way to solve
> the problem.
>
> Any comments on this?
>
> gcc/ChangeLog:
>
> * config/riscv/vector-iterators.md (UNSPEC_VRM): New.
> * config/riscv/vector.md: Use UNSPEC for float_extend.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/bug-11.c: New test.
So as Kito note, the insn you changed already has a reference to the FRM
it needs -- kept in operands[9]. It seems like your patch, while fixing
the bug, more likely does so by accident rather than by design.
What I see when I look at the dump files is a deeper issue.
In the .expand dump we have:
> (insn 17 16 18 2 (set (reg:HF 147)
> (const_double:HF 0.0 [0x0.0p+0])) "j.c":14:24 -1
> (nil))
> (insn 18 17 19 2 (set (reg/v:RVVMF2SF 141 [ vreg ])
> (if_then_else:RVVMF2SF (unspec:RVVMF64BI [
> (reg/v:RVVMF64BI 138 [ vmask ])
> (const_int 1 [0x1])
> (const_int 0 [0])
> (const_int 2 [0x2])
> (const_int 0 [0])
> (const_int 2 [0x2])
> (reg:SI 66 vl)
> (reg:SI 67 vtype)
> (reg:SI 69 frm)
> ] UNSPEC_VPREDICATE)
> (minus:RVVMF2SF (reg/v:RVVMF2SF 140 [ vreg_memory ])
> (float_extend:RVVMF2SF (vec_duplicate:RVVMF4HF (reg:HF 147))))
> (reg/v:RVVMF2SF 140 [ vreg_memory ]))) "j.c":14:24 -1
> (nil))
Insn 18 does the subtraction with the adjusted rounding mode. So far,
so good. Things look fine at the start of cse1. But if we look at the
end of cse1 we have:
> (insn 17 16 18 2 (set (reg:HF 147)
> (const_double:HF 0.0 [0x0.0p+0])) "j.c":14:24 136 {*movhf_hardfloat}
> (nil))
> (insn 18 17 19 2 (set (reg/v:RVVMF2SF 141 [ vreg ])
> (reg/v:RVVMF2SF 140 [ vreg_memory ])) "j.c":14:24 2786 {*movrvvmf2sf_fract}
> (expr_list:REG_DEAD (reg:HF 147)
> (expr_list:REG_DEAD (reg/v:RVVMF2SF 140 [ vreg_memory ])
> (expr_list:REG_DEAD (reg/v:RVVMF64BI 138 [ vmask ])
> (expr_list:REG_DEAD (reg:SI 69 frm)
> (nil))))))
Note how CSE replace the arithmetic with a simple copy. At this point
things are broken.
I don't see how CSE can make the right decision here; we don't expose
rounding modes this early and thus CSE has no way to know it can't make
that kind of replacement.
You patch kindof works, but it seems to me it's more accident than
design and that we need to fix this in a more general manner.
The natural question is what do other targets do when the rounding mode
gets changed. I'm guessing its exposed as a unspec set before the RTL
optimizers run.
jeff
On Tue, 18 Feb 2025 13:48:02 -0700, Jeff Law wrote:
>
>
> On 2/18/25 4:12 AM, Jin Ma wrote:
> > We overlooked the side effects of the rounding mode in the pattern,
> > which can impact the result of float_extend and lead to incorrect
> > optimizations in the final program. This issue likely affects nearly
> > all similar patterns that involve rounding modes, and the tests in
> > this patch only highlight one example. It seems challenging to address,
> > and I only implemented a simple fix, which is not a good way to solve
> > the problem.
> >
> > Any comments on this?
> >
> > gcc/ChangeLog:
> >
> > * config/riscv/vector-iterators.md (UNSPEC_VRM): New.
> > * config/riscv/vector.md: Use UNSPEC for float_extend.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/riscv/rvv/base/bug-11.c: New test.
> So as Kito note, the insn you changed already has a reference to the FRM
> it needs -- kept in operands[9]. It seems like your patch, while fixing
> the bug, more likely does so by accident rather than by design.
>
> What I see when I look at the dump files is a deeper issue.
>
>
> In the .expand dump we have:
>
> > (insn 17 16 18 2 (set (reg:HF 147)
> > (const_double:HF 0.0 [0x0.0p+0])) "j.c":14:24 -1
> > (nil))
> > (insn 18 17 19 2 (set (reg/v:RVVMF2SF 141 [ vreg ])
> > (if_then_else:RVVMF2SF (unspec:RVVMF64BI [
> > (reg/v:RVVMF64BI 138 [ vmask ])
> > (const_int 1 [0x1])
> > (const_int 0 [0])
> > (const_int 2 [0x2])
> > (const_int 0 [0])
> > (const_int 2 [0x2])
> > (reg:SI 66 vl)
> > (reg:SI 67 vtype)
> > (reg:SI 69 frm)
> > ] UNSPEC_VPREDICATE)
> > (minus:RVVMF2SF (reg/v:RVVMF2SF 140 [ vreg_memory ])
> > (float_extend:RVVMF2SF (vec_duplicate:RVVMF4HF (reg:HF 147))))
> > (reg/v:RVVMF2SF 140 [ vreg_memory ]))) "j.c":14:24 -1
> > (nil))
>
>
>
> Insn 18 does the subtraction with the adjusted rounding mode. So far,
> so good. Things look fine at the start of cse1. But if we look at the
> end of cse1 we have:
>
> > (insn 17 16 18 2 (set (reg:HF 147)
> > (const_double:HF 0.0 [0x0.0p+0])) "j.c":14:24 136 {*movhf_hardfloat}
> > (nil))
> > (insn 18 17 19 2 (set (reg/v:RVVMF2SF 141 [ vreg ])
> > (reg/v:RVVMF2SF 140 [ vreg_memory ])) "j.c":14:24 2786 {*movrvvmf2sf_fract}
> > (expr_list:REG_DEAD (reg:HF 147)
> > (expr_list:REG_DEAD (reg/v:RVVMF2SF 140 [ vreg_memory ])
> > (expr_list:REG_DEAD (reg/v:RVVMF64BI 138 [ vmask ])
> > (expr_list:REG_DEAD (reg:SI 69 frm)
> > (nil))))))
>
>
> Note how CSE replace the arithmetic with a simple copy. At this point
> things are broken.
>
> I don't see how CSE can make the right decision here; we don't expose
> rounding modes this early and thus CSE has no way to know it can't make
> that kind of replacement.
>
> You patch kindof works, but it seems to me it's more accident than
> design and that we need to fix this in a more general manner.
>
> The natural question is what do other targets do when the rounding mode
> gets changed. I'm guessing its exposed as a unspec set before the RTL
> optimizers run.
I apologize for not explaining things more clearly. I also discovered that
the issue is caused by CSE. I think that during the substitution process,
CSE recognized the syntax of if_then_else and concluded that the expressions
in the "then" and "else" branches are equivalent, resulting in both yielding
(reg/v:RVVMF2SF 140 [ vreg_memory ]):
(minus:RVVMF2SF (reg/v:RVVMF2SF 140 [ vreg_memory ])
(float_extend:RVVMF2SF (vec_duplicate:RVVMF4HF (const_double:HF 0.0 [0x0.0p+0]))))
is considered equivalent to:
(reg/v:RVVMF2SF 140 [ vreg_memory ])
Clearly, there wasn’t a deeper consideration of the fact that float_extend requires
a rounding mode(frm). Therefore, I attempted to use UNSPEC in the pattern to inform
CSE that we have a rounding mode.
As I mentioned before, this may not be a good solution, as it risks missing other
optimization opportunities. As you pointed out, we need a more general approach
to fix it. Unfortunately, while I’m still trying to find a solution, I currently
don't have any other good ideas.
Best regards,
Jin Ma
> jeff
On 2/18/25 7:30 PM, Jin Ma wrote:
>
> I apologize for not explaining things more clearly. I also discovered that
> the issue is caused by CSE. I think that during the substitution process,
> CSE recognized the syntax of if_then_else and concluded that the expressions
> in the "then" and "else" branches are equivalent, resulting in both yielding
> (reg/v:RVVMF2SF 140 [ vreg_memory ]):
>
> (minus:RVVMF2SF (reg/v:RVVMF2SF 140 [ vreg_memory ])
> (float_extend:RVVMF2SF (vec_duplicate:RVVMF4HF (const_double:HF 0.0 [0x0.0p+0]))))
>
> is considered equivalent to:
>
> (reg/v:RVVMF2SF 140 [ vreg_memory ])
>
> Clearly, there wasn’t a deeper consideration of the fact that float_extend requires
> a rounding mode(frm). Therefore, I attempted to use UNSPEC in the pattern to inform
> CSE that we have a rounding mode.
Right. It worked, but there's a deeper issue here.
>
> As I mentioned before, this may not be a good solution, as it risks missing other
> optimization opportunities. As you pointed out, we need a more general approach
> to fix it. Unfortunately, while I’m still trying to find a solution, I currently
> don't have any other good ideas.
Changing the rounding modes isn't common, but it's not unheard of. My
suspicion is that we need to expose the rounding mode assignment earlier
(at RTL generation time).
That may not work well with the current optimization of FRM, but I think
early exposure is the only viable path forward in my mind. Depending on
the depth of the problems it may not be something we can fix in the
gcc-15 space.
You might experiment with emitting the FRM assignment in the
insn_expander class in the risc-v backend. This code:
> /* Add rounding mode operand. */
> if (m_insn_flags & FRM_DYN_P)
> add_rounding_mode_operand (FRM_DYN);
> else if (m_insn_flags & FRM_RUP_P)
> add_rounding_mode_operand (FRM_RUP);
> else if (m_insn_flags & FRM_RDN_P)
> add_rounding_mode_operand (FRM_RDN);
> else if (m_insn_flags & FRM_RMM_P)
> add_rounding_mode_operand (FRM_RMM);
> else if (m_insn_flags & FRM_RNE_P)
> add_rounding_mode_operand (FRM_RNE);
> else if (m_insn_flags & VXRM_RNU_P)
> add_rounding_mode_operand (VXRM_RNU);
> else if (m_insn_flags & VXRM_RDN_P)
> add_rounding_mode_operand (VXRM_RDN);
For anything other than FRM_DYN_P emit the appropriate insn to set FRM.
This may generate poor code in the presence of explicit rounding modes,
but I think something along these lines is ultimately going to be needed.
jeff
>> As I mentioned before, this may not be a good solution, as it risks missing other
>> optimization opportunities. As you pointed out, we need a more general approach
>> to fix it. Unfortunately, while I’m still trying to find a solution, I currently
>> don't have any other good ideas.
> Changing the rounding modes isn't common, but it's not unheard of. My
> suspicion is that we need to expose the rounding mode assignment earlier
> (at RTL generation time).
>
> That may not work well with the current optimization of FRM, but I think
> early exposure is the only viable path forward in my mind. Depending on
> the depth of the problems it may not be something we can fix in the
> gcc-15 space.
With -frounding-math CSE doesn't do the replacement. So we could argue that
a user should specify -frounding-math if they explicitly care about the
behavior. But on the other hand it's surprising if the user deliberately used
a rounding-mode setting instruction which doesn't work as intended.
Even if we wrapped those instructions in unspecs, couldn't other parts of the
program, that are compiled with the default -fno-roundin-math still lead to
unexpected results?
I don't see any other way than to "hide" the behavior from optimizers either
in order to prevent folding of such patterns.
On 2/19/25 1:00 AM, Robin Dapp wrote:
>>> As I mentioned before, this may not be a good solution, as it risks missing other
>>> optimization opportunities. As you pointed out, we need a more general approach
>>> to fix it. Unfortunately, while I’m still trying to find a solution, I currently
>>> don't have any other good ideas.
>> Changing the rounding modes isn't common, but it's not unheard of. My
>> suspicion is that we need to expose the rounding mode assignment earlier
>> (at RTL generation time).
>>
>> That may not work well with the current optimization of FRM, but I think
>> early exposure is the only viable path forward in my mind. Depending on
>> the depth of the problems it may not be something we can fix in the
>> gcc-15 space.
>
> With -frounding-math CSE doesn't do the replacement. So we could argue that
> a user should specify -frounding-math if they explicitly care about the
> behavior. But on the other hand it's surprising if the user deliberately used
> a rounding-mode setting instruction which doesn't work as intended.
>
> Even if we wrapped those instructions in unspecs, couldn't other parts of the
> program, that are compiled with the default -fno-roundin-math still lead to
> unexpected results?
>
> I don't see any other way than to "hide" the behavior from optimizers either
> in order to prevent folding of such patterns.
I didn't even know the option existed! Clearly necessary if we're
using these builtins with non-default rounding modes.
One thought would be to issue a warning when using one of these builtins
with a non-default mode and -frounding-math disabled.
Another would be to implicitly turn the option on. I don't particularly
like this idea, but throwing it out there as a possibility.
jeff
On Wed, 19 Feb 2025 21:53:32 +0800, Jeff Law wrote:
>
>
> On 2/19/25 1:00 AM, Robin Dapp wrote:
> >>> As I mentioned before, this may not be a good solution, as it risks missing other
> >>> optimization opportunities. As you pointed out, we need a more general approach
> >>> to fix it. Unfortunately, while I’m still trying to find a solution, I currently
> >>> don't have any other good ideas.
> >> Changing the rounding modes isn't common, but it's not unheard of. My
> >> suspicion is that we need to expose the rounding mode assignment earlier
> >> (at RTL generation time).
> >>
> >> That may not work well with the current optimization of FRM, but I think
> >> early exposure is the only viable path forward in my mind. Depending on
> >> the depth of the problems it may not be something we can fix in the
> >> gcc-15 space.
> >
> > With -frounding-math CSE doesn't do the replacement. So we could argue that
> > a user should specify -frounding-math if they explicitly care about the
> > behavior. But on the other hand it's surprising if the user deliberately used
> > a rounding-mode setting instruction which doesn't work as intended.
> >
> > Even if we wrapped those instructions in unspecs, couldn't other parts of the
> > program, that are compiled with the default -fno-roundin-math still lead to
> > unexpected results?
> >
> > I don't see any other way than to "hide" the behavior from optimizers either
> > in order to prevent folding of such patterns.
> I didn't even know the option existed! Clearly necessary if we're
> using these builtins with non-default rounding modes.
I wasn't aware of the existence of this option either. These built-ins require it.
I suspect that it makes certain assumptions about the rounding modes in floating-point
calculations, such as in float_extend, which may prevent CSE optimizations. Could
this also lead to lost optimization opportunities in other areas that don't require
this option? I'm not sure.
I suspect that the best approach would be to define relevant
attributes (perhaps similar to -frounding-math) within specific related patterns/built-ins
to inform optimizers we are using a rounding mode and to avoid over-optimization.
Best regards,
Jin Ma
> One thought would be to issue a warning when using one of these builtins
> with a non-default mode and -frounding-math disabled.
>
> Another would be to implicitly turn the option on. I don't particularly
> like this idea, but throwing it out there as a possibility.
>
> jeff
>
>
On Tue, 18 Feb 2025 21:40:06 -0700, Jeff Law wrote:
>
>
> On 2/18/25 7:30 PM, Jin Ma wrote:
>
> >
> > I apologize for not explaining things more clearly. I also discovered that
> > the issue is caused by CSE. I think that during the substitution process,
> > CSE recognized the syntax of if_then_else and concluded that the expressions
> > in the "then" and "else" branches are equivalent, resulting in both yielding
> > (reg/v:RVVMF2SF 140 [ vreg_memory ]):
> >
> > (minus:RVVMF2SF (reg/v:RVVMF2SF 140 [ vreg_memory ])
> > (float_extend:RVVMF2SF (vec_duplicate:RVVMF4HF (const_double:HF 0.0 [0x0.0p+0]))))
> >
> > is considered equivalent to:
> >
> > (reg/v:RVVMF2SF 140 [ vreg_memory ])
> >
> > Clearly, there wasn鈥檛 a deeper consideration of the fact that float_extend requires
> > a rounding mode(frm). Therefore, I attempted to use UNSPEC in the pattern to inform
> > CSE that we have a rounding mode.
> Right. It worked, but there's a deeper issue here.
>
> >
> > As I mentioned before, this may not be a good solution, as it risks missing other
> > optimization opportunities. As you pointed out, we need a more general approach
> > to fix it. Unfortunately, while I鈥檓 still trying to find a solution, I currently
> > don't have any other good ideas.
> Changing the rounding modes isn't common, but it's not unheard of. My
> suspicion is that we need to expose the rounding mode assignment earlier
> (at RTL generation time).
>
> That may not work well with the current optimization of FRM, but I think
> early exposure is the only viable path forward in my mind. Depending on
> the depth of the problems it may not be something we can fix in the
> gcc-15 space.
>
> You might experiment with emitting the FRM assignment in the
> insn_expander class in the risc-v backend. This code:
> > /* Add rounding mode operand. */
> > if (m_insn_flags & FRM_DYN_P)
> > add_rounding_mode_operand (FRM_DYN);
> > else if (m_insn_flags & FRM_RUP_P)
> > add_rounding_mode_operand (FRM_RUP);
> > else if (m_insn_flags & FRM_RDN_P)
> > add_rounding_mode_operand (FRM_RDN);
> > else if (m_insn_flags & FRM_RMM_P)
> > add_rounding_mode_operand (FRM_RMM);
> > else if (m_insn_flags & FRM_RNE_P)
> > add_rounding_mode_operand (FRM_RNE);
> > else if (m_insn_flags & VXRM_RNU_P)
> > add_rounding_mode_operand (VXRM_RNU);
> > else if (m_insn_flags & VXRM_RDN_P)
> > add_rounding_mode_operand (VXRM_RDN);
>
> For anything other than FRM_DYN_P emit the appropriate insn to set FRM.
> This may generate poor code in the presence of explicit rounding modes,
> but I think something along these lines is ultimately going to be needed.
Are you suggesting that we should emit the rounding mode insn earlier or
incorporate the rounding mode into the pattern (in fact, there are already
operands[9]/reg FRM_REGNUM)? However, this doesn't seem to be effective
because the side effects of the rounding mode do not take effect in
float_extend, and CSE will always optimize away pred_single_widen_subrvvmf2sf_scalar,
just like before :)
Best regards,
Jin Ma
> jeff
On Thu, 2025-02-20 at 10:31 +0800, Jin Ma wrote:
> On Wed, 19 Feb 2025 21:53:32 +0800, Jeff Law wrote:
> >
> >
> > On 2/19/25 1:00 AM, Robin Dapp wrote:
> > > > > As I mentioned before, this may not be a good solution, as it risks missing other
> > > > > optimization opportunities. As you pointed out, we need a more general approach
> > > > > to fix it. Unfortunately, while I’m still trying to find a solution, I currently
> > > > > don't have any other good ideas.
> > > > Changing the rounding modes isn't common, but it's not unheard of. My
> > > > suspicion is that we need to expose the rounding mode assignment earlier
> > > > (at RTL generation time).
> > > >
> > > > That may not work well with the current optimization of FRM, but I think
> > > > early exposure is the only viable path forward in my mind. Depending on
> > > > the depth of the problems it may not be something we can fix in the
> > > > gcc-15 space.
> > >
> > > With -frounding-math CSE doesn't do the replacement. So we could argue that
> > > a user should specify -frounding-math if they explicitly care about the
> > > behavior. But on the other hand it's surprising if the user deliberately used
> > > a rounding-mode setting instruction which doesn't work as intended.
> > >
> > > Even if we wrapped those instructions in unspecs, couldn't other parts of the
> > > program, that are compiled with the default -fno-roundin-math still lead to
> > > unexpected results?
> > >
> > > I don't see any other way than to "hide" the behavior from optimizers either
> > > in order to prevent folding of such patterns.
> > I didn't even know the option existed! Clearly necessary if we're
> > using these builtins with non-default rounding modes.
>
> I wasn't aware of the existence of this option either. These built-ins require it.
> I suspect that it makes certain assumptions about the rounding modes in floating-point
> calculations, such as in float_extend, which may prevent CSE optimizations. Could
> this also lead to lost optimization opportunities in other areas that don't require
> this option? I'm not sure.
>
> I suspect that the best approach would be to define relevant
> attributes (perhaps similar to -frounding-math) within specific related patterns/built-ins
> to inform optimizers we are using a rounding mode and to avoid over-optimization.
The "special pattern" is supposed to be #pragma STDC FENV_ACCESS that
we've not implemented. See https://gcc.gnu.org/PR34678.
On 2/19/25 11:20 PM, Xi Ruoyao wrote:
> On Thu, 2025-02-20 at 10:31 +0800, Jin Ma wrote:
>> On Wed, 19 Feb 2025 21:53:32 +0800, Jeff Law wrote:
>>>
>>>
>>> On 2/19/25 1:00 AM, Robin Dapp wrote:
>>>>>> As I mentioned before, this may not be a good solution, as it risks missing other
>>>>>> optimization opportunities. As you pointed out, we need a more general approach
>>>>>> to fix it. Unfortunately, while I’m still trying to find a solution, I currently
>>>>>> don't have any other good ideas.
>>>>> Changing the rounding modes isn't common, but it's not unheard of. My
>>>>> suspicion is that we need to expose the rounding mode assignment earlier
>>>>> (at RTL generation time).
>>>>>
>>>>> That may not work well with the current optimization of FRM, but I think
>>>>> early exposure is the only viable path forward in my mind. Depending on
>>>>> the depth of the problems it may not be something we can fix in the
>>>>> gcc-15 space.
>>>>
>>>> With -frounding-math CSE doesn't do the replacement. So we could argue that
>>>> a user should specify -frounding-math if they explicitly care about the
>>>> behavior. But on the other hand it's surprising if the user deliberately used
>>>> a rounding-mode setting instruction which doesn't work as intended.
>>>>
>>>> Even if we wrapped those instructions in unspecs, couldn't other parts of the
>>>> program, that are compiled with the default -fno-roundin-math still lead to
>>>> unexpected results?
>>>>
>>>> I don't see any other way than to "hide" the behavior from optimizers either
>>>> in order to prevent folding of such patterns.
>>> I didn't even know the option existed! Clearly necessary if we're
>>> using these builtins with non-default rounding modes.
>>
>> I wasn't aware of the existence of this option either. These built-ins require it.
>> I suspect that it makes certain assumptions about the rounding modes in floating-point
>> calculations, such as in float_extend, which may prevent CSE optimizations. Could
>> this also lead to lost optimization opportunities in other areas that don't require
>> this option? I'm not sure.
>>
>> I suspect that the best approach would be to define relevant
>> attributes (perhaps similar to -frounding-math) within specific related patterns/built-ins
>> to inform optimizers we are using a rounding mode and to avoid over-optimization.
>
> The "special pattern" is supposed to be #pragma STDC FENV_ACCESS that
> we've not implemented. See https://gcc.gnu.org/PR34678.
Not exactly. That's a discussion of what it would take to make this all
work relatively seamlessly, without the need for -frounding-math.
My comment about needing a special pattern was predicated on the
(incorrect) belief that we already had some support for this baked into
the optimizers for targets that allow adjusting the rounding mode via
asms and builtins. But I was wrong.
So the issue at hand is some of the RISC-V intrinsics depend on
-frounding-math behavior and what to do about that. We could:
1. Ignore it and tell users they need to explicitly set -frouding-math
2. Issue a diagnostic if a RISC-V intrinsic that needs rounding math is
used without -frounding-math being enabled.
3. If an intrinsic that needs rounding math is used, set -frounding-math
I tend to lean towards #3 given the state of the world right now. But I
haven't actually explored this at all to know if there are any
unexpected gotchas.
Jeff
>
On 2/19/25 7:52 PM, Jin Ma wrote:
>
> Are you suggesting that we should emit the rounding mode insn earlier or
> incorporate the rounding mode into the pattern (in fact, there are already
> operands[9]/reg FRM_REGNUM)? However, this doesn't seem to be effective
> because the side effects of the rounding mode do not take effect in
> float_extend, and CSE will always optimize away pred_single_widen_subrvvmf2sf_scalar,
> just like before :)
That was my original thought, but as others have pointed out,
-frounding-math is probably a better path forward.
jeff
@@ -120,6 +120,9 @@ (define_c_enum "unspec" [
UNSPEC_SF_VFNRCLIP
UNSPEC_SF_VFNRCLIPU
+
+ ;; Side effects of rounding mode
+ UNSPEC_VRM
])
(define_c_enum "unspecv" [
@@ -7135,8 +7135,10 @@ (define_insn "@pred_single_widen_<plus_minus:optab><mode>_scalar"
(plus_minus:VWEXTF
(match_operand:VWEXTF 3 "register_operand" " vr, vr, vr, vr")
(float_extend:VWEXTF
- (vec_duplicate:<V_DOUBLE_TRUNC>
- (match_operand:<VSUBEL> 4 "register_operand" " f, f, f, f"))))
+ (unspec:VWEXTF
+ [(vec_duplicate:<V_DOUBLE_TRUNC>
+ (match_operand:<VSUBEL> 4 "register_operand" " f, f, f, f"))
+ (reg:SI FRM_REGNUM)] UNSPEC_VRM)))
(match_operand:VWEXTF 2 "vector_merge_operand" " vu, 0, vu, 0")))]
"TARGET_VECTOR"
"vfw<insn>.wf\t%0,%3,%4%p1"
new file mode 100644
@@ -0,0 +1,24 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-options "-march=rv64gcv_zvfh -mabi=lp64d -O2" } */
+
+#include <riscv_vector.h>
+
+int main ()
+{
+ float data_store = 0;
+ int8_t mask = 1;
+ size_t vl = 1;
+ float data_load = 0.0;
+ _Float16 data_sub = 0.0;
+ vint8mf8_t mask_value = __riscv_vle8_v_i8mf8 (&mask, vl);
+ vbool64_t vmask = __riscv_vmseq_vx_i8mf8_b64 (mask_value, 1, vl);
+ vfloat32mf2_t vd_load = __riscv_vfmv_v_f_f32mf2 (0, __riscv_vsetvlmax_e32mf2 ());
+ vfloat32mf2_t vreg_memory = __riscv_vle32_v_f32mf2_tu (vd_load, &data_load, vl);
+ vfloat32mf2_t vreg = __riscv_vfwsub_wf_f32mf2_rm_tum (vmask, vreg_memory, vreg_memory, data_sub, __RISCV_FRM_RDN, vl);
+ __riscv_vse32_v_f32mf2 (&data_store, vreg, vl);
+
+ __builtin_printf ("%f\n", data_store);
+ return 0;
+}
+
+/* { dg-output "-0.000000\\s+\n" } */