RISC-V: Adjust LMUL when using maximum SEW [PR117955].
Checks
Context | Check | Description
linaro-tcwg-bot/tcwg_gcc_build--master-arm | success | Build passed
linaro-tcwg-bot/tcwg_gcc_build--master-aarch64 | success | Build passed
rivoscibot/toolchain-ci-rivos-lint | success | Lint passed
rivoscibot/toolchain-ci-rivos-apply-patch | success | Patch applied
linaro-tcwg-bot/tcwg_simplebootstrap_build--master-aarch64-bootstrap | success | Build passed
rivoscibot/toolchain-ci-rivos-build--linux-rv64gcv-lp64d-multilib | success | Build passed
rivoscibot/toolchain-ci-rivos-build--linux-rv64gc_zba_zbb_zbc_zbs-lp64d-multilib | success | Build passed
rivoscibot/toolchain-ci-rivos-build--newlib-rv64gcv-lp64d-multilib | success | Build passed
rivoscibot/toolchain-ci-rivos-test | fail | Testing failed
linaro-tcwg-bot/tcwg_gcc_check--master-aarch64 | success | Test passed
linaro-tcwg-bot/tcwg_gcc_check--master-arm | success | Test passed
linaro-tcwg-bot/tcwg_simplebootstrap_build--master-arm-bootstrap | success | Build passed
Commit Message
Hi,
when merging two vsetvls that both only demand "SEW >= ..." we
use their maximum SEW and keep the LMUL. That may lead to invalid
vector configurations like
e64, mf4.
As we make sure that the SEW requirements overlap we can use the SEW
and LMUL of the configuration with the larger SEW.
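To make this concrete, here is a minimal numeric sketch of the merge
(the SEW/LMUL values are hypothetical and chosen only to illustrate the
arithmetic, they are not taken from the PR's reproducer):

  prev only demands SEW >= 16:  e16, mf4  ->  ratio = 16 / (1/4) = 64
  next only demands SEW >= 64:  e64, m1   ->  ratio = 64 / 1     = 64
  old merge: SEW = MAX (16, 64) = 64, keep prev's LMUL (mf4)
             -> e64, mf4, ratio = 64 / (1/4) = 256, i.e. the invalid
                configuration above
  new merge: take SEW and LMUL from next -> e64, m1, ratio = 64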
Ma Jin already touched this merge rule some weeks ago and fixed the
ratio calculation (r15-6873). Calculating the ratio from an invalid
SEW/LMUL combination led to an overflow in the ratio variable, though.
I'd argue the proper fix is to update SEW and LMUL, keeping the ratio
as before. This breaks bug-10.c, though, and I'm not sure what it
really tests. SEW/LMUL actually doesn't change; we just emit a slightly
different vsetvl. Maybe it was reduced too far? Jin, any insight
there? I changed it into a run test for now.
Regtested on rv64gcv_zvl512b.
Regards
Robin
PR target/117955
gcc/ChangeLog:
* config/riscv/riscv-v.cc (calculate_ratio): Use LMUL of vsetvl
with larger SEW.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/bug-10.c: Convert to run test.
* gcc.target/riscv/rvv/base/pr117955.c: New test.
---
gcc/config/riscv/riscv-vsetvl.cc | 8 +-
.../gcc.target/riscv/rvv/base/bug-10.c | 32 +-
.../gcc.target/riscv/rvv/base/pr117955.c | 827 ++++++++++++++++++
3 files changed, 861 insertions(+), 6 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr117955.c
Comments
gcc/ChangeLog:
* config/riscv/riscv-v.cc (calculate_ratio): Use LMUL of vsetvl
with larger SEW.
The changelog seems incorrect.
>> - int max_sew = MAX (prev.get_sew (), next.get_sew ());
>> - prev.set_sew (max_sew);
>> - prev.set_ratio (calculate_ratio (prev.get_sew (), prev.get_vlmul ()));
>> + bool prev_sew_larger = prev.get_sew () >= next.get_sew ();
>> + const vsetvl_info from = prev_sew_larger ? prev : next;
>> + prev.set_sew (from.get_sew ());
>> + prev.set_vlmul (from.get_vlmul ());
>> + prev.set_ratio (from.get_ratio ());
It seems the issue is that we didn't set "vlmul"?
Can we do that:
int max_sew = MAX (prev.get_sew (), next.get_sew ());
prev.set_sew (max_sew);
prev.set_vlmul (calculate_vlmul (...));
prev.set_ratio (calculate_ratio (prev.get_sew (), prev.get_vlmul ()));
juzhe.zhong@rivai.ai
From: Robin Dapp
Date: 2025-02-27 23:00
To: gcc-patches
CC: palmer@dabbelt.com; kito.cheng@gmail.com; juzhe.zhong@rivai.ai; jeffreyalaw@gmail.com; pan2.li@intel.com; rdapp.gcc@gmail.com; Jin Ma
Subject: [PATCH] RISC-V: Adjust LMUL when using maximum SEW [PR117955].
Hi,
when merging two vsetvls that both only demand "SEW >= ..." we
use their maximum SEW and keep the LMUL. That may lead to invalid
vector configurations like
e64, mf4.
As we make sure that the SEW requirements overlap we can use the SEW
and LMUL of the configuration with the larger SEW.
Ma Jin already touched this merge rule some weeks ago and fixed the
ratio calculation (r15-6873). Calculating the ratio from an invalid
SEW/LMUL combination led to an overflow in the ratio variable, though.
I'd argue the proper fix is to update SEW and LMUL, keeping the ratio
as before. This breaks bug-10.c, though, and I'm not sure what it
really tests. SEW/LMUL actually doesn't change; we just emit a slightly
different vsetvl. Maybe it was reduced too far? Jin, any insight
there? I changed it into a run test for now.
Regtested on rv64gcv_zvl512b.
Regards
Robin
PR target/117955
gcc/ChangeLog:
* config/riscv/riscv-v.cc (calculate_ratio): Use LMUL of vsetvl
with larger SEW.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/bug-10.c: Convert to run test.
* gcc.target/riscv/rvv/base/pr117955.c: New test.
---
gcc/config/riscv/riscv-vsetvl.cc | 8 +-
.../gcc.target/riscv/rvv/base/bug-10.c | 32 +-
.../gcc.target/riscv/rvv/base/pr117955.c | 827 ++++++++++++++++++
3 files changed, 861 insertions(+), 6 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr117955.c
diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 82284624a24..f0165f7b8c8 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -1729,9 +1729,11 @@ private:
}
inline void use_max_sew (vsetvl_info &prev, const vsetvl_info &next)
{
- int max_sew = MAX (prev.get_sew (), next.get_sew ());
- prev.set_sew (max_sew);
- prev.set_ratio (calculate_ratio (prev.get_sew (), prev.get_vlmul ()));
+ bool prev_sew_larger = prev.get_sew () >= next.get_sew ();
+ const vsetvl_info from = prev_sew_larger ? prev : next;
+ prev.set_sew (from.get_sew ());
+ prev.set_vlmul (from.get_vlmul ());
+ prev.set_ratio (from.get_ratio ());
use_min_of_max_sew (prev, next);
}
inline void use_next_sew_lmul (vsetvl_info &prev, const vsetvl_info &next)
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/bug-10.c b/gcc/testsuite/gcc.target/riscv/rvv/base/bug-10.c
index af3a8610d63..5f7490e8a3b 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/base/bug-10.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/bug-10.c
@@ -1,14 +1,40 @@
-/* { dg-do compile { target { rv64 } } } */
+/* { dg-do run { target { rv64 } } } */
+/* { dg-require-effective-target rv64 } */
+/* { dg-require-effective-target riscv_v } */
/* { dg-options " -march=rv64gcv_zvfh -mabi=lp64d -O2 --param=vsetvl-strategy=optim -fno-schedule-insns -fno-schedule-insns2 -fno-schedule-fusion " } */
#include <riscv_vector.h>
void
-foo (uint8_t *ptr, vfloat16m4_t *v1, vuint32m8_t *v2, vuint8m2_t *v3, size_t vl)
+__attribute__ ((noipa))
+foo (vfloat16m4_t *v1, vuint32m8_t *v2, vuint8m2_t *v3, size_t vl)
{
*v1 = __riscv_vfmv_s_f_f16m4 (1, vl);
*v2 = __riscv_vmv_s_x_u32m8 (2963090659u, vl);
*v3 = __riscv_vsll_vx_u8m2 (__riscv_vid_v_u8m2 (vl), 2, vl);
}
-/* { dg-final { scan-assembler-not {vsetvli.*zero,zero} } }*/
+int
+main ()
+{
+ vfloat16m4_t v1;
+ vuint32m8_t v2;
+ vuint8m2_t v3;
+ int vl = 4;
+ foo (&v1, &v2, &v3, vl);
+
+ _Float16 val1 = ((_Float16 *)&v1)[0];
+ if (val1 - 1.0000f > 0.00001f)
+ __builtin_abort ();
+
+ uint32_t val2 = ((uint32_t *)&v2)[0];
+ if (val2 != 2963090659u)
+ __builtin_abort ();
+
+ for (int i = 0; i < vl; i++)
+ {
+ uint8_t val = ((uint8_t *)&v3)[i];
+ if (val != i << 2)
+ __builtin_abort ();
+ }
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr117955.c b/gcc/testsuite/gcc.target/riscv/rvv/base/pr117955.c
new file mode 100644
index 00000000000..49ccb6097d0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr117955.c
@@ -0,0 +1,827 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_zvfh -O3" } */
+
+#include <riscv_vector.h>
+
+#define dataLen 100
+#define isNaNF16UI( a ) (((~(a) & 0x7C00) == 0) && ((a) & 0x03FF))
+#define isNaNF32UI( a ) (((~(a) & 0x7F800000) == 0) && ((a) & 0x007FFFFF))
+#define isNaNF64UI( a ) (((~(a) & UINT64_C( 0x7FF0000000000000 )) == 0) && ((a) & UINT64_C( 0x000FFFFFFFFFFFFF )))
+typedef _Float16 float16_t;
+typedef float float32_t;
+typedef double float64_t;
+
+float16_t convert_binary_u16_f16(uint16_t u16){
+ union { float16_t f16; uint16_t u16; } converter;
+ converter.u16 = u16;
+ if(isNaNF16UI(converter.u16)) return 0;
+ return converter.f16;
+}
+float32_t convert_binary_u32_f32(uint32_t u32){
+ union { float32_t f32; uint32_t u32; } converter;
+ converter.u32 = u32;
+ if(isNaNF32UI(converter.u32)) return 0;
+ return converter.f32;
+}
+float64_t convert_binary_u64_f64(uint64_t u64){
+ union { float64_t f64; uint64_t u64; } converter;
+ converter.u64 = u64;
+ if(isNaNF64UI(converter.u64)) return 0;
+ return converter.f64;
+}
+
+int8_t data_mask[dataLen];
+int64_t data_load_0[dataLen];
+uint16_t data_load_1[dataLen];
+int8_t data_load_10[dataLen];
+float16_t data_load_11[dataLen];
+uint64_t data_load_12[dataLen];
+int16_t data_load_13[dataLen];
+uint8_t data_load_14[dataLen];
+uint8_t data_load_15[dataLen];
+int64_t data_load_16[dataLen];
+float32_t data_load_17[dataLen];
+int32_t data_load_18[dataLen];
+int64_t data_load_19[dataLen];
+uint16_t data_load_2[dataLen];
+uint8_t data_load_20[dataLen];
+int32_t data_load_21[dataLen];
+int32_t data_load_22[dataLen];
+uint32_t data_load_23[dataLen];
+float16_t data_load_24[dataLen];
+int64_t data_load_25[dataLen];
+int16_t data_load_26[dataLen];
+int16_t data_load_27[dataLen];
+int16_t data_load_28[dataLen];
+float32_t data_load_29[dataLen];
+int64_t data_load_3[dataLen];
+float64_t data_load_30[dataLen];
+uint8_t data_load_31[dataLen];
+float16_t data_load_32[dataLen];
+int32_t data_load_33[dataLen];
+int32_t data_load_34[dataLen];
+int16_t data_load_35[dataLen];
+uint16_t data_load_36[dataLen];
+uint64_t data_load_37[dataLen];
+uint64_t data_load_38[dataLen];
+float64_t data_load_39[dataLen];
+float16_t data_load_4[dataLen];
+float64_t data_load_40[dataLen];
+int8_t data_load_41[dataLen];
+uint64_t data_load_42[dataLen];
+uint64_t data_load_43[dataLen];
+int8_t data_load_44[dataLen];
+int8_t data_load_45[dataLen];
+int8_t data_load_46[dataLen];
+uint32_t data_load_47[dataLen];
+uint64_t data_load_48[dataLen];
+int16_t data_load_49[dataLen];
+uint16_t data_load_5[dataLen];
+uint16_t data_load_50[dataLen];
+uint16_t data_load_51[dataLen];
+uint16_t data_load_52[dataLen];
+uint16_t data_load_53[dataLen];
+int16_t data_load_54[dataLen];
+int64_t data_load_55[dataLen];
+float64_t data_load_56[dataLen];
+float32_t data_load_57[dataLen];
+int16_t data_load_58[dataLen];
+int16_t data_load_59[dataLen];
+uint16_t data_load_6[dataLen];
+int8_t data_load_60[dataLen];
+float32_t data_load_61[dataLen];
+int32_t data_load_62[dataLen];
+uint32_t data_load_63[dataLen];
+int16_t data_load_64[dataLen];
+uint32_t data_load_65[dataLen];
+uint8_t data_load_66[dataLen];
+uint64_t data_load_67[dataLen];
+int8_t data_load_68[dataLen];
+float32_t data_load_69[dataLen];
+uint8_t data_load_7[dataLen];
+int16_t data_load_70[dataLen];
+int16_t data_load_71[dataLen];
+uint32_t data_load_72[dataLen];
+uint32_t data_load_73[dataLen];
+int32_t data_load_74[dataLen];
+int16_t data_load_75[dataLen];
+int8_t data_load_76[dataLen];
+float32_t data_load_77[dataLen];
+uint8_t data_load_78[dataLen];
+int8_t data_load_79[dataLen];
+float32_t data_load_8[dataLen];
+float16_t data_load_9[dataLen];
+int32_t data_store_vreg_0[dataLen];
+int8_t data_store_vreg_1[dataLen];
+int16_t data_store_vreg_10[dataLen];
+uint32_t data_store_vreg_11[dataLen];
+int32_t data_store_vreg_12[dataLen];
+uint32_t data_store_vreg_13[dataLen];
+int64_t data_store_vreg_14[dataLen];
+float32_t data_store_vreg_15[dataLen];
+int32_t data_store_vreg_16[dataLen];
+float32_t data_store_vreg_17[dataLen];
+float64_t data_store_vreg_18[dataLen];
+float64_t data_store_vreg_19[dataLen];
+float16_t data_store_vreg_2[dataLen];
+int8_t data_store_vreg_20[dataLen];
+float64_t data_store_vreg_21[dataLen];
+int16_t data_store_vreg_22[dataLen];
+uint16_t data_store_vreg_23[dataLen];
+int8_t data_store_vreg_24[dataLen];
+int16_t data_store_vreg_25[dataLen];
+uint8_t data_store_vreg_26[dataLen];
+int16_t data_store_vreg_27[dataLen];
+int32_t data_store_vreg_28[dataLen];
+int16_t data_store_vreg_29[dataLen];
+uint16_t data_store_vreg_3[dataLen];
+float32_t data_store_vreg_30[dataLen];
+int8_t data_store_vreg_31[dataLen];
+uint64_t data_store_vreg_32[dataLen];
+int8_t data_store_vreg_33[dataLen];
+int8_t data_store_vreg_34[dataLen];
+uint64_t data_store_vreg_35[dataLen];
+int8_t data_store_vreg_36[dataLen];
+uint16_t data_store_vreg_37[dataLen];
+int64_t data_store_vreg_38[dataLen];
+float16_t data_store_vreg_39[dataLen];
+float32_t data_store_vreg_4[dataLen];
+float16_t data_store_vreg_40[dataLen];
+uint8_t data_store_vreg_41[dataLen];
+uint16_t data_store_vreg_42[dataLen];
+int8_t data_store_vreg_5[dataLen];
+float64_t data_store_vreg_6[dataLen];
+int8_t data_store_vreg_7[dataLen];
+int8_t data_store_vreg_8[dataLen];
+int32_t data_store_vreg_9[dataLen];
+uint16_t data_store_vreg_memory_1[dataLen];
+float16_t data_store_vreg_memory_11[dataLen];
+int16_t data_store_vreg_memory_13[dataLen];
+int64_t data_store_vreg_memory_16[dataLen];
+float32_t data_store_vreg_memory_17[dataLen];
+uint16_t data_store_vreg_memory_2[dataLen];
+int32_t data_store_vreg_memory_21[dataLen];
+float16_t data_store_vreg_memory_24[dataLen];
+int16_t data_store_vreg_memory_28[dataLen];
+int64_t data_store_vreg_memory_3[dataLen];
+float64_t data_store_vreg_memory_30[dataLen];
+uint8_t data_store_vreg_memory_31[dataLen];
+int32_t data_store_vreg_memory_33[dataLen];
+uint16_t data_store_vreg_memory_36[dataLen];
+uint64_t data_store_vreg_memory_38[dataLen];
+float64_t data_store_vreg_memory_40[dataLen];
+int8_t data_store_vreg_memory_41[dataLen];
+uint64_t data_store_vreg_memory_42[dataLen];
+uint32_t data_store_vreg_memory_47[dataLen];
+uint16_t data_store_vreg_memory_5[dataLen];
+int8_t data_store_vreg_memory_60[dataLen];
+float32_t data_store_vreg_memory_61[dataLen];
+uint32_t data_store_vreg_memory_63[dataLen];
+uint8_t data_store_vreg_memory_7[dataLen];
+uint32_t data_store_vreg_memory_72[dataLen];
+float32_t data_store_vreg_memory_8[dataLen];
+float16_t data_store_vreg_memory_9[dataLen];
+
+
+int main(){
+ int avl1 = dataLen;
+ int8_t* ptr_mask = data_mask;
+ int64_t* ptr_load_0 = data_load_0;
+ uint16_t* ptr_load_1 = data_load_1;
+ int8_t* ptr_load_10 = data_load_10;
+ float16_t* ptr_load_11 = data_load_11;
+ uint64_t* ptr_load_12 = data_load_12;
+ int16_t* ptr_load_13 = data_load_13;
+ uint8_t* ptr_load_14 = data_load_14;
+ uint8_t* ptr_load_15 = data_load_15;
+ int64_t* ptr_load_16 = data_load_16;
+ float32_t* ptr_load_17 = data_load_17;
+ int32_t* ptr_load_18 = data_load_18;
+ int64_t* ptr_load_19 = data_load_19;
+ uint16_t* ptr_load_2 = data_load_2;
+ uint8_t* ptr_load_20 = data_load_20;
+ int32_t* ptr_load_21 = data_load_21;
+ int32_t* ptr_load_22 = data_load_22;
+ uint32_t* ptr_load_23 = data_load_23;
+ float16_t* ptr_load_24 = data_load_24;
+ int64_t* ptr_load_25 = data_load_25;
+ int16_t* ptr_load_26 = data_load_26;
+ int16_t* ptr_load_27 = data_load_27;
+ int16_t* ptr_load_28 = data_load_28;
+ float32_t* ptr_load_29 = data_load_29;
+ int64_t* ptr_load_3 = data_load_3;
+ float64_t* ptr_load_30 = data_load_30;
+ uint8_t* ptr_load_31 = data_load_31;
+ float16_t* ptr_load_32 = data_load_32;
+ int32_t* ptr_load_33 = data_load_33;
+ int32_t* ptr_load_34 = data_load_34;
+ int16_t* ptr_load_35 = data_load_35;
+ uint16_t* ptr_load_36 = data_load_36;
+ uint64_t* ptr_load_37 = data_load_37;
+ uint64_t* ptr_load_38 = data_load_38;
+ float64_t* ptr_load_39 = data_load_39;
+ float16_t* ptr_load_4 = data_load_4;
+ float64_t* ptr_load_40 = data_load_40;
+ int8_t* ptr_load_41 = data_load_41;
+ uint64_t* ptr_load_42 = data_load_42;
+ uint64_t* ptr_load_43 = data_load_43;
+ int8_t* ptr_load_44 = data_load_44;
+ int8_t* ptr_load_45 = data_load_45;
+ int8_t* ptr_load_46 = data_load_46;
+ uint32_t* ptr_load_47 = data_load_47;
+ uint64_t* ptr_load_48 = data_load_48;
+ int16_t* ptr_load_49 = data_load_49;
+ uint16_t* ptr_load_5 = data_load_5;
+ uint16_t* ptr_load_50 = data_load_50;
+ uint16_t* ptr_load_51 = data_load_51;
+ uint16_t* ptr_load_52 = data_load_52;
+ uint16_t* ptr_load_53 = data_load_53;
+ int16_t* ptr_load_54 = data_load_54;
+ int64_t* ptr_load_55 = data_load_55;
+ float64_t* ptr_load_56 = data_load_56;
+ float32_t* ptr_load_57 = data_load_57;
+ int16_t* ptr_load_58 = data_load_58;
+ int16_t* ptr_load_59 = data_load_59;
+ uint16_t* ptr_load_6 = data_load_6;
+ int8_t* ptr_load_60 = data_load_60;
+ float32_t* ptr_load_61 = data_load_61;
+ int32_t* ptr_load_62 = data_load_62;
+ uint32_t* ptr_load_63 = data_load_63;
+ int16_t* ptr_load_64 = data_load_64;
+ uint32_t* ptr_load_65 = data_load_65;
+ uint8_t* ptr_load_66 = data_load_66;
+ uint64_t* ptr_load_67 = data_load_67;
+ int8_t* ptr_load_68 = data_load_68;
+ float32_t* ptr_load_69 = data_load_69;
+ uint8_t* ptr_load_7 = data_load_7;
+ int16_t* ptr_load_70 = data_load_70;
+ int16_t* ptr_load_71 = data_load_71;
+ uint32_t* ptr_load_72 = data_load_72;
+ uint32_t* ptr_load_73 = data_load_73;
+ int32_t* ptr_load_74 = data_load_74;
+ int16_t* ptr_load_75 = data_load_75;
+ int8_t* ptr_load_76 = data_load_76;
+ float32_t* ptr_load_77 = data_load_77;
+ uint8_t* ptr_load_78 = data_load_78;
+ int8_t* ptr_load_79 = data_load_79;
+ float32_t* ptr_load_8 = data_load_8;
+ float16_t* ptr_load_9 = data_load_9;
+ int32_t* ptr_store_vreg_0 = data_store_vreg_0;
+ int8_t* ptr_store_vreg_1 = data_store_vreg_1;
+ int16_t* ptr_store_vreg_10 = data_store_vreg_10;
+ uint32_t* ptr_store_vreg_11 = data_store_vreg_11;
+ int32_t* ptr_store_vreg_12 = data_store_vreg_12;
+ uint32_t* ptr_store_vreg_13 = data_store_vreg_13;
+ int64_t* ptr_store_vreg_14 = data_store_vreg_14;
+ float32_t* ptr_store_vreg_15 = data_store_vreg_15;
+ int32_t* ptr_store_vreg_16 = data_store_vreg_16;
+ float32_t* ptr_store_vreg_17 = data_store_vreg_17;
+ float64_t* ptr_store_vreg_18 = data_store_vreg_18;
+ float64_t* ptr_store_vreg_19 = data_store_vreg_19;
+ float16_t* ptr_store_vreg_2 = data_store_vreg_2;
+ int8_t* ptr_store_vreg_20 = data_store_vreg_20;
+ float64_t* ptr_store_vreg_21 = data_store_vreg_21;
+ int16_t* ptr_store_vreg_22 = data_store_vreg_22;
+ uint16_t* ptr_store_vreg_23 = data_store_vreg_23;
+ int8_t* ptr_store_vreg_24 = data_store_vreg_24;
+ int16_t* ptr_store_vreg_25 = data_store_vreg_25;
+ uint8_t* ptr_store_vreg_26 = data_store_vreg_26;
+ int16_t* ptr_store_vreg_27 = data_store_vreg_27;
+ int32_t* ptr_store_vreg_28 = data_store_vreg_28;
+ int16_t* ptr_store_vreg_29 = data_store_vreg_29;
+ uint16_t* ptr_store_vreg_3 = data_store_vreg_3;
+ float32_t* ptr_store_vreg_30 = data_store_vreg_30;
+ int8_t* ptr_store_vreg_31 = data_store_vreg_31;
+ uint64_t* ptr_store_vreg_32 = data_store_vreg_32;
+ int8_t* ptr_store_vreg_33 = data_store_vreg_33;
+ int8_t* ptr_store_vreg_34 = data_store_vreg_34;
+ uint64_t* ptr_store_vreg_35 = data_store_vreg_35;
+ int8_t* ptr_store_vreg_36 = data_store_vreg_36;
+ uint16_t* ptr_store_vreg_37 = data_store_vreg_37;
+ int64_t* ptr_store_vreg_38 = data_store_vreg_38;
+ float16_t* ptr_store_vreg_39 = data_store_vreg_39;
+ float32_t* ptr_store_vreg_4 = data_store_vreg_4;
+ float16_t* ptr_store_vreg_40 = data_store_vreg_40;
+ uint8_t* ptr_store_vreg_41 = data_store_vreg_41;
+ uint16_t* ptr_store_vreg_42 = data_store_vreg_42;
+ int8_t* ptr_store_vreg_5 = data_store_vreg_5;
+ float64_t* ptr_store_vreg_6 = data_store_vreg_6;
+ int8_t* ptr_store_vreg_7 = data_store_vreg_7;
+ int8_t* ptr_store_vreg_8 = data_store_vreg_8;
+ int32_t* ptr_store_vreg_9 = data_store_vreg_9;
+ uint16_t* ptr_store_vreg_memory_1 = data_store_vreg_memory_1;
+ float16_t* ptr_store_vreg_memory_11 = data_store_vreg_memory_11;
+ int16_t* ptr_store_vreg_memory_13 = data_store_vreg_memory_13;
+ int64_t* ptr_store_vreg_memory_16 = data_store_vreg_memory_16;
+ float32_t* ptr_store_vreg_memory_17 = data_store_vreg_memory_17;
+ uint16_t* ptr_store_vreg_memory_2 = data_store_vreg_memory_2;
+ int32_t* ptr_store_vreg_memory_21 = data_store_vreg_memory_21;
+ float16_t* ptr_store_vreg_memory_24 = data_store_vreg_memory_24;
+ int16_t* ptr_store_vreg_memory_28 = data_store_vreg_memory_28;
+ int64_t* ptr_store_vreg_memory_3 = data_store_vreg_memory_3;
+ float64_t* ptr_store_vreg_memory_30 = data_store_vreg_memory_30;
+ uint8_t* ptr_store_vreg_memory_31 = data_store_vreg_memory_31;
+ int32_t* ptr_store_vreg_memory_33 = data_store_vreg_memory_33;
+ uint16_t* ptr_store_vreg_memory_36 = data_store_vreg_memory_36;
+ uint64_t* ptr_store_vreg_memory_38 = data_store_vreg_memory_38;
+ float64_t* ptr_store_vreg_memory_40 = data_store_vreg_memory_40;
+ int8_t* ptr_store_vreg_memory_41 = data_store_vreg_memory_41;
+ uint64_t* ptr_store_vreg_memory_42 = data_store_vreg_memory_42;
+ uint32_t* ptr_store_vreg_memory_47 = data_store_vreg_memory_47;
+ uint16_t* ptr_store_vreg_memory_5 = data_store_vreg_memory_5;
+ int8_t* ptr_store_vreg_memory_60 = data_store_vreg_memory_60;
+ float32_t* ptr_store_vreg_memory_61 = data_store_vreg_memory_61;
+ uint32_t* ptr_store_vreg_memory_63 = data_store_vreg_memory_63;
+ uint8_t* ptr_store_vreg_memory_7 = data_store_vreg_memory_7;
+ uint32_t* ptr_store_vreg_memory_72 = data_store_vreg_memory_72;
+ float32_t* ptr_store_vreg_memory_8 = data_store_vreg_memory_8;
+ float16_t* ptr_store_vreg_memory_9 = data_store_vreg_memory_9;
+ for (size_t vl; avl1 > 0; avl1 -= vl){
+ vl = __riscv_vsetvl_e64m1(avl1);
+ vint8mf8_t mask_value= __riscv_vle8_v_i8mf8(ptr_mask, vl);
+ vbool64_t vmask= __riscv_vmseq_vx_i8mf8_b64(mask_value, 1, vl);
+ vint64m4_t vreg_memory_0 = __riscv_vle64_v_i64m4(ptr_load_0, vl);
+ vuint32mf2_t idx_0 = __riscv_vsll_vx_u32mf2(__riscv_vid_v_u32mf2(vl), 1, vl);
+ vuint16mf4_t vreg_memory_1 = __riscv_vluxei32_v_u16mf4_m(vmask, ptr_load_1, idx_0, vl);
+ vuint16mf4_t idx_1 = __riscv_vsll_vx_u16mf4(__riscv_vid_v_u16mf4(vl), 1, vl);
+ vuint16mf4_t vreg_memory_2 = __riscv_vluxei16_v_u16mf4(ptr_load_2, idx_1, vl);
+ vuint32mf2_t idx_2 = __riscv_vsll_vx_u32mf2(__riscv_vid_v_u32mf2(vl), 3, vl);
+ vint64m1_t vreg_memory_3 = __riscv_vluxei32_v_i64m1(ptr_load_3, idx_2, vl);
+ vfloat16m1_t vreg_memory_4 = __riscv_vle16_v_f16m1(ptr_load_4, vl);
+ vuint16mf4_t vreg_memory_5 = __riscv_vle16_v_u16mf4_m(vmask, ptr_load_5, vl);
+ vuint16mf4_t vreg_memory_6 = __riscv_vle16_v_u16mf4(ptr_load_6, vl);
+ vuint16mf4_t idx_5 = __riscv_vsll_vx_u16mf4(__riscv_vid_v_u16mf4(vl), 0, vl);
+ vuint8mf8_t vreg_memory_7 = __riscv_vluxei16_v_u8mf8(ptr_load_7, idx_5, vl);
+ vuint8mf8_t idx_8 = __riscv_vsll_vx_u8mf8(__riscv_vid_v_u8mf8(vl), 2, vl);
+ vfloat32mf2_t vreg_memory_8 = __riscv_vloxei8_v_f32mf2(ptr_load_8, idx_8, vl);
+ vuint16mf4_t idx_9 = __riscv_vsll_vx_u16mf4(__riscv_vid_v_u16mf4(vl), 1, vl);
+ vfloat16mf4_t vreg_memory_9 = __riscv_vluxei16_v_f16mf4(ptr_load_9, idx_9, vl);
+ vint8m2_t vreg_memory_10 = __riscv_vle8_v_i8m2(ptr_load_10, vl);
+ vuint64m1_t idx_10 = __riscv_vsll_vx_u64m1(__riscv_vid_v_u64m1(vl), 1, vl);
+ vfloat16mf4_t vreg_memory_11 = __riscv_vluxei64_v_f16mf4_m(vmask, ptr_load_11, idx_10, vl);
+ vuint64m1_t vreg_memory_12 = __riscv_vle64_v_u64m1(ptr_load_12, vl);
+ vint16mf4_t vreg_memory_13 = __riscv_vle16_v_i16mf4(ptr_load_13, vl);
+ vuint16mf4_t idx_15 = __riscv_vsll_vx_u16mf4(__riscv_vid_v_u16mf4(vl), 0, vl);
+ vuint8mf8_t vreg_memory_14 = __riscv_vloxei16_v_u8mf8(ptr_load_14, idx_15, vl);
+ vuint8m1_t vreg_memory_15 = __riscv_vle8_v_u8m1(ptr_load_15, vl);
+ vint64m1_t vreg_memory_16 = __riscv_vle64_v_i64m1(ptr_load_16, vl);
+ vfloat32mf2_t vreg_memory_17 = __riscv_vle32_v_f32mf2(ptr_load_17, vl);
+ vint32m1_t vreg_memory_18 = __riscv_vle32_v_i32m1(ptr_load_18, vl);
+ vint64m1_t vreg_memory_19 = __riscv_vle64_v_i64m1(ptr_load_19, vl);
+ vuint32mf2_t idx_20 = __riscv_vsll_vx_u32mf2(__riscv_vid_v_u32mf2(vl), 0, vl);
+ vuint8mf8_t vreg_memory_20 = __riscv_vluxei32_v_u8mf8(ptr_load_20, idx_20, vl);
+ vint32mf2_t vreg_memory_21 = __riscv_vle32_v_i32mf2(ptr_load_21, vl);
+ vint32m1_t vreg_memory_22 = __riscv_vle32_v_i32m1(ptr_load_22, vl);
+ vuint32m4_t vreg_memory_23 = __riscv_vle32_v_u32m4(ptr_load_23, vl);
+ vfloat16mf4_t vreg_memory_24 = __riscv_vle16_v_f16mf4(ptr_load_24, vl);
+ vuint64m1_t idx_21 = __riscv_vsll_vx_u64m1(__riscv_vid_v_u64m1(vl), 3, vl);
+ vint64m1_t vreg_memory_25 = __riscv_vloxei64_v_i64m1_m(vmask, ptr_load_25, idx_21, vl);
+ vuint64m1_t idx_24 = __riscv_vsll_vx_u64m1(__riscv_vid_v_u64m1(vl), 1, vl);
+ vint16mf4_t vreg_memory_26 = __riscv_vluxei64_v_i16mf4(ptr_load_26, idx_24, vl);
+ vint16mf4_t vreg_memory_27 = __riscv_vlse16_v_i16mf4_m(vmask, ptr_load_27, 2, vl);
+ vint16mf4_t vreg_memory_28 = __riscv_vle16_v_i16mf4(ptr_load_28, vl);
+ vfloat32m2_t vreg_memory_29 = __riscv_vle32_v_f32m2(ptr_load_29, vl);
+ vuint32mf2_t idx_29 = __riscv_vsll_vx_u32mf2(__riscv_vid_v_u32mf2(vl), 3, vl);
+ vfloat64m1_t vreg_memory_30 = __riscv_vluxei32_v_f64m1_m(vmask, ptr_load_30, idx_29, vl);
+ vuint64m1_t idx_30 = __riscv_vsll_vx_u64m1(__riscv_vid_v_u64m1(vl), 0, vl);
+ vuint8mf8_t vreg_memory_31 = __riscv_vloxei64_v_u8mf8_m(vmask, ptr_load_31, idx_30, vl);
+ vuint8mf8_t idx_31 = __riscv_vsll_vx_u8mf8(__riscv_vid_v_u8mf8(vl), 1, vl);
+ vfloat16mf4_t vreg_memory_32 = __riscv_vluxei8_v_f16mf4_m(vmask, ptr_load_32, idx_31, vl);
+ vuint8mf8_t idx_34 = __riscv_vsll_vx_u8mf8(__riscv_vid_v_u8mf8(vl), 2, vl);
+ vint32mf2_t vreg_memory_33 = __riscv_vloxei8_v_i32mf2_m(vmask, ptr_load_33, idx_34, vl);
+ vuint16mf4_t idx_35 = __riscv_vsll_vx_u16mf4(__riscv_vid_v_u16mf4(vl), 2, vl);
+ vint32mf2_t vreg_memory_34 = __riscv_vluxei16_v_i32mf2(ptr_load_34, idx_35, vl);
+ vint16mf4_t vreg_memory_35 = __riscv_vlse16_v_i16mf4(ptr_load_35, 2, vl);
+ vuint16mf4_t vreg_memory_36 = __riscv_vlse16_v_u16mf4(ptr_load_36, 2, vl);
+ vreg_memory_36 = __riscv_vremu_vx_u16mf4(vreg_memory_36, (uint16_t)(vl), vl);
+ vuint64m1_t vreg_memory_37 = __riscv_vle64_v_u64m1(ptr_load_37, vl);
+ vuint64m1_t vreg_memory_38 = __riscv_vlse64_v_u64m1(ptr_load_38, 8, vl);
+ vreg_memory_38 = __riscv_vremu_vx_u64m1(vreg_memory_38, (uint64_t)(vl), vl);
+ vfloat64m1_t vreg_memory_39 = __riscv_vlse64_v_f64m1(ptr_load_39, 8, vl);
+ vfloat64m1_t vreg_memory_40 = __riscv_vlse64_v_f64m1(ptr_load_40, 8, vl);
+ vint8mf8_t vload_tmp_3 = __riscv_vle8_v_i8mf8(ptr_load_41, vl);
+ vbool64_t vreg_memory_41 = __riscv_vmseq_vx_i8mf8_b64(vload_tmp_3, 1, vl);
+ vuint64m1_t vreg_memory_42 = __riscv_vle64_v_u64m1(ptr_load_42, vl);
+ vuint64m1_t vreg_memory_43 = __riscv_vle64_v_u64m1(ptr_load_43, vl);
+ vint8mf8_t vload_tmp_5 = __riscv_vle8_v_i8mf8(ptr_load_44, vl);
+ vbool64_t vreg_memory_44 = __riscv_vmseq_vx_i8mf8_b64(vload_tmp_5, 1, vl);
+ vint8mf8_t vload_tmp_7 = __riscv_vle8_v_i8mf8(ptr_load_45, vl);
+ vbool64_t vreg_memory_45 = __riscv_vmseq_vx_i8mf8_b64(vload_tmp_7, 1, vl);
+ vint8m1_t vreg_memory_46 = __riscv_vle8_v_i8m1(ptr_load_46, vl);
+ vuint32mf2_t vreg_memory_47 = __riscv_vle32_v_u32mf2(ptr_load_47, vl);
+ vuint64m1_t vreg_memory_48 = __riscv_vle64_v_u64m1(ptr_load_48, vl);
+ vint16mf4_t vreg_memory_49 = __riscv_vle16_v_i16mf4(ptr_load_49, vl);
+ vuint16mf4_t vreg_memory_50 = __riscv_vlse16_v_u16mf4(ptr_load_50, 2, vl);
+ vuint32mf2_t idx_50 = __riscv_vsll_vx_u32mf2(__riscv_vid_v_u32mf2(vl), 1, vl);
+ vuint16mf4_t vreg_memory_51 = __riscv_vluxei32_v_u16mf4(ptr_load_51, idx_50, vl);
+ vuint16m8_t vreg_memory_52 = __riscv_vle16_v_u16m8(ptr_load_52, vl);
+ vuint16m1_t vreg_memory_53 = __riscv_vle16_v_u16m1(ptr_load_53, vl);
+ vint16m2_t vreg_memory_54 = __riscv_vle16_v_i16m2(ptr_load_54, vl);
+ vuint64m1_t idx_51 = __riscv_vsll_vx_u64m1(__riscv_vid_v_u64m1(vl), 3, vl);
+ vint64m1_t vreg_memory_55 = __riscv_vloxei64_v_i64m1(ptr_load_55, idx_51, vl);
+ vfloat64m1_t vreg_memory_56 = __riscv_vlse64_v_f64m1(ptr_load_56, 8, vl);
+ vfloat32mf2_t vreg_memory_57 = __riscv_vlse32_v_f32mf2(ptr_load_57, 4, vl);
+ vuint8mf8_t idx_56 = __riscv_vsll_vx_u8mf8(__riscv_vid_v_u8mf8(vl), 1, vl);
+ vint16mf4_t vreg_memory_58 = __riscv_vluxei8_v_i16mf4(ptr_load_58, idx_56, vl);
+ vint16mf4_t vreg_memory_59 = __riscv_vle16_v_i16mf4(ptr_load_59, vl);
+ vuint32mf2_t idx_59 = __riscv_vsll_vx_u32mf2(__riscv_vid_v_u32mf2(vl), 0, vl);
+ vint8mf8_t vreg_memory_60 = __riscv_vluxei32_v_i8mf8_m(vmask, ptr_load_60, idx_59, vl);
+ vfloat32mf2_t vreg_memory_61 = __riscv_vle32_v_f32mf2(ptr_load_61, vl);
+ vint32m2_t vreg_memory_62 = __riscv_vle32_v_i32m2(ptr_load_62, vl);
+ vuint32mf2_t vreg_memory_63 = __riscv_vle32_v_u32mf2_m(vmask, ptr_load_63, vl);
+ vuint32mf2_t idx_62 = __riscv_vsll_vx_u32mf2(__riscv_vid_v_u32mf2(vl), 1, vl);
+ vint16mf4_t vreg_memory_64 = __riscv_vluxei32_v_i16mf4(ptr_load_64, idx_62, vl);
+ vuint32m1_t vreg_memory_65 = __riscv_vle32_v_u32m1(ptr_load_65, vl);
+ vuint8m2_t vreg_memory_66 = __riscv_vle8_v_u8m2(ptr_load_66, vl);
+ vuint64m8_t vreg_memory_67 = __riscv_vle64_v_u64m8(ptr_load_67, vl);
+ vint8m2_t vreg_memory_68 = __riscv_vle8_v_i8m2(ptr_load_68, vl);
+ vuint32mf2_t idx_67 = __riscv_vsll_vx_u32mf2(__riscv_vid_v_u32mf2(vl), 2, vl);
+ vfloat32mf2_t vreg_memory_69 = __riscv_vloxei32_v_f32mf2(ptr_load_69, idx_67, vl);
+ vuint16mf4_t idx_68 = __riscv_vsll_vx_u16mf4(__riscv_vid_v_u16mf4(vl), 1, vl);
+ vint16mf4_t vreg_memory_70 = __riscv_vluxei16_v_i16mf4(ptr_load_70, idx_68, vl);
+ vint16mf4_t vreg_memory_71 = __riscv_vlse16_v_i16mf4(ptr_load_71, 2, vl);
+ vuint32mf2_t vreg_memory_72 = __riscv_vle32_v_u32mf2_m(vmask, ptr_load_72, vl);
+ vuint32mf2_t vreg_memory_73 = __riscv_vlse32_v_u32mf2_m(vmask, ptr_load_73, 4, vl);
+ vint32mf2_t vreg_memory_74 = __riscv_vle32_v_i32mf2(ptr_load_74, vl);
+ vint16m1_t vreg_memory_75 = __riscv_vle16_v_i16m1(ptr_load_75, vl);
+ vint8mf8_t vload_tmp_12 = __riscv_vle8_v_i8mf8(ptr_load_76, vl);
+ vbool64_t vreg_memory_76 = __riscv_vmseq_vx_i8mf8_b64(vload_tmp_12, 1, vl);
+ vuint64m1_t idx_75 = __riscv_vsll_vx_u64m1(__riscv_vid_v_u64m1(vl), 2, vl);
+ vfloat32mf2_t vreg_memory_77 = __riscv_vluxei64_v_f32mf2_m(vmask, ptr_load_77, idx_75, vl);
+ vuint8mf8_t vreg_memory_78 = __riscv_vlse8_v_u8mf8(ptr_load_78, 1, vl);
+ vint8m1_t vload_tmp_13 = __riscv_vle8_v_i8m1(ptr_load_79, vl);
+ vbool8_t vreg_memory_79 = __riscv_vmseq_vx_i8m1_b8(vload_tmp_13, 1, vl);
+ vint32m4_t vreg_0 = __riscv_vreinterpret_v_i64m4_i32m4(vreg_memory_0);
+ vreg_memory_2 = __riscv_vmadd_vx_u16mf4_m(vmask, vreg_memory_1, 65136, vreg_memory_2, vl);
+ vbool64_t vreg_1 = __riscv_vmsge_vx_i64m1_b64(vreg_memory_3, -8444588278415581228ll, vl);
+ vfloat16m4_t vreg_2 = __riscv_vlmul_ext_v_f16m1_f16m4(vreg_memory_4);
+ vuint16mf4_t vreg_3 = __riscv_vmadd_vv_u16mf4_m(vmask, vreg_memory_5, vreg_memory_1, vreg_memory_6, vl);
+ vreg_memory_7 = __riscv_vslide1down_vx_u8mf8(vreg_memory_7, 43, vl);
+ vfloat32mf2_t vreg_4 = __riscv_vfwnmacc_vf_f32mf2_rm_m(vmask, vreg_memory_8, convert_binary_u16_f16(63541), vreg_memory_9, __RISCV_FRM_RNE, vl);
+ vint8mf8_t vreg_5 = __riscv_vlmul_trunc_v_i8m2_i8mf8(vreg_memory_10);
+ vreg_memory_9 = __riscv_vfmin_vf_f16mf4(vreg_memory_11, convert_binary_u16_f16(5566), vl);
+ vfloat64m1_t vreg_6 = __riscv_vreinterpret_v_u64m1_f64m1(vreg_memory_12);
+ vreg_memory_13 = __riscv_vwmaccsu_vv_i16mf4(vreg_memory_13, vreg_5, vreg_memory_14, vl);
+ vbool2_t vreg_7 = __riscv_vreinterpret_v_u8m1_b2(vreg_memory_15);
+ vbool64_t vreg_8 = __riscv_vreinterpret_v_i64m1_b64(vreg_memory_16);
+ vreg_4 = __riscv_vslidedown_vx_f32mf2(vreg_memory_17, 953680954u, vl);
+ vint32mf2_t vreg_9 = __riscv_vlmul_trunc_v_i32m1_i32mf2(vreg_memory_18);
+ vreg_memory_16 = __riscv_vor_vv_i64m1(vreg_memory_16, vreg_memory_19, vl);
+ vreg_9 = __riscv_vnmsub_vx_i32mf2(vreg_9, 1243647907, vreg_9, vl);
+ vreg_memory_13 = __riscv_vadc_vxm_i16mf4(vreg_memory_13, 30141, vreg_1, vl);
+ vreg_memory_11 = __riscv_vfsgnj_vf_f16mf4_m(vmask, vreg_memory_9, convert_binary_u16_f16(20419), vl);
+ vint16mf4_t vreg_10 = __riscv_vncvt_x_x_w_i16mf4(vreg_9, vl);
+ vuint32mf2_t vreg_11 = __riscv_vid_v_u32mf2(vl);
+ vreg_memory_1 = __riscv_vwsubu_wv_u16mf4(vreg_memory_6, vreg_memory_20, vl);
+ vint32m1_t vreg_12 = __riscv_vredmin_vs_i32mf2_i32m1(vreg_memory_21, vreg_memory_22, vl);
+ vuint32mf2_t vreg_13 = __riscv_vlmul_trunc_v_u32m4_u32mf2(vreg_memory_23);
+ vreg_memory_17 = __riscv_vfwnmsac_vf_f32mf2_rm(vreg_4, convert_binary_u16_f16(13771), vreg_memory_24, __RISCV_FRM_RNE, vl);
+ vint64m1_t vreg_14 = __riscv_vnmsub_vx_i64m1_m(vmask, vreg_memory_3, -7398488331651941832ll, vreg_memory_25, vl);
+ vreg_memory_13 = __riscv_vmacc_vv_i16mf4(vreg_memory_26, vreg_memory_27, vreg_memory_28, vl);
+ vreg_memory_5 = __riscv_vor_vx_u16mf4(vreg_memory_6, 50306, vl);
+ vfloat32m4_t vreg_15 = __riscv_vlmul_ext_v_f32m2_f32m4(vreg_memory_29);
+ vreg_memory_21 = __riscv_vundefined_i32mf2();
+ vreg_9 = __riscv_vrsub_vx_i32mf2(vreg_9, 321778147, vl);
+ vreg_memory_13 = __riscv_vsadd_vv_i16mf4(vreg_memory_13, vreg_memory_27, vl);
+ vint32mf2_t vreg_16 = __riscv_vfcvt_x_f_v_i32mf2_rm_m(vmask, vreg_4, __RISCV_FRM_RNE, vl);
+ vreg_memory_9 = __riscv_vfneg_v_f16mf4(vreg_memory_24, vl);
+ vreg_memory_30 = __riscv_vfsub_vf_f64m1_m(vmask, vreg_memory_30, convert_binary_u64_f64(45746ull), vl);
+ vreg_memory_31 = __riscv_vnsrl_wv_u8mf8(vreg_memory_6, vreg_memory_31, vl);
+ vreg_memory_8 = __riscv_vfwmacc_vv_f32mf2(vreg_memory_8, vreg_memory_32, vreg_memory_11, vl);
+ vreg_3 = __riscv_vmul_vv_u16mf4_m(vmask, vreg_memory_6, vreg_memory_5, vl);
+ vreg_16 = __riscv_vnmsac_vv_i32mf2_m(vmask, vreg_memory_33, vreg_16, vreg_memory_34, vl);
+ vreg_10 = __riscv_vrgather_vv_i16mf4(vreg_memory_35, vreg_memory_36, vl);
+ vreg_memory_38 = __riscv_vrgather_vv_u64m1(vreg_memory_37, vreg_memory_38, vl);
+ vfloat32mf2_t vreg_17 = __riscv_vfnmsac_vf_f32mf2_rm(vreg_4, convert_binary_u32_f32(10330u), vreg_memory_17, __RISCV_FRM_RNE, vl);
+ vfloat64m1_t vreg_18 = __riscv_vfnmacc_vv_f64m1(vreg_memory_30, vreg_memory_30, vreg_memory_39, vl);
+ vfloat64m1_t vreg_19 = __riscv_vcompress_vm_f64m1(vreg_memory_40, vreg_memory_41, vl);
+ vreg_1 = __riscv_vmsbc_vvm_i8mf8_b64(vreg_5, vreg_5, vreg_1, vl);
+ vreg_memory_42 = __riscv_vslideup_vx_u64m1(vreg_memory_42, vreg_memory_43, 102664729u, vl);
+ vreg_memory_41 = __riscv_vmorn_mm_b64(vreg_1, vreg_memory_44, vl);
+ vreg_memory_40 = __riscv_vfwnmacc_vv_f64m1_rm_m(vmask, vreg_memory_39, vreg_memory_8, vreg_4, __RISCV_FRM_RNE, vl);
+ vreg_1 = __riscv_vmadc_vxm_i64m1_b64(vreg_memory_19, -6991190491244929085ll, vreg_memory_45, vl);
+ vreg_memory_24 = __riscv_vfmv_s_f_f16mf4(convert_binary_u16_f16(58872), vl);
+ vbool64_t vreg_20 = __riscv_vreinterpret_v_i8m1_b64(vreg_memory_46);
+ vreg_memory_11 = __riscv_vfmax_vf_f16mf4(vreg_memory_32, convert_binary_u16_f16(3391), vl);
+ vreg_memory_42 = __riscv_vwredsumu_vs_u32mf2_u64m1(vreg_memory_47, vreg_memory_48, vl);
+ vreg_5 = __riscv_vmv_v_x_i8mf8(-52, vl);
+ vreg_memory_9 = __riscv_vmv_v_v_f16mf4(vreg_memory_11, vl);
+ vfloat64m1_t vreg_21 = __riscv_vfmv_s_f_f64m1(convert_binary_u64_f64(30491ull), vl);
+ vint16m8_t vreg_22 = __riscv_vlmul_ext_v_i16mf4_i16m8(vreg_memory_49);
+ vreg_5 = __riscv_vrsub_vx_i8mf8(vreg_5, 109, vl);
+ vreg_memory_36 = __riscv_vslideup_vx_u16mf4_m(vmask, vreg_memory_50, vreg_memory_51, 539231139u, vl);
+ vuint16mf4_t vreg_23 = __riscv_vlmul_trunc_v_u16m8_u16mf4(vreg_memory_52);
+ vbool64_t vreg_24 = __riscv_vreinterpret_v_u16m1_b64(vreg_memory_53);
+ vint16m1_t vreg_25 = __riscv_vlmul_trunc_v_i16m2_i16m1(vreg_memory_54);
+ vuint8mf8_t vreg_26 = __riscv_vmulhu_vx_u8mf8_m(vmask, vreg_memory_14, 116, vl);
+ vreg_memory_3 = __riscv_vnmsub_vx_i64m1_m(vmask, vreg_memory_55, 7037333148368913704ll, vreg_memory_3, vl);
+ vreg_18 = __riscv_vfmacc_vf_f64m1_rm_m(vmask, vreg_memory_56, convert_binary_u64_f64(14616ull), vreg_18, __RISCV_FRM_RNE, vl);
+ vreg_9 = __riscv_vsra_vx_i32mf2(vreg_9, 2782143639u, vl);
+ vreg_19 = __riscv_vfwmsac_vf_f64m1_m(vmask, vreg_memory_30, convert_binary_u32_f32(52139u), vreg_memory_57, vl);
+ vreg_5 = __riscv_vsadd_vv_i8mf8_m(vmask, vreg_5, vreg_5, vl);
+ vint16mf4_t vreg_27 = __riscv_vnmsac_vv_i16mf4_m(vmask, vreg_memory_58, vreg_memory_59, vreg_memory_49, vl);
+ vreg_memory_60 = __riscv_vnot_v_i8mf8_m(vmask, vreg_memory_60, vl);
+ vreg_memory_60 = __riscv_vnsra_wv_i8mf8(vreg_memory_35, vreg_memory_14, vl);
+ vreg_4 = __riscv_vfnmacc_vf_f32mf2(vreg_17, convert_binary_u32_f32(9735u), vreg_memory_61, vl);
+ vreg_5 = __riscv_vor_vv_i8mf8(vreg_memory_60, vreg_memory_60, vl);
+ vint32m4_t vreg_28 = __riscv_vlmul_ext_v_i32m2_i32m4(vreg_memory_62);
+ vreg_memory_47 = __riscv_vmacc_vv_u32mf2_m(vmask, vreg_11, vreg_13, vreg_memory_63, vl);
+ vint16mf4_t vreg_29 = __riscv_vslide1up_vx_i16mf4(vreg_memory_64, 4280, vl);
+ vreg_5 = __riscv_vdiv_vx_i8mf8_m(vmask, vreg_5, -37, vl);
+ vfloat32mf2_t vreg_30 = __riscv_vfnmacc_vv_f32mf2_rm_m(vmask, vreg_memory_8, vreg_memory_61, vreg_memory_57, __RISCV_FRM_RNE, vl);
+ vbool8_t vreg_31 = __riscv_vreinterpret_v_u32m1_b8(vreg_memory_65);
+ vuint64m2_t vreg_32 = __riscv_vreinterpret_v_u8m2_u64m2(vreg_memory_66);
+ vint8mf8_t vreg_33 = __riscv_vnmsac_vv_i8mf8(vreg_memory_60, vreg_5, vreg_memory_60, vl);
+ vint8mf8_t vreg_34 = __riscv_vsmul_vv_i8mf8(vreg_33, vreg_33, __RISCV_VXRM_RNU, vl);
+ vreg_memory_61 = __riscv_vfmsac_vv_f32mf2_m(vmask, vreg_17, vreg_17, vreg_30, vl);
+ vreg_memory_16 = __riscv_vasub_vv_i64m1(vreg_14, vreg_memory_25, __RISCV_VXRM_RNU, vl);
+ vuint64m2_t vreg_35 = __riscv_vlmul_trunc_v_u64m8_u64m2(vreg_memory_67);
+ vint8mf8_t vreg_36 = __riscv_vlmul_trunc_v_i8m2_i8mf8(vreg_memory_68);
+ vreg_memory_63 = __riscv_vfncvt_xu_f_w_u32mf2(vreg_memory_30, vl);
+ vreg_23 = __riscv_vfncvt_rtz_xu_f_w_u16mf4_m(vmask, vreg_memory_69, vl);
+ vuint16mf4_t vreg_37 = __riscv_vnmsub_vx_u16mf4_m(vmask, vreg_3, 61201, vreg_memory_50, vl);
+ vreg_memory_28 = __riscv_vsub_vv_i16mf4(vreg_memory_70, vreg_memory_71, vl);
+ vreg_memory_72 = __riscv_vnmsac_vv_u32mf2(vreg_memory_72, vreg_11, vreg_memory_73, vl);
+ vreg_memory_33 = __riscv_vrem_vv_i32mf2(vreg_16, vreg_memory_74, vl);
+ vint64m1_t vreg_38 = __riscv_vreinterpret_v_i16m1_i64m1(vreg_memory_75);
+ vfloat16mf4_t vreg_39 = __riscv_vfmerge_vfm_f16mf4(vreg_memory_32, convert_binary_u16_f16(37406), vreg_memory_76, vl);
+ vfloat16mf4_t vreg_40 = __riscv_vfcvt_f_xu_v_f16mf4_rm_m(vmask, vreg_memory_5, __RISCV_FRM_RNE, vl);
+ vreg_4 = __riscv_vfmsub_vf_f32mf2_rm(vreg_memory_77, convert_binary_u32_f32(21630u), vreg_4, __RISCV_FRM_RNE, vl);
+ vuint8mf8_t vreg_41 = __riscv_vslide1up_vx_u8mf8(vreg_memory_78, 132, vl);
+ vreg_memory_63 = __riscv_vwmulu_vx_u32mf2(vreg_23, 52729, vl);
+ vuint16m1_t vreg_42 = __riscv_vreinterpret_v_b8_u16m1(vreg_memory_79);
+ vreg_memory_7 = __riscv_vasubu_vx_u8mf8(vreg_memory_20, 76, __RISCV_VXRM_RNU, vl);
+ __riscv_vse16_v_u16mf4(ptr_store_vreg_memory_2, vreg_memory_2, vl);
+ vint8mf8_t zero_0 = __riscv_vmv_v_x_i8mf8(0, __riscv_vsetvlmax_e8mf8());
+ vint8mf8_t vstore_tmp_0 = __riscv_vmerge_vxm_i8mf8(zero_0, 1, vreg_1, vl);
+ __riscv_vse8_v_i8mf8(ptr_store_vreg_1, vstore_tmp_0, vl);
+ vuint32mf2_t idx_4 = __riscv_vsll_vx_u32mf2(__riscv_vid_v_u32mf2(vl), 1, vl);
+ __riscv_vsuxei32_v_u16mf4_m(vmask, ptr_store_vreg_3, idx_4, vreg_3, vl);
+ vuint8mf8_t idx_7 = __riscv_vsll_vx_u8mf8(__riscv_vid_v_u8mf8(vl), 0, vl);
+ __riscv_vsoxei8_v_u8mf8(ptr_store_vreg_memory_7, idx_7, vreg_memory_7, vl);
+ __riscv_vse32_v_f32mf2(ptr_store_vreg_4, vreg_4, vl);
+ __riscv_vse8_v_i8mf8(ptr_store_vreg_5, vreg_5, vl);
+ vuint64m1_t idx_12 = __riscv_vsll_vx_u64m1(__riscv_vid_v_u64m1(vl), 1, vl);
+ __riscv_vsoxei64_v_f16mf4(ptr_store_vreg_memory_9, idx_12, vreg_memory_9, vl);
+ vuint8mf8_t idx_17 = __riscv_vsll_vx_u8mf8(__riscv_vid_v_u8mf8(vl), 1, vl);
+ __riscv_vsuxei8_v_i16mf4(ptr_store_vreg_memory_13, idx_17, vreg_memory_13, vl);
+ vint8m4_t zero_1 = __riscv_vmv_v_x_i8m4(0, __riscv_vsetvlmax_e8m4());
+ vint8m4_t vstore_tmp_1 = __riscv_vmerge_vxm_i8m4(zero_1, 1, vreg_7, vl);
+ __riscv_vse8_v_i8m4(ptr_store_vreg_7, vstore_tmp_1, vl);
+ vint8mf8_t zero_2 = __riscv_vmv_v_x_i8mf8(0, __riscv_vsetvlmax_e8mf8());
+ vint8mf8_t vstore_tmp_2 = __riscv_vmerge_vxm_i8mf8(zero_2, 1, vreg_8, vl);
+ __riscv_vse8_v_i8mf8(ptr_store_vreg_8, vstore_tmp_2, vl);
+ __riscv_vsse32_v_f32mf2(ptr_store_vreg_4, 4, vreg_4, vl);
+ __riscv_vse32_v_i32mf2(ptr_store_vreg_9, vreg_9, vl);
+ __riscv_vse64_v_i64m1(ptr_store_vreg_memory_16, vreg_memory_16, vl);
+ __riscv_vse32_v_i32mf2(ptr_store_vreg_9, vreg_9, vl);
+ __riscv_vsse16_v_i16mf4(ptr_store_vreg_memory_13, 2, vreg_memory_13, vl);
+ __riscv_vse16_v_f16mf4_m(vmask, ptr_store_vreg_memory_11, vreg_memory_11, vl);
+ vuint32mf2_t idx_19 = __riscv_vsll_vx_u32mf2(__riscv_vid_v_u32mf2(vl), 1, vl);
+ __riscv_vsoxei32_v_i16mf4(ptr_store_vreg_10, idx_19, vreg_10, vl);
+ __riscv_vse32_v_u32mf2_m(vmask, ptr_store_vreg_11, vreg_11, vl);
+ __riscv_vse16_v_u16mf4(ptr_store_vreg_memory_1, vreg_memory_1, vl);
+ __riscv_vse32_v_i32m1(ptr_store_vreg_12, vreg_12, vl);
+ __riscv_vse32_v_u32mf2(ptr_store_vreg_13, vreg_13, vl);
+ __riscv_vse32_v_f32mf2(ptr_store_vreg_memory_17, vreg_memory_17, vl);
+ vuint8mf8_t idx_23 = __riscv_vsll_vx_u8mf8(__riscv_vid_v_u8mf8(vl), 3, vl);
+ __riscv_vsoxei8_v_i64m1(ptr_store_vreg_14, idx_23, vreg_14, vl);
+ vuint64m1_t idx_26 = __riscv_vsll_vx_u64m1(__riscv_vid_v_u64m1(vl), 1, vl);
+ __riscv_vsuxei64_v_i16mf4(ptr_store_vreg_memory_13, idx_26, vreg_memory_13, vl);
+ __riscv_vse16_v_u16mf4(ptr_store_vreg_memory_5, vreg_memory_5, vl);
+ __riscv_vse32_v_i32mf2_m(vmask, ptr_store_vreg_9, vreg_9, vl);
+ vuint32mf2_t idx_28 = __riscv_vsll_vx_u32mf2(__riscv_vid_v_u32mf2(vl), 1, vl);
+ __riscv_vsoxei32_v_i16mf4(ptr_store_vreg_memory_13, idx_28, vreg_memory_13, vl);
+ __riscv_vse32_v_i32mf2_m(vmask, ptr_store_vreg_16, vreg_16, vl);
+ __riscv_vse16_v_f16mf4(ptr_store_vreg_memory_9, vreg_memory_9, vl);
+ __riscv_vsse64_v_f64m1(ptr_store_vreg_memory_30, 8, vreg_memory_30, vl);
+ __riscv_vse8_v_u8mf8(ptr_store_vreg_memory_31, vreg_memory_31, vl);
+ vuint32mf2_t idx_33 = __riscv_vsll_vx_u32mf2(__riscv_vid_v_u32mf2(vl), 2, vl);
+ __riscv_vsoxei32_v_f32mf2_m(vmask, ptr_store_vreg_memory_8, idx_33, vreg_memory_8, vl);
+ __riscv_vse16_v_u16mf4(ptr_store_vreg_3, vreg_3, vl);
+ __riscv_vsse32_v_i32mf2_m(vmask, ptr_store_vreg_16, 4, vreg_16, vl);
+ __riscv_vse16_v_i16mf4(ptr_store_vreg_10, vreg_10, vl);
+ vuint32mf2_t idx_37 = __riscv_vsll_vx_u32mf2(__riscv_vid_v_u32mf2(vl), 3, vl);
+ __riscv_vsoxei32_v_u64m1(ptr_store_vreg_memory_38, idx_37, vreg_memory_38, vl);
+ __riscv_vse32_v_f32mf2_m(vmask, ptr_store_vreg_17, vreg_17, vl);
+ __riscv_vsse64_v_f64m1(ptr_store_vreg_18, 8, vreg_18, vl);
+ vuint32mf2_t idx_39 = __riscv_vsll_vx_u32mf2(__riscv_vid_v_u32mf2(vl), 3, vl);
+ __riscv_vsoxei32_v_f64m1(ptr_store_vreg_19, idx_39, vreg_19, vl);
+ vint8mf8_t zero_4 = __riscv_vmv_v_x_i8mf8(0, __riscv_vsetvlmax_e8mf8());
+ vint8mf8_t vstore_tmp_4 = __riscv_vmerge_vxm_i8mf8(zero_4, 1, vreg_1, vl);
+ __riscv_vse8_v_i8mf8(ptr_store_vreg_1, vstore_tmp_4, vl);
+ __riscv_vse64_v_u64m1(ptr_store_vreg_memory_42, vreg_memory_42, vl);
+ vint8mf8_t zero_6 = __riscv_vmv_v_x_i8mf8(0, __riscv_vsetvlmax_e8mf8());
+ vint8mf8_t vstore_tmp_6 = __riscv_vmerge_vxm_i8mf8(zero_6, 1, vreg_memory_41, vl);
+ __riscv_vse8_v_i8mf8(ptr_store_vreg_memory_41, vstore_tmp_6, vl);
+ vuint8mf8_t idx_41 = __riscv_vsll_vx_u8mf8(__riscv_vid_v_u8mf8(vl), 3, vl);
+ __riscv_vsuxei8_v_f64m1_m(vmask, ptr_store_vreg_memory_40, idx_41, vreg_memory_40, vl);
+ vint8mf8_t zero_8 = __riscv_vmv_v_x_i8mf8(0, __riscv_vsetvlmax_e8mf8());
+ vint8mf8_t vstore_tmp_8 = __riscv_vmerge_vxm_i8mf8(zero_8, 1, vreg_1, vl);
+ __riscv_vse8_v_i8mf8(ptr_store_vreg_1, vstore_tmp_8, vl);
+ vuint16mf4_t idx_43 = __riscv_vsll_vx_u16mf4(__riscv_vid_v_u16mf4(vl), 1, vl);
+ __riscv_vsoxei16_v_f16mf4(ptr_store_vreg_memory_24, idx_43, vreg_memory_24, vl);
+ vint8mf8_t zero_9 = __riscv_vmv_v_x_i8mf8(0, __riscv_vsetvlmax_e8mf8());
+ vint8mf8_t vstore_tmp_9 = __riscv_vmerge_vxm_i8mf8(zero_9, 1, vreg_20, vl);
+ __riscv_vse8_v_i8mf8(ptr_store_vreg_20, vstore_tmp_9, vl);
+ vuint32mf2_t idx_45 = __riscv_vsll_vx_u32mf2(__riscv_vid_v_u32mf2(vl), 1, vl);
+ __riscv_vsoxei32_v_f16mf4(ptr_store_vreg_memory_11, idx_45, vreg_memory_11, vl);
+ __riscv_vsse64_v_u64m1(ptr_store_vreg_memory_42, 8, vreg_memory_42, vl);
+ __riscv_vse8_v_i8mf8(ptr_store_vreg_5, vreg_5, vl);
+ vuint8mf8_t idx_47 = __riscv_vsll_vx_u8mf8(__riscv_vid_v_u8mf8(vl), 1, vl);
+ __riscv_vsoxei8_v_f16mf4(ptr_store_vreg_memory_9, idx_47, vreg_memory_9, vl);
+ vuint8mf8_t idx_49 = __riscv_vsll_vx_u8mf8(__riscv_vid_v_u8mf8(vl), 3, vl);
+ __riscv_vsoxei8_v_f64m1(ptr_store_vreg_21, idx_49, vreg_21, vl);
+ __riscv_vse8_v_i8mf8(ptr_store_vreg_5, vreg_5, vl);
+ __riscv_vsse16_v_u16mf4(ptr_store_vreg_memory_36, 2, vreg_memory_36, vl);
+ __riscv_vse16_v_u16mf4(ptr_store_vreg_23, vreg_23, vl);
+ vint8mf8_t zero_10 = __riscv_vmv_v_x_i8mf8(0, __riscv_vsetvlmax_e8mf8());
+ vint8mf8_t vstore_tmp_10 = __riscv_vmerge_vxm_i8mf8(zero_10, 1, vreg_24, vl);
+ __riscv_vse8_v_i8mf8(ptr_store_vreg_24, vstore_tmp_10, vl);
+ __riscv_vse16_v_i16m1(ptr_store_vreg_25, vreg_25, vl);
+ __riscv_vsse8_v_u8mf8(ptr_store_vreg_26, 1, vreg_26, vl);
+ vuint32mf2_t idx_53 = __riscv_vsll_vx_u32mf2(__riscv_vid_v_u32mf2(vl), 3, vl);
+ __riscv_vsuxei32_v_i64m1_m(vmask, ptr_store_vreg_memory_3, idx_53, vreg_memory_3, vl);
+ __riscv_vsse64_v_f64m1(ptr_store_vreg_18, 8, vreg_18, vl);
+ __riscv_vsse32_v_i32mf2(ptr_store_vreg_9, 4, vreg_9, vl);
+ vuint32mf2_t idx_55 = __riscv_vsll_vx_u32mf2(__riscv_vid_v_u32mf2(vl), 3, vl);
+ __riscv_vsuxei32_v_f64m1(ptr_store_vreg_19, idx_55, vreg_19, vl);
+ __riscv_vsse8_v_i8mf8(ptr_store_vreg_5, 1, vreg_5, vl);
+ vuint8mf8_t idx_58 = __riscv_vsll_vx_u8mf8(__riscv_vid_v_u8mf8(vl), 1, vl);
+ __riscv_vsoxei8_v_i16mf4_m(vmask, ptr_store_vreg_27, idx_58, vreg_27, vl);
+ __riscv_vse8_v_i8mf8_m(vmask, ptr_store_vreg_memory_60, vreg_memory_60, vl);
+ __riscv_vse8_v_i8mf8(ptr_store_vreg_memory_60, vreg_memory_60, vl);
+ vuint32mf2_t idx_61 = __riscv_vsll_vx_u32mf2(__riscv_vid_v_u32mf2(vl), 2, vl);
+ __riscv_vsuxei32_v_f32mf2_m(vmask, ptr_store_vreg_4, idx_61, vreg_4, vl);
+ __riscv_vsse8_v_i8mf8_m(vmask, ptr_store_vreg_5, 1, vreg_5, vl);
+ __riscv_vse32_v_u32mf2_m(vmask, ptr_store_vreg_memory_47, vreg_memory_47, vl);
+ __riscv_vse16_v_i16mf4(ptr_store_vreg_29, vreg_29, vl);
+ __riscv_vse8_v_i8mf8_m(vmask, ptr_store_vreg_5, vreg_5, vl);
+ __riscv_vsse32_v_f32mf2(ptr_store_vreg_30, 4, vreg_30, vl);
+ vint8m1_t zero_11 = __riscv_vmv_v_x_i8m1(0, __riscv_vsetvlmax_e8m1());
+ vint8m1_t vstore_tmp_11 = __riscv_vmerge_vxm_i8m1(zero_11, 1, vreg_31, vl);
+ __riscv_vse8_v_i8m1(ptr_store_vreg_31, vstore_tmp_11, vl);
+ __riscv_vse8_v_i8mf8(ptr_store_vreg_33, vreg_33, vl);
+ vuint64m1_t idx_64 = __riscv_vsll_vx_u64m1(__riscv_vid_v_u64m1(vl), 0, vl);
+ __riscv_vsoxei64_v_i8mf8(ptr_store_vreg_34, idx_64, vreg_34, vl);
+ __riscv_vsse32_v_f32mf2_m(vmask, ptr_store_vreg_memory_61, 4, vreg_memory_61, vl);
+ __riscv_vse64_v_i64m1(ptr_store_vreg_memory_16, vreg_memory_16, vl);
+ __riscv_vse64_v_u64m2(ptr_store_vreg_35, vreg_35, vl);
+ __riscv_vse8_v_i8mf8(ptr_store_vreg_36, vreg_36, vl);
+
+ ptr_mask += vl;
+ ptr_load_0 += vl;
+ ptr_load_1 += vl;
+ ptr_load_10 += vl;
+ ptr_load_11 += vl;
+ ptr_load_12 += vl;
+ ptr_load_13 += vl;
+ ptr_load_14 += vl;
+ ptr_load_15 += vl;
+ ptr_load_16 += vl;
+ ptr_load_17 += vl;
+ ptr_load_18 += vl;
+ ptr_load_19 += vl;
+ ptr_load_2 += vl;
+ ptr_load_20 += vl;
+ ptr_load_21 += vl;
+ ptr_load_22 += vl;
+ ptr_load_23 += vl;
+ ptr_load_24 += vl;
+ ptr_load_25 += vl;
+ ptr_load_26 += vl;
+ ptr_load_27 += vl;
+ ptr_load_28 += vl;
+ ptr_load_29 += vl;
+ ptr_load_3 += vl;
+ ptr_load_30 += vl;
+ ptr_load_31 += vl;
+ ptr_load_32 += vl;
+ ptr_load_33 += vl;
+ ptr_load_34 += vl;
+ ptr_load_35 += vl;
+ ptr_load_36 += vl;
+ ptr_load_37 += vl;
+ ptr_load_38 += vl;
+ ptr_load_39 += vl;
+ ptr_load_4 += vl;
+ ptr_load_40 += vl;
+ ptr_load_41 += vl;
+ ptr_load_42 += vl;
+ ptr_load_43 += vl;
+ ptr_load_44 += vl;
+ ptr_load_45 += vl;
+ ptr_load_46 += vl;
+ ptr_load_47 += vl;
+ ptr_load_48 += vl;
+ ptr_load_49 += vl;
+ ptr_load_5 += vl;
+ ptr_load_50 += vl;
+ ptr_load_51 += vl;
+ ptr_load_52 += vl;
+ ptr_load_53 += vl;
+ ptr_load_54 += vl;
+ ptr_load_55 += vl;
+ ptr_load_56 += vl;
+ ptr_load_57 += vl;
+ ptr_load_58 += vl;
+ ptr_load_59 += vl;
+ ptr_load_6 += vl;
+ ptr_load_60 += vl;
+ ptr_load_61 += vl;
+ ptr_load_62 += vl;
+ ptr_load_63 += vl;
+ ptr_load_64 += vl;
+ ptr_load_65 += vl;
+ ptr_load_66 += vl;
+ ptr_load_67 += vl;
+ ptr_load_68 += vl;
+ ptr_load_69 += vl;
+ ptr_load_7 += vl;
+ ptr_load_70 += vl;
+ ptr_load_71 += vl;
+ ptr_load_72 += vl;
+ ptr_load_73 += vl;
+ ptr_load_74 += vl;
+ ptr_load_75 += vl;
+ ptr_load_76 += vl;
+ ptr_load_77 += vl;
+ ptr_load_78 += vl;
+ ptr_load_79 += vl;
+ ptr_load_8 += vl;
+ ptr_load_9 += vl;
+ ptr_store_vreg_0 += vl;
+ ptr_store_vreg_1 += vl;
+ ptr_store_vreg_10 += vl;
+ ptr_store_vreg_11 += vl;
+ ptr_store_vreg_12 += vl;
+ ptr_store_vreg_13 += vl;
+ ptr_store_vreg_14 += vl;
+ ptr_store_vreg_15 += vl;
+ ptr_store_vreg_16 += vl;
+ ptr_store_vreg_17 += vl;
+ ptr_store_vreg_18 += vl;
+ ptr_store_vreg_19 += vl;
+ ptr_store_vreg_2 += vl;
+ ptr_store_vreg_20 += vl;
+ ptr_store_vreg_21 += vl;
+ ptr_store_vreg_22 += vl;
+ ptr_store_vreg_23 += vl;
+ ptr_store_vreg_24 += vl;
+ ptr_store_vreg_25 += vl;
+ ptr_store_vreg_26 += vl;
+ ptr_store_vreg_27 += vl;
+ ptr_store_vreg_28 += vl;
+ ptr_store_vreg_29 += vl;
+ ptr_store_vreg_3 += vl;
+ ptr_store_vreg_30 += vl;
+ ptr_store_vreg_31 += vl;
+ ptr_store_vreg_32 += vl;
+ ptr_store_vreg_33 += vl;
+ ptr_store_vreg_34 += vl;
+ ptr_store_vreg_35 += vl;
+ ptr_store_vreg_36 += vl;
+ ptr_store_vreg_37 += vl;
+ ptr_store_vreg_38 += vl;
+ ptr_store_vreg_39 += vl;
+ ptr_store_vreg_4 += vl;
+ ptr_store_vreg_40 += vl;
+ ptr_store_vreg_41 += vl;
+ ptr_store_vreg_42 += vl;
+ ptr_store_vreg_5 += vl;
+ ptr_store_vreg_6 += vl;
+ ptr_store_vreg_7 += vl;
+ ptr_store_vreg_8 += vl;
+ ptr_store_vreg_9 += vl;
+ ptr_store_vreg_memory_1 += vl;
+ ptr_store_vreg_memory_11 += vl;
+ ptr_store_vreg_memory_13 += vl;
+ ptr_store_vreg_memory_16 += vl;
+ ptr_store_vreg_memory_17 += vl;
+ ptr_store_vreg_memory_2 += vl;
+ ptr_store_vreg_memory_21 += vl;
+ ptr_store_vreg_memory_24 += vl;
+ ptr_store_vreg_memory_28 += vl;
+ ptr_store_vreg_memory_3 += vl;
+ ptr_store_vreg_memory_30 += vl;
+ ptr_store_vreg_memory_31 += vl;
+ ptr_store_vreg_memory_33 += vl;
+ ptr_store_vreg_memory_36 += vl;
+ ptr_store_vreg_memory_38 += vl;
+ ptr_store_vreg_memory_40 += vl;
+ ptr_store_vreg_memory_41 += vl;
+ ptr_store_vreg_memory_42 += vl;
+ ptr_store_vreg_memory_47 += vl;
+ ptr_store_vreg_memory_5 += vl;
+ ptr_store_vreg_memory_60 += vl;
+ ptr_store_vreg_memory_61 += vl;
+ ptr_store_vreg_memory_63 += vl;
+ ptr_store_vreg_memory_7 += vl;
+ ptr_store_vreg_memory_72 += vl;
+ ptr_store_vreg_memory_8 += vl;
+ ptr_store_vreg_memory_9 += vl;
+ }
+ return 0;
+}
+
+/* { dg-final { scan-assembler-not "e64,mf4" } } */
--
2.48.1
On Thu, 27 Feb 2025 16:00:08 +0100, "Robin Dapp" wrote:
> Hi,
>
> when merging two vsetvls that both only demand "SEW >= ..." we
> use their maximum SEW and keep the LMUL. That may lead to invalid
> vector configurations like
> e64, mf4.
> As we make sure that the SEW requirements overlap we can use the SEW
> and LMUL of the configuration with the larger SEW.
>
> Ma Jin already touched this merge rule some weeks ago and fixed the
> ratio calculation (r15-6873). Calculating the ratio from an invalid
> SEW/LMUL combination led to an overflow in the ratio variable, though.
> I'd argue the proper fix is to update SEW and LMUL, keeping the ratio
> as before. This breaks bug-10.c, though, and I'm not sure what it
> really tests. SEW/LMUL actually doesn't change; we just emit a slightly
> different vsetvl. Maybe it was reduced too far? Jin, any insight
> there? I changed it into a run test for now.
>
> Regtested on rv64gcv_zvl512b.
>
> Regards
> Robin
>
> PR target/117955
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-v.cc (calculate_ratio): Use LMUL of vsetvl
> with larger SEW.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/bug-10.c: Convert to run test.
> * gcc.target/riscv/rvv/base/pr117955.c: New test.
> ---
> gcc/config/riscv/riscv-vsetvl.cc | 8 +-
> .../gcc.target/riscv/rvv/base/bug-10.c | 32 +-
> .../gcc.target/riscv/rvv/base/pr117955.c | 827 ++++++++++++++++++
> 3 files changed, 861 insertions(+), 6 deletions(-)
> create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr117955.c
>
> diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
> index 82284624a24..f0165f7b8c8 100644
> --- a/gcc/config/riscv/riscv-vsetvl.cc
> +++ b/gcc/config/riscv/riscv-vsetvl.cc
> @@ -1729,9 +1729,11 @@ private:
> }
> inline void use_max_sew (vsetvl_info &prev, const vsetvl_info &next)
> {
> - int max_sew = MAX (prev.get_sew (), next.get_sew ());
> - prev.set_sew (max_sew);
> - prev.set_ratio (calculate_ratio (prev.get_sew (), prev.get_vlmul ()));
> + bool prev_sew_larger = prev.get_sew () >= next.get_sew ();
> + const vsetvl_info from = prev_sew_larger ? prev : next;
> + prev.set_sew (from.get_sew ());
> + prev.set_vlmul (from.get_vlmul ());
> + prev.set_ratio (from.get_ratio ());
> use_min_of_max_sew (prev, next);
> }
> inline void use_next_sew_lmul (vsetvl_info &prev, const vsetvl_info &next)
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/bug-10.c b/gcc/testsuite/gcc.target/riscv/rvv/base/bug-10.c
> index af3a8610d63..5f7490e8a3b 100644
> --- a/gcc/testsuite/gcc.target/riscv/rvv/base/bug-10.c
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/bug-10.c
> @@ -1,14 +1,40 @@
> -/* { dg-do compile { target { rv64 } } } */
> +/* { dg-do run { target { rv64 } } } */
> +/* { dg-require-effective-target rv64 } */
> +/* { dg-require-effective-target riscv_v } */
> /* { dg-options " -march=rv64gcv_zvfh -mabi=lp64d -O2 --param=vsetvl-strategy=optim -fno-schedule-insns -fno-schedule-insns2 -fno-schedule-fusion " } */
>
> #include <riscv_vector.h>
>
> void
> -foo (uint8_t *ptr, vfloat16m4_t *v1, vuint32m8_t *v2, vuint8m2_t *v3, size_t vl)
> +__attribute__ ((noipa))
> +foo (vfloat16m4_t *v1, vuint32m8_t *v2, vuint8m2_t *v3, size_t vl)
> {
> *v1 = __riscv_vfmv_s_f_f16m4 (1, vl);
> *v2 = __riscv_vmv_s_x_u32m8 (2963090659u, vl);
> *v3 = __riscv_vsll_vx_u8m2 (__riscv_vid_v_u8m2 (vl), 2, vl);
> }
This patch modifies the sequence:
vsetvli zero,a4,e32,m4,ta,ma + vsetvli zero,a4,e8,m2,ta,ma
to:
vsetvli zero,a4,e32,m8,ta,ma + vsetvli zero,zero,e8,m2,ta,ma
Functionally, there is no difference. However, this change resolves the
issue with "e64,mf4", and allows the second vsetvli to omit a4, which is
beneficial.
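For reference, the SEW/LMUL ratios behind that observation (just
restating the arithmetic from the sequences quoted above):

  old: e32,m4 -> ratio 32/4 = 8;  e8,m2 -> ratio 8/2 = 4  (the ratio
       changes, so the second vsetvli has to carry the AVL in a4 again)
  new: e32,m8 -> ratio 32/8 = 4;  e8,m2 -> ratio 8/2 = 4  (same ratio,
       so VLMAX is unchanged and the vl-preserving "vsetvli zero,zero"
       form is permitted)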
> -/* { dg-final { scan-assembler-not {vsetvli.*zero,zero} } }*/
> +int
> +main ()
> +{
> + vfloat16m4_t v1;
> + vuint32m8_t v2;
> + vuint8m2_t v3;
> + int vl = 4;
> + foo (&v1, &v2, &v3, vl);
> +
> + _Float16 val1 = ((_Float16 *)&v1)[0];
> + if (val1 - 1.0000f > 0.00001f)
> + __builtin_abort ();
> +
> + uint32_t val2 = ((uint32_t *)&v2)[0];
> + if (val2 != 2963090659u)
> + __builtin_abort ();
> +
> + for (int i = 0; i < vl; i++)
> + {
> + uint8_t val = ((uint8_t *)&v3)[i];
> + if (val != i << 2)
> + __builtin_abort ();
> + }
> +}
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr117955.c b/gcc/testsuite/gcc.target/riscv/rvv/base/pr117955.c
> new file mode 100644
> index 00000000000..49ccb6097d0
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr117955.c
> @@ -0,0 +1,827 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv_zvfh -O3" } */
Here are three issues with this test case:
1. The test case does not seem to take effect, as it appears to pass both before and after applying the patch for RV64.
2. Since no mabi is specified, it consistently fails for RV32 with the error: "Excess errors: cc1: error: ABI requires '-march=rv32'."
3. The test case seems to contain a lot of unnecessary code; perhaps we can streamline it.
Best regards,
Jin Ma
> +#include <riscv_vector.h>
> +
> +#define dataLen 100
> +#define isNaNF16UI( a ) (((~(a) & 0x7C00) == 0) && ((a) & 0x03FF))
> +#define isNaNF32UI( a ) (((~(a) & 0x7F800000) == 0) && ((a) & 0x007FFFFF))
> +#define isNaNF64UI( a ) (((~(a) & UINT64_C( 0x7FF0000000000000 )) == 0) && ((a) & UINT64_C( 0x000FFFFFFFFFFFFF )))
> +typedef _Float16 float16_t;
> +typedef float float32_t;
> +typedef double float64_t;
> This patch modifies the sequence:
> vsetvli zero,a4,e32,m4,ta,ma + vsetvli zero,a4,e8,m2,ta,ma
> to:
> vsetvli zero,a4,e32,m8,ta,ma + vsetvli zero,zero,e8,m2,ta,ma
> Functionally, there is no difference. However, this change resolves the
> issue with "e64,mf4", and allows the second vsetvli to omit a4, which is
> beneficial.
My question rather was: Why did your test check for the presence of this a4?
Did you see a different issue in an unreduced test apart from what is tested
right now (which seems at least partially wrong)?
> Here are three issues with this test case:
> 1. The test case does not seem to take effect, as it appears to pass both
> before and after applying the patch for RV64.
> 2. Since no mabi is specified, it consistently fails for RV32 with the error:
> "Excess errors: cc1: error: ABI requires '-march=rv32'."
> 3. The test case seems to contain a lot of unnecessary code; perhaps we can
> streamline it.
As referenced in the PR the issue is flaky and only rarely occurs, under
specific circumstances (and is latent on trunk). The test case was already
reduced.
You're right about the missing -mabi of course, I keep forgetting it...
On Fri, 28 Feb 2025 06:47:24 +0100, "Robin Dapp" wrote:
> > This patch modifies the sequence:
> > vsetvli zero,a4,e32,m4,ta,ma + vsetvli zero,a4,e8,m2,ta,ma
> > to:
> > vsetvli zero,a4,e32,m8,ta,ma + vsetvli zero,zero,e8,m2,ta,ma
> > Functionally, there is no difference. However, this change resolves the
> > issue with "e64,mf4", and allows the second vsetvli to omit a4, which is
> > beneficial.
>
> My question rather was: Why did your test check for the presence of this a4?
> Did you see a different issue in an unreduced test apart from what is tested
> right now (which seems at least partially wrong)?
Okay, let me explain the background of my previous patch.
Prior to applying my patch, for the test case bug-10.c (a reduced example of a larger program with incorrect runtime results),
the vsetvli sequence compiled with --param=vsetvl-strategy=simple was as follows:
1. vsetvli zero,a4,e16,m4,ta,ma + vsetvli zero,a4,e32,m8,ta,ma + vsetvli zero,a4,e8,m2,ta,ma
The vsetvli sequence compiled with --param=vsetvl-strategy=optim was as follows:
2. vsetvli zero,a4,e32,m4,ta,ma + vsetvli zero,zero,e8,m2,ta,ma
Although vl remains unchanged, the SEW/LMUL ratio in sequence 2 changes, leading to undefined behavior.
The RVV specification includes the following content related to this:
6.2. AVL encoding
...
When rs1=x0 and rd=x0, the instruction operates as if the current vector length in vl is used as the AVL, and the resulting value is written to vl, but not to a destination register. This form can only be used when VLMAX and hence vl is not actually changed by the new SEW/LMUL ratio. Use of the instruction with a new SEW/LMUL ratio that would result in a change of VLMAX is reserved. Implementations may set vill in this case.
Ref:
https://github.com/riscvarchive/riscv-v-spec/releases/tag/v1.0
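To make the ratio change concrete: e32,m4 corresponds to SEW/LMUL = 32/4 = 8 while
e8,m2 corresponds to 8/2 = 4, so the second vsetvli in sequence 2 (the zero,zero form)
would change VLMAX and falls exactly into this reserved case.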
> > Here are three issues with this test case:
> > 1. The test case does not seem to take effect, as it appears to pass both
> > before and after applying the patch for RV64.
> > 2. Since no mabi is specified, it consistently fails for RV32 with the error:
> > "Excess errors: cc1: error: ABI requires '-march=rv32'."
> > 3. The test case seems to contain a lot of unnecessary code; perhaps we can
> > streamline it.
Perhaps you could give this a try.
/* { dg-do compile { target { rv64 } } } */
/* { dg-options "-march=rv64gcv_zvfh -mabi=lp64d -O3" } */
#include <riscv_vector.h>
_Float16 a (uint64_t);
int8_t b () {
int c = 100;
double *d;
_Float16 *e;
for (size_t f;; c -= f)
{
f = c;
__riscv_vsll_vx_u8mf8 (__riscv_vid_v_u8mf8 (f), 2, f);
vfloat16mf4_t g;
a (1);
g = __riscv_vfmv_s_f_f16mf4 (2, f);
vfloat64m1_t i = __riscv_vfmv_s_f_f64m1 (30491, f);
vuint16mf4_t j;
__riscv_vsoxei16_v_f16mf4 (e, j, g, f);
vuint8mf8_t k = __riscv_vsll_vx_u8mf8 (__riscv_vid_v_u8mf8 (f), 3, f);
__riscv_vsoxei8_v_f64m1 (d, k, i, f);
}
}
/* { dg-final { scan-assembler-not "e64,mf4" } } */
Best regards,
Jin Ma
> As referenced in the PR the issue is flaky and only rarely occurs, under
> specific circumstances (and is latent on trunk). The test case was already
> reduced.
>
> You're right about the missing -mabi of course, I keep forgetting it...
>
> --
> Regards
> Robin
> It seems the issue is we didn't set "vlmul" ?
>
> Can we do that:
>
> int max_sew = MAX (prev.get_sew (), next.get_sew ());
> prev.set_sew (max_sew);
> prev.set_vlmul (calculate_vlmul (...));
> prev.set_ratio (calculate_ratio (prev.get_sew (), prev.get_vlmul ()));
What we could do is
prev.set_ratio (calculate_ratio (prev.get_sew (), prev.get_vlmul ()));
prev.set_vlmul (calculate_vlmul (prev.get_sew (), prev.get_ratio ()));
Ratio needs to be corrected first according to old LMUL and new SEW. Then we
can adjust LMUL. Otherwise we'd set the wrong LMUL according to the old ratio.
But I find the recalculation more confusing than just re-using the values from
the vsetvl with larger SEW.
> Okay, let me explain the background of my previous patch.
>
> Prior to applying my patch, for the test case bug-10.c (a reduced example of
> a larger program with incorrect runtime results),
> the vsetvli sequence compiled with --param=vsetvl-strategy=simple was as
> follows:
> 1. vsetvli zero,a4,e16,m4,ta,ma + vsetvli zero,a4,e32,m8,ta,ma + vsetvli
> zero,a4,e8,m2,ta,ma
>
> The vsetvli sequence compiled with --param=vsetvl-strategy=optim was as
> follows:
> 2. vsetvli zero,a4,e32,m4,ta,ma + vsetvli zero,zero,e8,m2,ta,ma
>
> Although vl remains unchanged, the SEW/LMUL ratio in sequence 2 changes,
> leading to undefined behavior.
The only difference I see with your patch vs without is
< vsetvli zero,zero,e8,m2,ta,ma
---
> vsetvli zero,a3,e8,m2,ta,ma
and we ensure the former doesn't occur in the test.
But that difference doesn't matter because the ratio is the same before and
after. That's why I'm asking. bug-10.c as is doesn't test anything reasonable
IMHO. Right, the ratio (or rather the associated LMUL) was wrong, but the
current test doesn't verify that it no longer is. Can you share the non-reduced (or
less reduced) case?
> /* { dg-do compile { target { rv64 } } } */
> /* { dg-options "-march=rv64gcv_zvfh -mabi=lp64d -O3" } */
>
> #include <riscv_vector.h>
>
> _Float16 a (uint64_t);
> int8_t b () {
> int c = 100;
> double *d;
> _Float16 *e;
> for (size_t f;; c -= f)
> {
> f = c;
> __riscv_vsll_vx_u8mf8 (__riscv_vid_v_u8mf8 (f), 2, f);
> vfloat16mf4_t g;
> a (1);
> g = __riscv_vfmv_s_f_f16mf4 (2, f);
> vfloat64m1_t i = __riscv_vfmv_s_f_f64m1 (30491, f);
> vuint16mf4_t j;
> __riscv_vsoxei16_v_f16mf4 (e, j, g, f);
> vuint8mf8_t k = __riscv_vsll_vx_u8mf8 (__riscv_vid_v_u8mf8 (f), 3, f);
> __riscv_vsoxei8_v_f64m1 (d, k, i, f);
> }
> }
>
> /* { dg-final { scan-assembler-not "e64,mf4" } } */
That works, thanks.
> What we could do is
>
> prev.set_ratio (calculate_ratio (prev.get_sew (), prev.get_vlmul ()));
> prev.set_vlmul (calculate_vlmul (prev.get_sew (), prev.get_ratio ()));
No, that also doesn't work because the ratio can be invalid then.
We fuse two vsetvls. One of them has a larger SEW which we use. Then
we either
- Keep the old ratio and calculate a new LMUL. That might lead to an invalid
LMUL.
- Keep the old LMUL and calculate a new ratio. That might lead to an invalid
ratio.
I think the easiest way out is to copy the vsetvl with the larger SEW.
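For illustration with concrete numbers: take prev as e16,mf4 (ratio 64) while next
demands SEW >= 64.
- Keeping LMUL mf4 and raising SEW to e64 gives ratio 256, i.e. the invalid e64,mf4
combination from PR117955.
- Keeping ratio 64 and recomputing LMUL happens to give a valid e64,m1 here, but for
e.g. a prev of e8,m8 (ratio 1) a SEW of 64 would require LMUL 64, which doesn't exist.
Copying SEW, LMUL and ratio from the vsetvl with the larger SEW avoids both problems.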
On Fri, 28 Feb 2025 12:48:36 +0100, "Robin Dapp" wrote:
> > Okay, let me explain the background of my previous patch.
> >
> > Prior to applying my patch, for the test case bug-10.c (a reduced example of
> > a larger program with incorrect runtime results),
> > the vsetvli sequence compiled with --param=vsetvl-strategy=simple was as
> > follows:
> > 1. vsetvli zero,a4,e16,m4,ta,ma + vsetvli zero,a4,e32,m8,ta,ma + vsetvli
> > zero,a4,e8,m2,ta,ma
> >
> > The vsetvli sequence compiled with --param=vsetvl-strategy=optim was as
> > follows:
> > 2. vsetvli zero,a4,e32,m4,ta,ma + vsetvli zero,zero,e8,m2,ta,ma
> >
> > Although vl remains unchanged, the SEW/LMUL ratio in sequence 2 changes,
> > leading to undefined behavior.
>
> The only difference I see with your patch vs without is
>
> < vsetvli zero,zero,e8,m2,ta,ma
> ---
> > vsetvli zero,a3,e8,m2,ta,ma
>
> and we ensure the former doesn't occur in the test.
>
> But that difference doesn't matter because the ratio is the same before and
> after. That's why I'm asking. bug-10.c as is doesn't test anything reasonable
> IMHO. Right, the ratio (or rather the associated LMUL) was wrong, but the
> current test doesn't verify that it no longer is. Can you share the non-reduced (or
> less reduced) case?
Hi Robin,
I apologize for the delayed response. I spent quite a bit of time trying to reproduce
the case, and given the passage of time, it wasn't easy to refine the testing.
Fortunately, you can see the results here.
https://godbolt.org/z/Mc8veW7oT
Using GCC version 14.2.0 should allow you to replicate the issue. If all goes as
expected, you will encounter a "Segmentation fault (core dumped)."
By disassembling the binary, you'll notice the presence of "vsetvli zero,zero,e32,m4,ta,ma",
which is where the problem lies, just as I mentioned previously.
Best regards,
Jin Ma
> > /* { dg-do compile { target { rv64 } } } */
> > /* { dg-options "-march=rv64gcv_zvfh -mabi=lp64d -O3" } */
> >
> > #include <riscv_vector.h>
> >
> > _Float16 a (uint64_t);
> > int8_t b () {
> > int c = 100;
> > double *d;
> > _Float16 *e;
> > for (size_t f;; c -= f)
> > {
> > f = c;
> > __riscv_vsll_vx_u8mf8 (__riscv_vid_v_u8mf8 (f), 2, f);
> > vfloat16mf4_t g;
> > a (1);
> > g = __riscv_vfmv_s_f_f16mf4 (2, f);
> > vfloat64m1_t i = __riscv_vfmv_s_f_f64m1 (30491, f);
> > vuint16mf4_t j;
> > __riscv_vsoxei16_v_f16mf4 (e, j, g, f);
> > vuint8mf8_t k = __riscv_vsll_vx_u8mf8 (__riscv_vid_v_u8mf8 (f), 3, f);
> > __riscv_vsoxei8_v_f64m1 (d, k, i, f);
> > }
> > }
> >
> > /* { dg-final { scan-assembler-not "e64,mf4" } } */
>
> That works, thanks.
>
> --
> Regards
> Robin
Hi Jin,
> I apologize for the delayed response. I spent quite a bit of time trying to
> reproduce
> the case, and given the passage of time, it wasn't easy to refine the testing.
> Fortunately, you can see the results here.
>
> https://godbolt.org/z/Mc8veW7oT
>
> Using GCC version 14.2.0 should allow you to replicate the issue. If all goes as
> expected, you will encounter a "Segmentation fault (core dumped)."
> By disassembling the binary, you'll notice the presence of "vsetvli
> zero,zero,e32,m4,ta,ma",
> which is where the problem lies, just as I mentioned previously.
Thanks for the full example, this is helpful but it still required some more
digging on my side.
I realize now how you came to your conclusion and why you wrote the test that
way.
In QEMU there is a segfault (but no illegal instruction) on
vloxei16.v v8,(t2),v8.
I'm not really sure why yet because vtype and vl are OK. Most likely we
corrupted something before.
On the BPI, however, the more useful information is a SIGILL on
vfmv.s.f v12,fa5.
That's because we do
vsetvli zero,zero,e16,m4,ta,ma
...
vsetvli zero,zero,e8,m2,ta,ma
...
vsetvli zero,zero,e32,m4,ta,ma
The last vsetvl changes the SEW without adjusting LMUL, so changing SEW/LMUL
and VLMAX which is not allowed for vsetvl zero, zero, ...
Subsequently, VILL is set and we SIGILL on the next vector instruction.
So I'd say QEMU should emit a SIGILL here. I'm preparing a QEMU patch for
that.
With your patch the last vsetvl becomes
vsetvli zero,a5,e32,m4,ta,ma
When setting a new VL it is permitted to change SEW/LMUL even though it's not
desirable as we don't need to change VL.
With my patch the vsetvl becomes
vsetvli zero,zero,e32,m8,ta,ma
which doesn't change SEW/LMUL or VL and I think that's what we want.
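All three vsetvls then share SEW/LMUL = 4 (e16,m4, e8,m2 and e32,m8), so keeping the
zero,zero form is legal because VLMAX stays the same.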
In summary: before your patch we only changed SEW, which caused a VLMAX-changing
vsetvl, your patch works around that by adjusting the ratio without touching
LMUL. But that still leaves us with PR117955 where the ratio is "correct" but
invalid.
Given all this, I think the only way to fix this is to re-use/copy the vsetvl
information from either prev or next (as in my patch).
I'm going to change bug-10.c into a run test, add the test you last provided as
a run test, as well as your reduced test from PR117955.
On Wed, 05 Mar 2025 12:17:24 +0100, "Robin Dapp" wrote:
> Hi Jin,
>
> > I apologize for the delayed response. I spent quite a bit of time trying to
> > reproduce
> > the case, and given the passage of time, it wasn't easy to refine the testing.
> > Fortunately, you can see the results here.
> >
> > https://godbolt.org/z/Mc8veW7oT
> >
> > Using GCC version 14.2.0 should allow you to replicate the issue. If all goes as
> > expected, you will encounter a "Segmentation fault (core dumped)."
> > By disassembling the binary, you'll notice the presence of "vsetvli
> > zero,zero,e32,m4,ta,ma",
> > which is where the problem lies, just as I mentioned previously.
>
> Thanks for the full example, this is helpful but it still required some more
> digging on my side.
>
> I realize now how you came to your conclusion and why you wrote the test that
> way.
>
> In QEMU there is a segfault (but no illegal instruction) on
> vloxei16.v v8,(t2),v8.
> I'm not really sure why yet because vtype and vl are OK. Most likely we
> corrupted something before.
>
> On the BPI, however, the more useful information is a SIGILL on
> vfmv.s.f v12,fa5.
>
> That's because we do
> vsetvli zero,zero,e16,m4,ta,ma
> ...
> vsetvli zero,zero,e8,m2,ta,ma
> ...
> vsetvli zero,zero,e32,m4,ta,ma
>
> The last vsetvl changes the SEW without adjusting LMUL, so changing SEW/LMUL
> and VLMAX which is not allowed for vsetvl zero, zero, ...
> Subsequently, VILL is set and we SIGILL on the next vector instruction.
>
> So I'd say QEMU should emit a SIGILL here. I'm preparing a QEMU patch for
> that.
>
> With your patch the last vsetvl becomes
> vsetvli zero,a5,e32,m4,ta,ma
> When setting a new VL it is permitted to change SEW/LMUL even though it's not
> desirable as we don't need to change VL.
>
> With my patch the vsetvl becomes
> vsetvli zero,zero,e32,m8,ta,ma
> which doesn't change SEW/LMUL or VL and I think that's what we want.
>
> In summary: before your patch we only changed SEW, which caused a VLMAX-changing
> vsetvl, your patch works around that by adjusting the ratio without touching
> LMUL. But that still leaves us with PR117955 where the ratio is "correct" but
> invalid.
>
> Given all this, I think the only way to fix this is to re-use/copy the vsetvl
> information from either prev or next (as in my patch).
>
> I'm going to change bug-10.c into a run test, add the test you last provided as
> a run test, as well as your reduced test from PR117955.
LGTM :)
Best regards,
Jin Ma
> --
> Regards
> Robin
@@ -1729,9 +1729,11 @@ private:
}
inline void use_max_sew (vsetvl_info &prev, const vsetvl_info &next)
{
- int max_sew = MAX (prev.get_sew (), next.get_sew ());
- prev.set_sew (max_sew);
- prev.set_ratio (calculate_ratio (prev.get_sew (), prev.get_vlmul ()));
+ bool prev_sew_larger = prev.get_sew () >= next.get_sew ();
+ const vsetvl_info from = prev_sew_larger ? prev : next;
+ prev.set_sew (from.get_sew ());
+ prev.set_vlmul (from.get_vlmul ());
+ prev.set_ratio (from.get_ratio ());
use_min_of_max_sew (prev, next);
}
inline void use_next_sew_lmul (vsetvl_info &prev, const vsetvl_info &next)
@@ -1,14 +1,40 @@
-/* { dg-do compile { target { rv64 } } } */
+/* { dg-do run { target { rv64 } } } */
+/* { dg-require-effective-target rv64 } */
+/* { dg-require-effective-target riscv_v } */
/* { dg-options " -march=rv64gcv_zvfh -mabi=lp64d -O2 --param=vsetvl-strategy=optim -fno-schedule-insns -fno-schedule-insns2 -fno-schedule-fusion " } */
#include <riscv_vector.h>
void
-foo (uint8_t *ptr, vfloat16m4_t *v1, vuint32m8_t *v2, vuint8m2_t *v3, size_t vl)
+__attribute__ ((noipa))
+foo (vfloat16m4_t *v1, vuint32m8_t *v2, vuint8m2_t *v3, size_t vl)
{
*v1 = __riscv_vfmv_s_f_f16m4 (1, vl);
*v2 = __riscv_vmv_s_x_u32m8 (2963090659u, vl);
*v3 = __riscv_vsll_vx_u8m2 (__riscv_vid_v_u8m2 (vl), 2, vl);
}
-/* { dg-final { scan-assembler-not {vsetvli.*zero,zero} } }*/
+int
+main ()
+{
+ vfloat16m4_t v1;
+ vuint32m8_t v2;
+ vuint8m2_t v3;
+ int vl = 4;
+ foo (&v1, &v2, &v3, vl);
+
+ _Float16 val1 = ((_Float16 *)&v1)[0];
+ if (val1 - 1.0000f > 0.00001f)
+ __builtin_abort ();
+
+ uint32_t val2 = ((uint32_t *)&v2)[0];
+ if (val2 != 2963090659u)
+ __builtin_abort ();
+
+ for (int i = 0; i < vl; i++)
+ {
+ uint8_t val = ((uint8_t *)&v3)[i];
+ if (val != i << 2)
+ __builtin_abort ();
+ }
+}
new file mode 100644
@@ -0,0 +1,827 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_zvfh -O3" } */
+
+#include <riscv_vector.h>
+
+#define dataLen 100
+#define isNaNF16UI( a ) (((~(a) & 0x7C00) == 0) && ((a) & 0x03FF))
+#define isNaNF32UI( a ) (((~(a) & 0x7F800000) == 0) && ((a) & 0x007FFFFF))
+#define isNaNF64UI( a ) (((~(a) & UINT64_C( 0x7FF0000000000000 )) == 0) && ((a) & UINT64_C( 0x000FFFFFFFFFFFFF )))
+typedef _Float16 float16_t;
+typedef float float32_t;
+typedef double float64_t;
+
+float16_t convert_binary_u16_f16(uint16_t u16){
+ union { float16_t f16; uint16_t u16; } converter;
+ converter.u16 = u16;
+ if(isNaNF16UI(converter.u16)) return 0;
+ return converter.f16;
+}
+float32_t convert_binary_u32_f32(uint32_t u32){
+ union { float32_t f32; uint32_t u32; } converter;
+ converter.u32 = u32;
+ if(isNaNF32UI(converter.u32)) return 0;
+ return converter.f32;
+}
+float64_t convert_binary_u64_f64(uint64_t u64){
+ union { float64_t f64; uint64_t u64; } converter;
+ converter.u64 = u64;
+ if(isNaNF64UI(converter.u64)) return 0;
+ return converter.f64;
+}
+
+int8_t data_mask[dataLen];
+int64_t data_load_0[dataLen];
+uint16_t data_load_1[dataLen];
+int8_t data_load_10[dataLen];
+float16_t data_load_11[dataLen];
+uint64_t data_load_12[dataLen];
+int16_t data_load_13[dataLen];
+uint8_t data_load_14[dataLen];
+uint8_t data_load_15[dataLen];
+int64_t data_load_16[dataLen];
+float32_t data_load_17[dataLen];
+int32_t data_load_18[dataLen];
+int64_t data_load_19[dataLen];
+uint16_t data_load_2[dataLen];
+uint8_t data_load_20[dataLen];
+int32_t data_load_21[dataLen];
+int32_t data_load_22[dataLen];
+uint32_t data_load_23[dataLen];
+float16_t data_load_24[dataLen];
+int64_t data_load_25[dataLen];
+int16_t data_load_26[dataLen];
+int16_t data_load_27[dataLen];
+int16_t data_load_28[dataLen];
+float32_t data_load_29[dataLen];
+int64_t data_load_3[dataLen];
+float64_t data_load_30[dataLen];
+uint8_t data_load_31[dataLen];
+float16_t data_load_32[dataLen];
+int32_t data_load_33[dataLen];
+int32_t data_load_34[dataLen];
+int16_t data_load_35[dataLen];
+uint16_t data_load_36[dataLen];
+uint64_t data_load_37[dataLen];
+uint64_t data_load_38[dataLen];
+float64_t data_load_39[dataLen];
+float16_t data_load_4[dataLen];
+float64_t data_load_40[dataLen];
+int8_t data_load_41[dataLen];
+uint64_t data_load_42[dataLen];
+uint64_t data_load_43[dataLen];
+int8_t data_load_44[dataLen];
+int8_t data_load_45[dataLen];
+int8_t data_load_46[dataLen];
+uint32_t data_load_47[dataLen];
+uint64_t data_load_48[dataLen];
+int16_t data_load_49[dataLen];
+uint16_t data_load_5[dataLen];
+uint16_t data_load_50[dataLen];
+uint16_t data_load_51[dataLen];
+uint16_t data_load_52[dataLen];
+uint16_t data_load_53[dataLen];
+int16_t data_load_54[dataLen];
+int64_t data_load_55[dataLen];
+float64_t data_load_56[dataLen];
+float32_t data_load_57[dataLen];
+int16_t data_load_58[dataLen];
+int16_t data_load_59[dataLen];
+uint16_t data_load_6[dataLen];
+int8_t data_load_60[dataLen];
+float32_t data_load_61[dataLen];
+int32_t data_load_62[dataLen];
+uint32_t data_load_63[dataLen];
+int16_t data_load_64[dataLen];
+uint32_t data_load_65[dataLen];
+uint8_t data_load_66[dataLen];
+uint64_t data_load_67[dataLen];
+int8_t data_load_68[dataLen];
+float32_t data_load_69[dataLen];
+uint8_t data_load_7[dataLen];
+int16_t data_load_70[dataLen];
+int16_t data_load_71[dataLen];
+uint32_t data_load_72[dataLen];
+uint32_t data_load_73[dataLen];
+int32_t data_load_74[dataLen];
+int16_t data_load_75[dataLen];
+int8_t data_load_76[dataLen];
+float32_t data_load_77[dataLen];
+uint8_t data_load_78[dataLen];
+int8_t data_load_79[dataLen];
+float32_t data_load_8[dataLen];
+float16_t data_load_9[dataLen];
+int32_t data_store_vreg_0[dataLen];
+int8_t data_store_vreg_1[dataLen];
+int16_t data_store_vreg_10[dataLen];
+uint32_t data_store_vreg_11[dataLen];
+int32_t data_store_vreg_12[dataLen];
+uint32_t data_store_vreg_13[dataLen];
+int64_t data_store_vreg_14[dataLen];
+float32_t data_store_vreg_15[dataLen];
+int32_t data_store_vreg_16[dataLen];
+float32_t data_store_vreg_17[dataLen];
+float64_t data_store_vreg_18[dataLen];
+float64_t data_store_vreg_19[dataLen];
+float16_t data_store_vreg_2[dataLen];
+int8_t data_store_vreg_20[dataLen];
+float64_t data_store_vreg_21[dataLen];
+int16_t data_store_vreg_22[dataLen];
+uint16_t data_store_vreg_23[dataLen];
+int8_t data_store_vreg_24[dataLen];
+int16_t data_store_vreg_25[dataLen];
+uint8_t data_store_vreg_26[dataLen];
+int16_t data_store_vreg_27[dataLen];
+int32_t data_store_vreg_28[dataLen];
+int16_t data_store_vreg_29[dataLen];
+uint16_t data_store_vreg_3[dataLen];
+float32_t data_store_vreg_30[dataLen];
+int8_t data_store_vreg_31[dataLen];
+uint64_t data_store_vreg_32[dataLen];
+int8_t data_store_vreg_33[dataLen];
+int8_t data_store_vreg_34[dataLen];
+uint64_t data_store_vreg_35[dataLen];
+int8_t data_store_vreg_36[dataLen];
+uint16_t data_store_vreg_37[dataLen];
+int64_t data_store_vreg_38[dataLen];
+float16_t data_store_vreg_39[dataLen];
+float32_t data_store_vreg_4[dataLen];
+float16_t data_store_vreg_40[dataLen];
+uint8_t data_store_vreg_41[dataLen];
+uint16_t data_store_vreg_42[dataLen];
+int8_t data_store_vreg_5[dataLen];
+float64_t data_store_vreg_6[dataLen];
+int8_t data_store_vreg_7[dataLen];
+int8_t data_store_vreg_8[dataLen];
+int32_t data_store_vreg_9[dataLen];
+uint16_t data_store_vreg_memory_1[dataLen];
+float16_t data_store_vreg_memory_11[dataLen];
+int16_t data_store_vreg_memory_13[dataLen];
+int64_t data_store_vreg_memory_16[dataLen];
+float32_t data_store_vreg_memory_17[dataLen];
+uint16_t data_store_vreg_memory_2[dataLen];
+int32_t data_store_vreg_memory_21[dataLen];
+float16_t data_store_vreg_memory_24[dataLen];
+int16_t data_store_vreg_memory_28[dataLen];
+int64_t data_store_vreg_memory_3[dataLen];
+float64_t data_store_vreg_memory_30[dataLen];
+uint8_t data_store_vreg_memory_31[dataLen];
+int32_t data_store_vreg_memory_33[dataLen];
+uint16_t data_store_vreg_memory_36[dataLen];
+uint64_t data_store_vreg_memory_38[dataLen];
+float64_t data_store_vreg_memory_40[dataLen];
+int8_t data_store_vreg_memory_41[dataLen];
+uint64_t data_store_vreg_memory_42[dataLen];
+uint32_t data_store_vreg_memory_47[dataLen];
+uint16_t data_store_vreg_memory_5[dataLen];
+int8_t data_store_vreg_memory_60[dataLen];
+float32_t data_store_vreg_memory_61[dataLen];
+uint32_t data_store_vreg_memory_63[dataLen];
+uint8_t data_store_vreg_memory_7[dataLen];
+uint32_t data_store_vreg_memory_72[dataLen];
+float32_t data_store_vreg_memory_8[dataLen];
+float16_t data_store_vreg_memory_9[dataLen];
+
+
+int main(){
+ int avl1 = dataLen;
+ int8_t* ptr_mask = data_mask;
+ int64_t* ptr_load_0 = data_load_0;
+ uint16_t* ptr_load_1 = data_load_1;
+ int8_t* ptr_load_10 = data_load_10;
+ float16_t* ptr_load_11 = data_load_11;
+ uint64_t* ptr_load_12 = data_load_12;
+ int16_t* ptr_load_13 = data_load_13;
+ uint8_t* ptr_load_14 = data_load_14;
+ uint8_t* ptr_load_15 = data_load_15;
+ int64_t* ptr_load_16 = data_load_16;
+ float32_t* ptr_load_17 = data_load_17;
+ int32_t* ptr_load_18 = data_load_18;
+ int64_t* ptr_load_19 = data_load_19;
+ uint16_t* ptr_load_2 = data_load_2;
+ uint8_t* ptr_load_20 = data_load_20;
+ int32_t* ptr_load_21 = data_load_21;
+ int32_t* ptr_load_22 = data_load_22;
+ uint32_t* ptr_load_23 = data_load_23;
+ float16_t* ptr_load_24 = data_load_24;
+ int64_t* ptr_load_25 = data_load_25;
+ int16_t* ptr_load_26 = data_load_26;
+ int16_t* ptr_load_27 = data_load_27;
+ int16_t* ptr_load_28 = data_load_28;
+ float32_t* ptr_load_29 = data_load_29;
+ int64_t* ptr_load_3 = data_load_3;
+ float64_t* ptr_load_30 = data_load_30;
+ uint8_t* ptr_load_31 = data_load_31;
+ float16_t* ptr_load_32 = data_load_32;
+ int32_t* ptr_load_33 = data_load_33;
+ int32_t* ptr_load_34 = data_load_34;
+ int16_t* ptr_load_35 = data_load_35;
+ uint16_t* ptr_load_36 = data_load_36;
+ uint64_t* ptr_load_37 = data_load_37;
+ uint64_t* ptr_load_38 = data_load_38;
+ float64_t* ptr_load_39 = data_load_39;
+ float16_t* ptr_load_4 = data_load_4;
+ float64_t* ptr_load_40 = data_load_40;
+ int8_t* ptr_load_41 = data_load_41;
+ uint64_t* ptr_load_42 = data_load_42;
+ uint64_t* ptr_load_43 = data_load_43;
+ int8_t* ptr_load_44 = data_load_44;
+ int8_t* ptr_load_45 = data_load_45;
+ int8_t* ptr_load_46 = data_load_46;
+ uint32_t* ptr_load_47 = data_load_47;
+ uint64_t* ptr_load_48 = data_load_48;
+ int16_t* ptr_load_49 = data_load_49;
+ uint16_t* ptr_load_5 = data_load_5;
+ uint16_t* ptr_load_50 = data_load_50;
+ uint16_t* ptr_load_51 = data_load_51;
+ uint16_t* ptr_load_52 = data_load_52;
+ uint16_t* ptr_load_53 = data_load_53;
+ int16_t* ptr_load_54 = data_load_54;
+ int64_t* ptr_load_55 = data_load_55;
+ float64_t* ptr_load_56 = data_load_56;
+ float32_t* ptr_load_57 = data_load_57;
+ int16_t* ptr_load_58 = data_load_58;
+ int16_t* ptr_load_59 = data_load_59;
+ uint16_t* ptr_load_6 = data_load_6;
+ int8_t* ptr_load_60 = data_load_60;
+ float32_t* ptr_load_61 = data_load_61;
+ int32_t* ptr_load_62 = data_load_62;
+ uint32_t* ptr_load_63 = data_load_63;
+ int16_t* ptr_load_64 = data_load_64;
+ uint32_t* ptr_load_65 = data_load_65;
+ uint8_t* ptr_load_66 = data_load_66;
+ uint64_t* ptr_load_67 = data_load_67;
+ int8_t* ptr_load_68 = data_load_68;
+ float32_t* ptr_load_69 = data_load_69;
+ uint8_t* ptr_load_7 = data_load_7;
+ int16_t* ptr_load_70 = data_load_70;
+ int16_t* ptr_load_71 = data_load_71;
+ uint32_t* ptr_load_72 = data_load_72;
+ uint32_t* ptr_load_73 = data_load_73;
+ int32_t* ptr_load_74 = data_load_74;
+ int16_t* ptr_load_75 = data_load_75;
+ int8_t* ptr_load_76 = data_load_76;
+ float32_t* ptr_load_77 = data_load_77;
+ uint8_t* ptr_load_78 = data_load_78;
+ int8_t* ptr_load_79 = data_load_79;
+ float32_t* ptr_load_8 = data_load_8;
+ float16_t* ptr_load_9 = data_load_9;
+ int32_t* ptr_store_vreg_0 = data_store_vreg_0;
+ int8_t* ptr_store_vreg_1 = data_store_vreg_1;
+ int16_t* ptr_store_vreg_10 = data_store_vreg_10;
+ uint32_t* ptr_store_vreg_11 = data_store_vreg_11;
+ int32_t* ptr_store_vreg_12 = data_store_vreg_12;
+ uint32_t* ptr_store_vreg_13 = data_store_vreg_13;
+ int64_t* ptr_store_vreg_14 = data_store_vreg_14;
+ float32_t* ptr_store_vreg_15 = data_store_vreg_15;
+ int32_t* ptr_store_vreg_16 = data_store_vreg_16;
+ float32_t* ptr_store_vreg_17 = data_store_vreg_17;
+ float64_t* ptr_store_vreg_18 = data_store_vreg_18;
+ float64_t* ptr_store_vreg_19 = data_store_vreg_19;
+ float16_t* ptr_store_vreg_2 = data_store_vreg_2;
+ int8_t* ptr_store_vreg_20 = data_store_vreg_20;
+ float64_t* ptr_store_vreg_21 = data_store_vreg_21;
+ int16_t* ptr_store_vreg_22 = data_store_vreg_22;
+ uint16_t* ptr_store_vreg_23 = data_store_vreg_23;
+ int8_t* ptr_store_vreg_24 = data_store_vreg_24;
+ int16_t* ptr_store_vreg_25 = data_store_vreg_25;
+ uint8_t* ptr_store_vreg_26 = data_store_vreg_26;
+ int16_t* ptr_store_vreg_27 = data_store_vreg_27;
+ int32_t* ptr_store_vreg_28 = data_store_vreg_28;
+ int16_t* ptr_store_vreg_29 = data_store_vreg_29;
+ uint16_t* ptr_store_vreg_3 = data_store_vreg_3;
+ float32_t* ptr_store_vreg_30 = data_store_vreg_30;
+ int8_t* ptr_store_vreg_31 = data_store_vreg_31;
+ uint64_t* ptr_store_vreg_32 = data_store_vreg_32;
+ int8_t* ptr_store_vreg_33 = data_store_vreg_33;
+ int8_t* ptr_store_vreg_34 = data_store_vreg_34;
+ uint64_t* ptr_store_vreg_35 = data_store_vreg_35;
+ int8_t* ptr_store_vreg_36 = data_store_vreg_36;
+ uint16_t* ptr_store_vreg_37 = data_store_vreg_37;
+ int64_t* ptr_store_vreg_38 = data_store_vreg_38;
+ float16_t* ptr_store_vreg_39 = data_store_vreg_39;
+ float32_t* ptr_store_vreg_4 = data_store_vreg_4;
+ float16_t* ptr_store_vreg_40 = data_store_vreg_40;
+ uint8_t* ptr_store_vreg_41 = data_store_vreg_41;
+ uint16_t* ptr_store_vreg_42 = data_store_vreg_42;
+ int8_t* ptr_store_vreg_5 = data_store_vreg_5;
+ float64_t* ptr_store_vreg_6 = data_store_vreg_6;
+ int8_t* ptr_store_vreg_7 = data_store_vreg_7;
+ int8_t* ptr_store_vreg_8 = data_store_vreg_8;
+ int32_t* ptr_store_vreg_9 = data_store_vreg_9;
+ uint16_t* ptr_store_vreg_memory_1 = data_store_vreg_memory_1;
+ float16_t* ptr_store_vreg_memory_11 = data_store_vreg_memory_11;
+ int16_t* ptr_store_vreg_memory_13 = data_store_vreg_memory_13;
+ int64_t* ptr_store_vreg_memory_16 = data_store_vreg_memory_16;
+ float32_t* ptr_store_vreg_memory_17 = data_store_vreg_memory_17;
+ uint16_t* ptr_store_vreg_memory_2 = data_store_vreg_memory_2;
+ int32_t* ptr_store_vreg_memory_21 = data_store_vreg_memory_21;
+ float16_t* ptr_store_vreg_memory_24 = data_store_vreg_memory_24;
+ int16_t* ptr_store_vreg_memory_28 = data_store_vreg_memory_28;
+ int64_t* ptr_store_vreg_memory_3 = data_store_vreg_memory_3;
+ float64_t* ptr_store_vreg_memory_30 = data_store_vreg_memory_30;
+ uint8_t* ptr_store_vreg_memory_31 = data_store_vreg_memory_31;
+ int32_t* ptr_store_vreg_memory_33 = data_store_vreg_memory_33;
+ uint16_t* ptr_store_vreg_memory_36 = data_store_vreg_memory_36;
+ uint64_t* ptr_store_vreg_memory_38 = data_store_vreg_memory_38;
+ float64_t* ptr_store_vreg_memory_40 = data_store_vreg_memory_40;
+ int8_t* ptr_store_vreg_memory_41 = data_store_vreg_memory_41;
+ uint64_t* ptr_store_vreg_memory_42 = data_store_vreg_memory_42;
+ uint32_t* ptr_store_vreg_memory_47 = data_store_vreg_memory_47;
+ uint16_t* ptr_store_vreg_memory_5 = data_store_vreg_memory_5;
+ int8_t* ptr_store_vreg_memory_60 = data_store_vreg_memory_60;
+ float32_t* ptr_store_vreg_memory_61 = data_store_vreg_memory_61;
+ uint32_t* ptr_store_vreg_memory_63 = data_store_vreg_memory_63;
+ uint8_t* ptr_store_vreg_memory_7 = data_store_vreg_memory_7;
+ uint32_t* ptr_store_vreg_memory_72 = data_store_vreg_memory_72;
+ float32_t* ptr_store_vreg_memory_8 = data_store_vreg_memory_8;
+ float16_t* ptr_store_vreg_memory_9 = data_store_vreg_memory_9;
+ for (size_t vl; avl1 > 0; avl1 -= vl){
+ vl = __riscv_vsetvl_e64m1(avl1);
+ vint8mf8_t mask_value= __riscv_vle8_v_i8mf8(ptr_mask, vl);
+ vbool64_t vmask= __riscv_vmseq_vx_i8mf8_b64(mask_value, 1, vl);
+ vint64m4_t vreg_memory_0 = __riscv_vle64_v_i64m4(ptr_load_0, vl);
+ vuint32mf2_t idx_0 = __riscv_vsll_vx_u32mf2(__riscv_vid_v_u32mf2(vl), 1, vl);
+ vuint16mf4_t vreg_memory_1 = __riscv_vluxei32_v_u16mf4_m(vmask, ptr_load_1, idx_0, vl);
+ vuint16mf4_t idx_1 = __riscv_vsll_vx_u16mf4(__riscv_vid_v_u16mf4(vl), 1, vl);
+ vuint16mf4_t vreg_memory_2 = __riscv_vluxei16_v_u16mf4(ptr_load_2, idx_1, vl);
+ vuint32mf2_t idx_2 = __riscv_vsll_vx_u32mf2(__riscv_vid_v_u32mf2(vl), 3, vl);
+ vint64m1_t vreg_memory_3 = __riscv_vluxei32_v_i64m1(ptr_load_3, idx_2, vl);
+ vfloat16m1_t vreg_memory_4 = __riscv_vle16_v_f16m1(ptr_load_4, vl);
+ vuint16mf4_t vreg_memory_5 = __riscv_vle16_v_u16mf4_m(vmask, ptr_load_5, vl);
+ vuint16mf4_t vreg_memory_6 = __riscv_vle16_v_u16mf4(ptr_load_6, vl);
+ vuint16mf4_t idx_5 = __riscv_vsll_vx_u16mf4(__riscv_vid_v_u16mf4(vl), 0, vl);
+ vuint8mf8_t vreg_memory_7 = __riscv_vluxei16_v_u8mf8(ptr_load_7, idx_5, vl);
+ vuint8mf8_t idx_8 = __riscv_vsll_vx_u8mf8(__riscv_vid_v_u8mf8(vl), 2, vl);
+ vfloat32mf2_t vreg_memory_8 = __riscv_vloxei8_v_f32mf2(ptr_load_8, idx_8, vl);
+ vuint16mf4_t idx_9 = __riscv_vsll_vx_u16mf4(__riscv_vid_v_u16mf4(vl), 1, vl);
+ vfloat16mf4_t vreg_memory_9 = __riscv_vluxei16_v_f16mf4(ptr_load_9, idx_9, vl);
+ vint8m2_t vreg_memory_10 = __riscv_vle8_v_i8m2(ptr_load_10, vl);
+ vuint64m1_t idx_10 = __riscv_vsll_vx_u64m1(__riscv_vid_v_u64m1(vl), 1, vl);
+ vfloat16mf4_t vreg_memory_11 = __riscv_vluxei64_v_f16mf4_m(vmask, ptr_load_11, idx_10, vl);
+ vuint64m1_t vreg_memory_12 = __riscv_vle64_v_u64m1(ptr_load_12, vl);
+ vint16mf4_t vreg_memory_13 = __riscv_vle16_v_i16mf4(ptr_load_13, vl);
+ vuint16mf4_t idx_15 = __riscv_vsll_vx_u16mf4(__riscv_vid_v_u16mf4(vl), 0, vl);
+ vuint8mf8_t vreg_memory_14 = __riscv_vloxei16_v_u8mf8(ptr_load_14, idx_15, vl);
+ vuint8m1_t vreg_memory_15 = __riscv_vle8_v_u8m1(ptr_load_15, vl);
+ vint64m1_t vreg_memory_16 = __riscv_vle64_v_i64m1(ptr_load_16, vl);
+ vfloat32mf2_t vreg_memory_17 = __riscv_vle32_v_f32mf2(ptr_load_17, vl);
+ vint32m1_t vreg_memory_18 = __riscv_vle32_v_i32m1(ptr_load_18, vl);
+ vint64m1_t vreg_memory_19 = __riscv_vle64_v_i64m1(ptr_load_19, vl);
+ vuint32mf2_t idx_20 = __riscv_vsll_vx_u32mf2(__riscv_vid_v_u32mf2(vl), 0, vl);
+ vuint8mf8_t vreg_memory_20 = __riscv_vluxei32_v_u8mf8(ptr_load_20, idx_20, vl);
+ vint32mf2_t vreg_memory_21 = __riscv_vle32_v_i32mf2(ptr_load_21, vl);
+ vint32m1_t vreg_memory_22 = __riscv_vle32_v_i32m1(ptr_load_22, vl);
+ vuint32m4_t vreg_memory_23 = __riscv_vle32_v_u32m4(ptr_load_23, vl);
+ vfloat16mf4_t vreg_memory_24 = __riscv_vle16_v_f16mf4(ptr_load_24, vl);
+ vuint64m1_t idx_21 = __riscv_vsll_vx_u64m1(__riscv_vid_v_u64m1(vl), 3, vl);
+ vint64m1_t vreg_memory_25 = __riscv_vloxei64_v_i64m1_m(vmask, ptr_load_25, idx_21, vl);
+ vuint64m1_t idx_24 = __riscv_vsll_vx_u64m1(__riscv_vid_v_u64m1(vl), 1, vl);
+ vint16mf4_t vreg_memory_26 = __riscv_vluxei64_v_i16mf4(ptr_load_26, idx_24, vl);
+ vint16mf4_t vreg_memory_27 = __riscv_vlse16_v_i16mf4_m(vmask, ptr_load_27, 2, vl);
+ vint16mf4_t vreg_memory_28 = __riscv_vle16_v_i16mf4(ptr_load_28, vl);
+ vfloat32m2_t vreg_memory_29 = __riscv_vle32_v_f32m2(ptr_load_29, vl);
+ vuint32mf2_t idx_29 = __riscv_vsll_vx_u32mf2(__riscv_vid_v_u32mf2(vl), 3, vl);
+ vfloat64m1_t vreg_memory_30 = __riscv_vluxei32_v_f64m1_m(vmask, ptr_load_30, idx_29, vl);
+ vuint64m1_t idx_30 = __riscv_vsll_vx_u64m1(__riscv_vid_v_u64m1(vl), 0, vl);
+ vuint8mf8_t vreg_memory_31 = __riscv_vloxei64_v_u8mf8_m(vmask, ptr_load_31, idx_30, vl);
+ vuint8mf8_t idx_31 = __riscv_vsll_vx_u8mf8(__riscv_vid_v_u8mf8(vl), 1, vl);
+ vfloat16mf4_t vreg_memory_32 = __riscv_vluxei8_v_f16mf4_m(vmask, ptr_load_32, idx_31, vl);
+ vuint8mf8_t idx_34 = __riscv_vsll_vx_u8mf8(__riscv_vid_v_u8mf8(vl), 2, vl);
+ vint32mf2_t vreg_memory_33 = __riscv_vloxei8_v_i32mf2_m(vmask, ptr_load_33, idx_34, vl);
+ vuint16mf4_t idx_35 = __riscv_vsll_vx_u16mf4(__riscv_vid_v_u16mf4(vl), 2, vl);
+ vint32mf2_t vreg_memory_34 = __riscv_vluxei16_v_i32mf2(ptr_load_34, idx_35, vl);
+ vint16mf4_t vreg_memory_35 = __riscv_vlse16_v_i16mf4(ptr_load_35, 2, vl);
+ vuint16mf4_t vreg_memory_36 = __riscv_vlse16_v_u16mf4(ptr_load_36, 2, vl);
+ vreg_memory_36 = __riscv_vremu_vx_u16mf4(vreg_memory_36, (uint16_t)(vl), vl);
+ vuint64m1_t vreg_memory_37 = __riscv_vle64_v_u64m1(ptr_load_37, vl);
+ vuint64m1_t vreg_memory_38 = __riscv_vlse64_v_u64m1(ptr_load_38, 8, vl);
+ vreg_memory_38 = __riscv_vremu_vx_u64m1(vreg_memory_38, (uint64_t)(vl), vl);
+ vfloat64m1_t vreg_memory_39 = __riscv_vlse64_v_f64m1(ptr_load_39, 8, vl);
+ vfloat64m1_t vreg_memory_40 = __riscv_vlse64_v_f64m1(ptr_load_40, 8, vl);
+ vint8mf8_t vload_tmp_3 = __riscv_vle8_v_i8mf8(ptr_load_41, vl);
+ vbool64_t vreg_memory_41 = __riscv_vmseq_vx_i8mf8_b64(vload_tmp_3, 1, vl);
+ vuint64m1_t vreg_memory_42 = __riscv_vle64_v_u64m1(ptr_load_42, vl);
+ vuint64m1_t vreg_memory_43 = __riscv_vle64_v_u64m1(ptr_load_43, vl);
+ vint8mf8_t vload_tmp_5 = __riscv_vle8_v_i8mf8(ptr_load_44, vl);
+ vbool64_t vreg_memory_44 = __riscv_vmseq_vx_i8mf8_b64(vload_tmp_5, 1, vl);
+ vint8mf8_t vload_tmp_7 = __riscv_vle8_v_i8mf8(ptr_load_45, vl);
+ vbool64_t vreg_memory_45 = __riscv_vmseq_vx_i8mf8_b64(vload_tmp_7, 1, vl);
+ vint8m1_t vreg_memory_46 = __riscv_vle8_v_i8m1(ptr_load_46, vl);
+ vuint32mf2_t vreg_memory_47 = __riscv_vle32_v_u32mf2(ptr_load_47, vl);
+ vuint64m1_t vreg_memory_48 = __riscv_vle64_v_u64m1(ptr_load_48, vl);
+ vint16mf4_t vreg_memory_49 = __riscv_vle16_v_i16mf4(ptr_load_49, vl);
+ vuint16mf4_t vreg_memory_50 = __riscv_vlse16_v_u16mf4(ptr_load_50, 2, vl);
+ vuint32mf2_t idx_50 = __riscv_vsll_vx_u32mf2(__riscv_vid_v_u32mf2(vl), 1, vl);
+ vuint16mf4_t vreg_memory_51 = __riscv_vluxei32_v_u16mf4(ptr_load_51, idx_50, vl);
+ vuint16m8_t vreg_memory_52 = __riscv_vle16_v_u16m8(ptr_load_52, vl);
+ vuint16m1_t vreg_memory_53 = __riscv_vle16_v_u16m1(ptr_load_53, vl);
+ vint16m2_t vreg_memory_54 = __riscv_vle16_v_i16m2(ptr_load_54, vl);
+ vuint64m1_t idx_51 = __riscv_vsll_vx_u64m1(__riscv_vid_v_u64m1(vl), 3, vl);
+ vint64m1_t vreg_memory_55 = __riscv_vloxei64_v_i64m1(ptr_load_55, idx_51, vl);
+ vfloat64m1_t vreg_memory_56 = __riscv_vlse64_v_f64m1(ptr_load_56, 8, vl);
+ vfloat32mf2_t vreg_memory_57 = __riscv_vlse32_v_f32mf2(ptr_load_57, 4, vl);
+ vuint8mf8_t idx_56 = __riscv_vsll_vx_u8mf8(__riscv_vid_v_u8mf8(vl), 1, vl);
+ vint16mf4_t vreg_memory_58 = __riscv_vluxei8_v_i16mf4(ptr_load_58, idx_56, vl);
+ vint16mf4_t vreg_memory_59 = __riscv_vle16_v_i16mf4(ptr_load_59, vl);
+ vuint32mf2_t idx_59 = __riscv_vsll_vx_u32mf2(__riscv_vid_v_u32mf2(vl), 0, vl);
+ vint8mf8_t vreg_memory_60 = __riscv_vluxei32_v_i8mf8_m(vmask, ptr_load_60, idx_59, vl);
+ vfloat32mf2_t vreg_memory_61 = __riscv_vle32_v_f32mf2(ptr_load_61, vl);
+ vint32m2_t vreg_memory_62 = __riscv_vle32_v_i32m2(ptr_load_62, vl);
+ vuint32mf2_t vreg_memory_63 = __riscv_vle32_v_u32mf2_m(vmask, ptr_load_63, vl);
+ vuint32mf2_t idx_62 = __riscv_vsll_vx_u32mf2(__riscv_vid_v_u32mf2(vl), 1, vl);
+ vint16mf4_t vreg_memory_64 = __riscv_vluxei32_v_i16mf4(ptr_load_64, idx_62, vl);
+ vuint32m1_t vreg_memory_65 = __riscv_vle32_v_u32m1(ptr_load_65, vl);
+ vuint8m2_t vreg_memory_66 = __riscv_vle8_v_u8m2(ptr_load_66, vl);
+ vuint64m8_t vreg_memory_67 = __riscv_vle64_v_u64m8(ptr_load_67, vl);
+ vint8m2_t vreg_memory_68 = __riscv_vle8_v_i8m2(ptr_load_68, vl);
+ vuint32mf2_t idx_67 = __riscv_vsll_vx_u32mf2(__riscv_vid_v_u32mf2(vl), 2, vl);
+ vfloat32mf2_t vreg_memory_69 = __riscv_vloxei32_v_f32mf2(ptr_load_69, idx_67, vl);
+ vuint16mf4_t idx_68 = __riscv_vsll_vx_u16mf4(__riscv_vid_v_u16mf4(vl), 1, vl);
+ vint16mf4_t vreg_memory_70 = __riscv_vluxei16_v_i16mf4(ptr_load_70, idx_68, vl);
+ vint16mf4_t vreg_memory_71 = __riscv_vlse16_v_i16mf4(ptr_load_71, 2, vl);
+ vuint32mf2_t vreg_memory_72 = __riscv_vle32_v_u32mf2_m(vmask, ptr_load_72, vl);
+ vuint32mf2_t vreg_memory_73 = __riscv_vlse32_v_u32mf2_m(vmask, ptr_load_73, 4, vl);
+ vint32mf2_t vreg_memory_74 = __riscv_vle32_v_i32mf2(ptr_load_74, vl);
+ vint16m1_t vreg_memory_75 = __riscv_vle16_v_i16m1(ptr_load_75, vl);
+ vint8mf8_t vload_tmp_12 = __riscv_vle8_v_i8mf8(ptr_load_76, vl);
+ vbool64_t vreg_memory_76 = __riscv_vmseq_vx_i8mf8_b64(vload_tmp_12, 1, vl);
+ vuint64m1_t idx_75 = __riscv_vsll_vx_u64m1(__riscv_vid_v_u64m1(vl), 2, vl);
+ vfloat32mf2_t vreg_memory_77 = __riscv_vluxei64_v_f32mf2_m(vmask, ptr_load_77, idx_75, vl);
+ vuint8mf8_t vreg_memory_78 = __riscv_vlse8_v_u8mf8(ptr_load_78, 1, vl);
+ vint8m1_t vload_tmp_13 = __riscv_vle8_v_i8m1(ptr_load_79, vl);
+ vbool8_t vreg_memory_79 = __riscv_vmseq_vx_i8m1_b8(vload_tmp_13, 1, vl);
+ vint32m4_t vreg_0 = __riscv_vreinterpret_v_i64m4_i32m4(vreg_memory_0);
+ vreg_memory_2 = __riscv_vmadd_vx_u16mf4_m(vmask, vreg_memory_1, 65136, vreg_memory_2, vl);
+ vbool64_t vreg_1 = __riscv_vmsge_vx_i64m1_b64(vreg_memory_3, -8444588278415581228ll, vl);
+ vfloat16m4_t vreg_2 = __riscv_vlmul_ext_v_f16m1_f16m4(vreg_memory_4);
+ vuint16mf4_t vreg_3 = __riscv_vmadd_vv_u16mf4_m(vmask, vreg_memory_5, vreg_memory_1, vreg_memory_6, vl);
+ vreg_memory_7 = __riscv_vslide1down_vx_u8mf8(vreg_memory_7, 43, vl);
+ vfloat32mf2_t vreg_4 = __riscv_vfwnmacc_vf_f32mf2_rm_m(vmask, vreg_memory_8, convert_binary_u16_f16(63541), vreg_memory_9, __RISCV_FRM_RNE, vl);
+ vint8mf8_t vreg_5 = __riscv_vlmul_trunc_v_i8m2_i8mf8(vreg_memory_10);
+ vreg_memory_9 = __riscv_vfmin_vf_f16mf4(vreg_memory_11, convert_binary_u16_f16(5566), vl);
+ vfloat64m1_t vreg_6 = __riscv_vreinterpret_v_u64m1_f64m1(vreg_memory_12);
+ vreg_memory_13 = __riscv_vwmaccsu_vv_i16mf4(vreg_memory_13, vreg_5, vreg_memory_14, vl);
+ vbool2_t vreg_7 = __riscv_vreinterpret_v_u8m1_b2(vreg_memory_15);
+ vbool64_t vreg_8 = __riscv_vreinterpret_v_i64m1_b64(vreg_memory_16);
+ vreg_4 = __riscv_vslidedown_vx_f32mf2(vreg_memory_17, 953680954u, vl);
+ vint32mf2_t vreg_9 = __riscv_vlmul_trunc_v_i32m1_i32mf2(vreg_memory_18);
+ vreg_memory_16 = __riscv_vor_vv_i64m1(vreg_memory_16, vreg_memory_19, vl);
+ vreg_9 = __riscv_vnmsub_vx_i32mf2(vreg_9, 1243647907, vreg_9, vl);
+ vreg_memory_13 = __riscv_vadc_vxm_i16mf4(vreg_memory_13, 30141, vreg_1, vl);
+ vreg_memory_11 = __riscv_vfsgnj_vf_f16mf4_m(vmask, vreg_memory_9, convert_binary_u16_f16(20419), vl);
+ vint16mf4_t vreg_10 = __riscv_vncvt_x_x_w_i16mf4(vreg_9, vl);
+ vuint32mf2_t vreg_11 = __riscv_vid_v_u32mf2(vl);
+ vreg_memory_1 = __riscv_vwsubu_wv_u16mf4(vreg_memory_6, vreg_memory_20, vl);
+ vint32m1_t vreg_12 = __riscv_vredmin_vs_i32mf2_i32m1(vreg_memory_21, vreg_memory_22, vl);
+ vuint32mf2_t vreg_13 = __riscv_vlmul_trunc_v_u32m4_u32mf2(vreg_memory_23);
+ vreg_memory_17 = __riscv_vfwnmsac_vf_f32mf2_rm(vreg_4, convert_binary_u16_f16(13771), vreg_memory_24, __RISCV_FRM_RNE, vl);
+ vint64m1_t vreg_14 = __riscv_vnmsub_vx_i64m1_m(vmask, vreg_memory_3, -7398488331651941832ll, vreg_memory_25, vl);
+ vreg_memory_13 = __riscv_vmacc_vv_i16mf4(vreg_memory_26, vreg_memory_27, vreg_memory_28, vl);
+ vreg_memory_5 = __riscv_vor_vx_u16mf4(vreg_memory_6, 50306, vl);
+ vfloat32m4_t vreg_15 = __riscv_vlmul_ext_v_f32m2_f32m4(vreg_memory_29);
+ vreg_memory_21 = __riscv_vundefined_i32mf2();
+ vreg_9 = __riscv_vrsub_vx_i32mf2(vreg_9, 321778147, vl);
+ vreg_memory_13 = __riscv_vsadd_vv_i16mf4(vreg_memory_13, vreg_memory_27, vl);
+ vint32mf2_t vreg_16 = __riscv_vfcvt_x_f_v_i32mf2_rm_m(vmask, vreg_4, __RISCV_FRM_RNE, vl);
+ vreg_memory_9 = __riscv_vfneg_v_f16mf4(vreg_memory_24, vl);
+ vreg_memory_30 = __riscv_vfsub_vf_f64m1_m(vmask, vreg_memory_30, convert_binary_u64_f64(45746ull), vl);
+ vreg_memory_31 = __riscv_vnsrl_wv_u8mf8(vreg_memory_6, vreg_memory_31, vl);
+ vreg_memory_8 = __riscv_vfwmacc_vv_f32mf2(vreg_memory_8, vreg_memory_32, vreg_memory_11, vl);
+ vreg_3 = __riscv_vmul_vv_u16mf4_m(vmask, vreg_memory_6, vreg_memory_5, vl);
+ vreg_16 = __riscv_vnmsac_vv_i32mf2_m(vmask, vreg_memory_33, vreg_16, vreg_memory_34, vl);
+ vreg_10 = __riscv_vrgather_vv_i16mf4(vreg_memory_35, vreg_memory_36, vl);
+ vreg_memory_38 = __riscv_vrgather_vv_u64m1(vreg_memory_37, vreg_memory_38, vl);
+ vfloat32mf2_t vreg_17 = __riscv_vfnmsac_vf_f32mf2_rm(vreg_4, convert_binary_u32_f32(10330u), vreg_memory_17, __RISCV_FRM_RNE, vl);
+ vfloat64m1_t vreg_18 = __riscv_vfnmacc_vv_f64m1(vreg_memory_30, vreg_memory_30, vreg_memory_39, vl);
+ vfloat64m1_t vreg_19 = __riscv_vcompress_vm_f64m1(vreg_memory_40, vreg_memory_41, vl);
+ vreg_1 = __riscv_vmsbc_vvm_i8mf8_b64(vreg_5, vreg_5, vreg_1, vl);
+ vreg_memory_42 = __riscv_vslideup_vx_u64m1(vreg_memory_42, vreg_memory_43, 102664729u, vl);
+ vreg_memory_41 = __riscv_vmorn_mm_b64(vreg_1, vreg_memory_44, vl);
+ vreg_memory_40 = __riscv_vfwnmacc_vv_f64m1_rm_m(vmask, vreg_memory_39, vreg_memory_8, vreg_4, __RISCV_FRM_RNE, vl);
+ vreg_1 = __riscv_vmadc_vxm_i64m1_b64(vreg_memory_19, -6991190491244929085ll, vreg_memory_45, vl);
+ vreg_memory_24 = __riscv_vfmv_s_f_f16mf4(convert_binary_u16_f16(58872), vl);
+ vbool64_t vreg_20 = __riscv_vreinterpret_v_i8m1_b64(vreg_memory_46);
+ vreg_memory_11 = __riscv_vfmax_vf_f16mf4(vreg_memory_32, convert_binary_u16_f16(3391), vl);
+ vreg_memory_42 = __riscv_vwredsumu_vs_u32mf2_u64m1(vreg_memory_47, vreg_memory_48, vl);
+ vreg_5 = __riscv_vmv_v_x_i8mf8(-52, vl);
+ vreg_memory_9 = __riscv_vmv_v_v_f16mf4(vreg_memory_11, vl);
+ vfloat64m1_t vreg_21 = __riscv_vfmv_s_f_f64m1(convert_binary_u64_f64(30491ull), vl);
+ vint16m8_t vreg_22 = __riscv_vlmul_ext_v_i16mf4_i16m8(vreg_memory_49);
+ vreg_5 = __riscv_vrsub_vx_i8mf8(vreg_5, 109, vl);
+ vreg_memory_36 = __riscv_vslideup_vx_u16mf4_m(vmask, vreg_memory_50, vreg_memory_51, 539231139u, vl);
+ vuint16mf4_t vreg_23 = __riscv_vlmul_trunc_v_u16m8_u16mf4(vreg_memory_52);
+ vbool64_t vreg_24 = __riscv_vreinterpret_v_u16m1_b64(vreg_memory_53);
+ vint16m1_t vreg_25 = __riscv_vlmul_trunc_v_i16m2_i16m1(vreg_memory_54);
+ vuint8mf8_t vreg_26 = __riscv_vmulhu_vx_u8mf8_m(vmask, vreg_memory_14, 116, vl);
+ vreg_memory_3 = __riscv_vnmsub_vx_i64m1_m(vmask, vreg_memory_55, 7037333148368913704ll, vreg_memory_3, vl);
+ vreg_18 = __riscv_vfmacc_vf_f64m1_rm_m(vmask, vreg_memory_56, convert_binary_u64_f64(14616ull), vreg_18, __RISCV_FRM_RNE, vl);
+ vreg_9 = __riscv_vsra_vx_i32mf2(vreg_9, 2782143639u, vl);
+ vreg_19 = __riscv_vfwmsac_vf_f64m1_m(vmask, vreg_memory_30, convert_binary_u32_f32(52139u), vreg_memory_57, vl);
+ vreg_5 = __riscv_vsadd_vv_i8mf8_m(vmask, vreg_5, vreg_5, vl);
+ vint16mf4_t vreg_27 = __riscv_vnmsac_vv_i16mf4_m(vmask, vreg_memory_58, vreg_memory_59, vreg_memory_49, vl);
+ vreg_memory_60 = __riscv_vnot_v_i8mf8_m(vmask, vreg_memory_60, vl);
+ vreg_memory_60 = __riscv_vnsra_wv_i8mf8(vreg_memory_35, vreg_memory_14, vl);
+ vreg_4 = __riscv_vfnmacc_vf_f32mf2(vreg_17, convert_binary_u32_f32(9735u), vreg_memory_61, vl);
+ vreg_5 = __riscv_vor_vv_i8mf8(vreg_memory_60, vreg_memory_60, vl);
+ vint32m4_t vreg_28 = __riscv_vlmul_ext_v_i32m2_i32m4(vreg_memory_62);
+ vreg_memory_47 = __riscv_vmacc_vv_u32mf2_m(vmask, vreg_11, vreg_13, vreg_memory_63, vl);
+ vint16mf4_t vreg_29 = __riscv_vslide1up_vx_i16mf4(vreg_memory_64, 4280, vl);
+ vreg_5 = __riscv_vdiv_vx_i8mf8_m(vmask, vreg_5, -37, vl);
+ vfloat32mf2_t vreg_30 = __riscv_vfnmacc_vv_f32mf2_rm_m(vmask, vreg_memory_8, vreg_memory_61, vreg_memory_57, __RISCV_FRM_RNE, vl);
+ vbool8_t vreg_31 = __riscv_vreinterpret_v_u32m1_b8(vreg_memory_65);
+ vuint64m2_t vreg_32 = __riscv_vreinterpret_v_u8m2_u64m2(vreg_memory_66);
+ vint8mf8_t vreg_33 = __riscv_vnmsac_vv_i8mf8(vreg_memory_60, vreg_5, vreg_memory_60, vl);
+ vint8mf8_t vreg_34 = __riscv_vsmul_vv_i8mf8(vreg_33, vreg_33, __RISCV_VXRM_RNU, vl);
+ vreg_memory_61 = __riscv_vfmsac_vv_f32mf2_m(vmask, vreg_17, vreg_17, vreg_30, vl);
+ vreg_memory_16 = __riscv_vasub_vv_i64m1(vreg_14, vreg_memory_25, __RISCV_VXRM_RNU, vl);
+ vuint64m2_t vreg_35 = __riscv_vlmul_trunc_v_u64m8_u64m2(vreg_memory_67);
+ vint8mf8_t vreg_36 = __riscv_vlmul_trunc_v_i8m2_i8mf8(vreg_memory_68);
+ vreg_memory_63 = __riscv_vfncvt_xu_f_w_u32mf2(vreg_memory_30, vl);
+ vreg_23 = __riscv_vfncvt_rtz_xu_f_w_u16mf4_m(vmask, vreg_memory_69, vl);
+ vuint16mf4_t vreg_37 = __riscv_vnmsub_vx_u16mf4_m(vmask, vreg_3, 61201, vreg_memory_50, vl);
+ vreg_memory_28 = __riscv_vsub_vv_i16mf4(vreg_memory_70, vreg_memory_71, vl);
+ vreg_memory_72 = __riscv_vnmsac_vv_u32mf2(vreg_memory_72, vreg_11, vreg_memory_73, vl);
+ vreg_memory_33 = __riscv_vrem_vv_i32mf2(vreg_16, vreg_memory_74, vl);
+ vint64m1_t vreg_38 = __riscv_vreinterpret_v_i16m1_i64m1(vreg_memory_75);
+ vfloat16mf4_t vreg_39 = __riscv_vfmerge_vfm_f16mf4(vreg_memory_32, convert_binary_u16_f16(37406), vreg_memory_76, vl);
+ vfloat16mf4_t vreg_40 = __riscv_vfcvt_f_xu_v_f16mf4_rm_m(vmask, vreg_memory_5, __RISCV_FRM_RNE, vl);
+ vreg_4 = __riscv_vfmsub_vf_f32mf2_rm(vreg_memory_77, convert_binary_u32_f32(21630u), vreg_4, __RISCV_FRM_RNE, vl);
+ vuint8mf8_t vreg_41 = __riscv_vslide1up_vx_u8mf8(vreg_memory_78, 132, vl);
+ vreg_memory_63 = __riscv_vwmulu_vx_u32mf2(vreg_23, 52729, vl);
+ vuint16m1_t vreg_42 = __riscv_vreinterpret_v_b8_u16m1(vreg_memory_79);
+ vreg_memory_7 = __riscv_vasubu_vx_u8mf8(vreg_memory_20, 76, __RISCV_VXRM_RNU, vl);
+ __riscv_vse16_v_u16mf4(ptr_store_vreg_memory_2, vreg_memory_2, vl);
+ vint8mf8_t zero_0 = __riscv_vmv_v_x_i8mf8(0, __riscv_vsetvlmax_e8mf8());
+ vint8mf8_t vstore_tmp_0 = __riscv_vmerge_vxm_i8mf8(zero_0, 1, vreg_1, vl);
+ __riscv_vse8_v_i8mf8(ptr_store_vreg_1, vstore_tmp_0, vl);
+ vuint32mf2_t idx_4 = __riscv_vsll_vx_u32mf2(__riscv_vid_v_u32mf2(vl), 1, vl);
+ __riscv_vsuxei32_v_u16mf4_m(vmask, ptr_store_vreg_3, idx_4, vreg_3, vl);
+ vuint8mf8_t idx_7 = __riscv_vsll_vx_u8mf8(__riscv_vid_v_u8mf8(vl), 0, vl);
+ __riscv_vsoxei8_v_u8mf8(ptr_store_vreg_memory_7, idx_7, vreg_memory_7, vl);
+ __riscv_vse32_v_f32mf2(ptr_store_vreg_4, vreg_4, vl);
+ __riscv_vse8_v_i8mf8(ptr_store_vreg_5, vreg_5, vl);
+ vuint64m1_t idx_12 = __riscv_vsll_vx_u64m1(__riscv_vid_v_u64m1(vl), 1, vl);
+ __riscv_vsoxei64_v_f16mf4(ptr_store_vreg_memory_9, idx_12, vreg_memory_9, vl);
+ vuint8mf8_t idx_17 = __riscv_vsll_vx_u8mf8(__riscv_vid_v_u8mf8(vl), 1, vl);
+ __riscv_vsuxei8_v_i16mf4(ptr_store_vreg_memory_13, idx_17, vreg_memory_13, vl);
+ vint8m4_t zero_1 = __riscv_vmv_v_x_i8m4(0, __riscv_vsetvlmax_e8m4());
+ vint8m4_t vstore_tmp_1 = __riscv_vmerge_vxm_i8m4(zero_1, 1, vreg_7, vl);
+ __riscv_vse8_v_i8m4(ptr_store_vreg_7, vstore_tmp_1, vl);
+ vint8mf8_t zero_2 = __riscv_vmv_v_x_i8mf8(0, __riscv_vsetvlmax_e8mf8());
+ vint8mf8_t vstore_tmp_2 = __riscv_vmerge_vxm_i8mf8(zero_2, 1, vreg_8, vl);
+ __riscv_vse8_v_i8mf8(ptr_store_vreg_8, vstore_tmp_2, vl);
+ __riscv_vsse32_v_f32mf2(ptr_store_vreg_4, 4, vreg_4, vl);
+ __riscv_vse32_v_i32mf2(ptr_store_vreg_9, vreg_9, vl);
+ __riscv_vse64_v_i64m1(ptr_store_vreg_memory_16, vreg_memory_16, vl);
+ __riscv_vse32_v_i32mf2(ptr_store_vreg_9, vreg_9, vl);
+ __riscv_vsse16_v_i16mf4(ptr_store_vreg_memory_13, 2, vreg_memory_13, vl);
+ __riscv_vse16_v_f16mf4_m(vmask, ptr_store_vreg_memory_11, vreg_memory_11, vl);
+ vuint32mf2_t idx_19 = __riscv_vsll_vx_u32mf2(__riscv_vid_v_u32mf2(vl), 1, vl);
+ __riscv_vsoxei32_v_i16mf4(ptr_store_vreg_10, idx_19, vreg_10, vl);
+ __riscv_vse32_v_u32mf2_m(vmask, ptr_store_vreg_11, vreg_11, vl);
+ __riscv_vse16_v_u16mf4(ptr_store_vreg_memory_1, vreg_memory_1, vl);
+ __riscv_vse32_v_i32m1(ptr_store_vreg_12, vreg_12, vl);
+ __riscv_vse32_v_u32mf2(ptr_store_vreg_13, vreg_13, vl);
+ __riscv_vse32_v_f32mf2(ptr_store_vreg_memory_17, vreg_memory_17, vl);
+ vuint8mf8_t idx_23 = __riscv_vsll_vx_u8mf8(__riscv_vid_v_u8mf8(vl), 3, vl);
+ __riscv_vsoxei8_v_i64m1(ptr_store_vreg_14, idx_23, vreg_14, vl);
+ vuint64m1_t idx_26 = __riscv_vsll_vx_u64m1(__riscv_vid_v_u64m1(vl), 1, vl);
+ __riscv_vsuxei64_v_i16mf4(ptr_store_vreg_memory_13, idx_26, vreg_memory_13, vl);
+ __riscv_vse16_v_u16mf4(ptr_store_vreg_memory_5, vreg_memory_5, vl);
+ __riscv_vse32_v_i32mf2_m(vmask, ptr_store_vreg_9, vreg_9, vl);
+ vuint32mf2_t idx_28 = __riscv_vsll_vx_u32mf2(__riscv_vid_v_u32mf2(vl), 1, vl);
+ __riscv_vsoxei32_v_i16mf4(ptr_store_vreg_memory_13, idx_28, vreg_memory_13, vl);
+ __riscv_vse32_v_i32mf2_m(vmask, ptr_store_vreg_16, vreg_16, vl);
+ __riscv_vse16_v_f16mf4(ptr_store_vreg_memory_9, vreg_memory_9, vl);
+ __riscv_vsse64_v_f64m1(ptr_store_vreg_memory_30, 8, vreg_memory_30, vl);
+ __riscv_vse8_v_u8mf8(ptr_store_vreg_memory_31, vreg_memory_31, vl);
+ vuint32mf2_t idx_33 = __riscv_vsll_vx_u32mf2(__riscv_vid_v_u32mf2(vl), 2, vl);
+ __riscv_vsoxei32_v_f32mf2_m(vmask, ptr_store_vreg_memory_8, idx_33, vreg_memory_8, vl);
+ __riscv_vse16_v_u16mf4(ptr_store_vreg_3, vreg_3, vl);
+ __riscv_vsse32_v_i32mf2_m(vmask, ptr_store_vreg_16, 4, vreg_16, vl);
+ __riscv_vse16_v_i16mf4(ptr_store_vreg_10, vreg_10, vl);
+ vuint32mf2_t idx_37 = __riscv_vsll_vx_u32mf2(__riscv_vid_v_u32mf2(vl), 3, vl);
+ __riscv_vsoxei32_v_u64m1(ptr_store_vreg_memory_38, idx_37, vreg_memory_38, vl);
+ __riscv_vse32_v_f32mf2_m(vmask, ptr_store_vreg_17, vreg_17, vl);
+ __riscv_vsse64_v_f64m1(ptr_store_vreg_18, 8, vreg_18, vl);
+ vuint32mf2_t idx_39 = __riscv_vsll_vx_u32mf2(__riscv_vid_v_u32mf2(vl), 3, vl);
+ __riscv_vsoxei32_v_f64m1(ptr_store_vreg_19, idx_39, vreg_19, vl);
+ vint8mf8_t zero_4 = __riscv_vmv_v_x_i8mf8(0, __riscv_vsetvlmax_e8mf8());
+ vint8mf8_t vstore_tmp_4 = __riscv_vmerge_vxm_i8mf8(zero_4, 1, vreg_1, vl);
+ __riscv_vse8_v_i8mf8(ptr_store_vreg_1, vstore_tmp_4, vl);
+ __riscv_vse64_v_u64m1(ptr_store_vreg_memory_42, vreg_memory_42, vl);
+ vint8mf8_t zero_6 = __riscv_vmv_v_x_i8mf8(0, __riscv_vsetvlmax_e8mf8());
+ vint8mf8_t vstore_tmp_6 = __riscv_vmerge_vxm_i8mf8(zero_6, 1, vreg_memory_41, vl);
+ __riscv_vse8_v_i8mf8(ptr_store_vreg_memory_41, vstore_tmp_6, vl);
+ vuint8mf8_t idx_41 = __riscv_vsll_vx_u8mf8(__riscv_vid_v_u8mf8(vl), 3, vl);
+ __riscv_vsuxei8_v_f64m1_m(vmask, ptr_store_vreg_memory_40, idx_41, vreg_memory_40, vl);
+ vint8mf8_t zero_8 = __riscv_vmv_v_x_i8mf8(0, __riscv_vsetvlmax_e8mf8());
+ vint8mf8_t vstore_tmp_8 = __riscv_vmerge_vxm_i8mf8(zero_8, 1, vreg_1, vl);
+ __riscv_vse8_v_i8mf8(ptr_store_vreg_1, vstore_tmp_8, vl);
+ vuint16mf4_t idx_43 = __riscv_vsll_vx_u16mf4(__riscv_vid_v_u16mf4(vl), 1, vl);
+ __riscv_vsoxei16_v_f16mf4(ptr_store_vreg_memory_24, idx_43, vreg_memory_24, vl);
+ vint8mf8_t zero_9 = __riscv_vmv_v_x_i8mf8(0, __riscv_vsetvlmax_e8mf8());
+ vint8mf8_t vstore_tmp_9 = __riscv_vmerge_vxm_i8mf8(zero_9, 1, vreg_20, vl);
+ __riscv_vse8_v_i8mf8(ptr_store_vreg_20, vstore_tmp_9, vl);
+ vuint32mf2_t idx_45 = __riscv_vsll_vx_u32mf2(__riscv_vid_v_u32mf2(vl), 1, vl);
+ __riscv_vsoxei32_v_f16mf4(ptr_store_vreg_memory_11, idx_45, vreg_memory_11, vl);
+ __riscv_vsse64_v_u64m1(ptr_store_vreg_memory_42, 8, vreg_memory_42, vl);
+ __riscv_vse8_v_i8mf8(ptr_store_vreg_5, vreg_5, vl);
+ vuint8mf8_t idx_47 = __riscv_vsll_vx_u8mf8(__riscv_vid_v_u8mf8(vl), 1, vl);
+ __riscv_vsoxei8_v_f16mf4(ptr_store_vreg_memory_9, idx_47, vreg_memory_9, vl);
+ vuint8mf8_t idx_49 = __riscv_vsll_vx_u8mf8(__riscv_vid_v_u8mf8(vl), 3, vl);
+ __riscv_vsoxei8_v_f64m1(ptr_store_vreg_21, idx_49, vreg_21, vl);
+ __riscv_vse8_v_i8mf8(ptr_store_vreg_5, vreg_5, vl);
+ __riscv_vsse16_v_u16mf4(ptr_store_vreg_memory_36, 2, vreg_memory_36, vl);
+ __riscv_vse16_v_u16mf4(ptr_store_vreg_23, vreg_23, vl);
+ vint8mf8_t zero_10 = __riscv_vmv_v_x_i8mf8(0, __riscv_vsetvlmax_e8mf8());
+ vint8mf8_t vstore_tmp_10 = __riscv_vmerge_vxm_i8mf8(zero_10, 1, vreg_24, vl);
+ __riscv_vse8_v_i8mf8(ptr_store_vreg_24, vstore_tmp_10, vl);
+ __riscv_vse16_v_i16m1(ptr_store_vreg_25, vreg_25, vl);
+ __riscv_vsse8_v_u8mf8(ptr_store_vreg_26, 1, vreg_26, vl);
+ vuint32mf2_t idx_53 = __riscv_vsll_vx_u32mf2(__riscv_vid_v_u32mf2(vl), 3, vl);
+ __riscv_vsuxei32_v_i64m1_m(vmask, ptr_store_vreg_memory_3, idx_53, vreg_memory_3, vl);
+ __riscv_vsse64_v_f64m1(ptr_store_vreg_18, 8, vreg_18, vl);
+ __riscv_vsse32_v_i32mf2(ptr_store_vreg_9, 4, vreg_9, vl);
+ vuint32mf2_t idx_55 = __riscv_vsll_vx_u32mf2(__riscv_vid_v_u32mf2(vl), 3, vl);
+ __riscv_vsuxei32_v_f64m1(ptr_store_vreg_19, idx_55, vreg_19, vl);
+ __riscv_vsse8_v_i8mf8(ptr_store_vreg_5, 1, vreg_5, vl);
+ vuint8mf8_t idx_58 = __riscv_vsll_vx_u8mf8(__riscv_vid_v_u8mf8(vl), 1, vl);
+ __riscv_vsoxei8_v_i16mf4_m(vmask, ptr_store_vreg_27, idx_58, vreg_27, vl);
+ __riscv_vse8_v_i8mf8_m(vmask, ptr_store_vreg_memory_60, vreg_memory_60, vl);
+ __riscv_vse8_v_i8mf8(ptr_store_vreg_memory_60, vreg_memory_60, vl);
+ vuint32mf2_t idx_61 = __riscv_vsll_vx_u32mf2(__riscv_vid_v_u32mf2(vl), 2, vl);
+ __riscv_vsuxei32_v_f32mf2_m(vmask, ptr_store_vreg_4, idx_61, vreg_4, vl);
+ __riscv_vsse8_v_i8mf8_m(vmask, ptr_store_vreg_5, 1, vreg_5, vl);
+ __riscv_vse32_v_u32mf2_m(vmask, ptr_store_vreg_memory_47, vreg_memory_47, vl);
+ __riscv_vse16_v_i16mf4(ptr_store_vreg_29, vreg_29, vl);
+ __riscv_vse8_v_i8mf8_m(vmask, ptr_store_vreg_5, vreg_5, vl);
+ __riscv_vsse32_v_f32mf2(ptr_store_vreg_30, 4, vreg_30, vl);
+ vint8m1_t zero_11 = __riscv_vmv_v_x_i8m1(0, __riscv_vsetvlmax_e8m1());
+ vint8m1_t vstore_tmp_11 = __riscv_vmerge_vxm_i8m1(zero_11, 1, vreg_31, vl);
+ __riscv_vse8_v_i8m1(ptr_store_vreg_31, vstore_tmp_11, vl);
+ __riscv_vse8_v_i8mf8(ptr_store_vreg_33, vreg_33, vl);
+ vuint64m1_t idx_64 = __riscv_vsll_vx_u64m1(__riscv_vid_v_u64m1(vl), 0, vl);
+ __riscv_vsoxei64_v_i8mf8(ptr_store_vreg_34, idx_64, vreg_34, vl);
+ __riscv_vsse32_v_f32mf2_m(vmask, ptr_store_vreg_memory_61, 4, vreg_memory_61, vl);
+ __riscv_vse64_v_i64m1(ptr_store_vreg_memory_16, vreg_memory_16, vl);
+ __riscv_vse64_v_u64m2(ptr_store_vreg_35, vreg_35, vl);
+ __riscv_vse8_v_i8mf8(ptr_store_vreg_36, vreg_36, vl);
+
+ ptr_mask += vl;
+ ptr_load_0 += vl;
+ ptr_load_1 += vl;
+ ptr_load_10 += vl;
+ ptr_load_11 += vl;
+ ptr_load_12 += vl;
+ ptr_load_13 += vl;
+ ptr_load_14 += vl;
+ ptr_load_15 += vl;
+ ptr_load_16 += vl;
+ ptr_load_17 += vl;
+ ptr_load_18 += vl;
+ ptr_load_19 += vl;
+ ptr_load_2 += vl;
+ ptr_load_20 += vl;
+ ptr_load_21 += vl;
+ ptr_load_22 += vl;
+ ptr_load_23 += vl;
+ ptr_load_24 += vl;
+ ptr_load_25 += vl;
+ ptr_load_26 += vl;
+ ptr_load_27 += vl;
+ ptr_load_28 += vl;
+ ptr_load_29 += vl;
+ ptr_load_3 += vl;
+ ptr_load_30 += vl;
+ ptr_load_31 += vl;
+ ptr_load_32 += vl;
+ ptr_load_33 += vl;
+ ptr_load_34 += vl;
+ ptr_load_35 += vl;
+ ptr_load_36 += vl;
+ ptr_load_37 += vl;
+ ptr_load_38 += vl;
+ ptr_load_39 += vl;
+ ptr_load_4 += vl;
+ ptr_load_40 += vl;
+ ptr_load_41 += vl;
+ ptr_load_42 += vl;
+ ptr_load_43 += vl;
+ ptr_load_44 += vl;
+ ptr_load_45 += vl;
+ ptr_load_46 += vl;
+ ptr_load_47 += vl;
+ ptr_load_48 += vl;
+ ptr_load_49 += vl;
+ ptr_load_5 += vl;
+ ptr_load_50 += vl;
+ ptr_load_51 += vl;
+ ptr_load_52 += vl;
+ ptr_load_53 += vl;
+ ptr_load_54 += vl;
+ ptr_load_55 += vl;
+ ptr_load_56 += vl;
+ ptr_load_57 += vl;
+ ptr_load_58 += vl;
+ ptr_load_59 += vl;
+ ptr_load_6 += vl;
+ ptr_load_60 += vl;
+ ptr_load_61 += vl;
+ ptr_load_62 += vl;
+ ptr_load_63 += vl;
+ ptr_load_64 += vl;
+ ptr_load_65 += vl;
+ ptr_load_66 += vl;
+ ptr_load_67 += vl;
+ ptr_load_68 += vl;
+ ptr_load_69 += vl;
+ ptr_load_7 += vl;
+ ptr_load_70 += vl;
+ ptr_load_71 += vl;
+ ptr_load_72 += vl;
+ ptr_load_73 += vl;
+ ptr_load_74 += vl;
+ ptr_load_75 += vl;
+ ptr_load_76 += vl;
+ ptr_load_77 += vl;
+ ptr_load_78 += vl;
+ ptr_load_79 += vl;
+ ptr_load_8 += vl;
+ ptr_load_9 += vl;
+ ptr_store_vreg_0 += vl;
+ ptr_store_vreg_1 += vl;
+ ptr_store_vreg_10 += vl;
+ ptr_store_vreg_11 += vl;
+ ptr_store_vreg_12 += vl;
+ ptr_store_vreg_13 += vl;
+ ptr_store_vreg_14 += vl;
+ ptr_store_vreg_15 += vl;
+ ptr_store_vreg_16 += vl;
+ ptr_store_vreg_17 += vl;
+ ptr_store_vreg_18 += vl;
+ ptr_store_vreg_19 += vl;
+ ptr_store_vreg_2 += vl;
+ ptr_store_vreg_20 += vl;
+ ptr_store_vreg_21 += vl;
+ ptr_store_vreg_22 += vl;
+ ptr_store_vreg_23 += vl;
+ ptr_store_vreg_24 += vl;
+ ptr_store_vreg_25 += vl;
+ ptr_store_vreg_26 += vl;
+ ptr_store_vreg_27 += vl;
+ ptr_store_vreg_28 += vl;
+ ptr_store_vreg_29 += vl;
+ ptr_store_vreg_3 += vl;
+ ptr_store_vreg_30 += vl;
+ ptr_store_vreg_31 += vl;
+ ptr_store_vreg_32 += vl;
+ ptr_store_vreg_33 += vl;
+ ptr_store_vreg_34 += vl;
+ ptr_store_vreg_35 += vl;
+ ptr_store_vreg_36 += vl;
+ ptr_store_vreg_37 += vl;
+ ptr_store_vreg_38 += vl;
+ ptr_store_vreg_39 += vl;
+ ptr_store_vreg_4 += vl;
+ ptr_store_vreg_40 += vl;
+ ptr_store_vreg_41 += vl;
+ ptr_store_vreg_42 += vl;
+ ptr_store_vreg_5 += vl;
+ ptr_store_vreg_6 += vl;
+ ptr_store_vreg_7 += vl;
+ ptr_store_vreg_8 += vl;
+ ptr_store_vreg_9 += vl;
+ ptr_store_vreg_memory_1 += vl;
+ ptr_store_vreg_memory_11 += vl;
+ ptr_store_vreg_memory_13 += vl;
+ ptr_store_vreg_memory_16 += vl;
+ ptr_store_vreg_memory_17 += vl;
+ ptr_store_vreg_memory_2 += vl;
+ ptr_store_vreg_memory_21 += vl;
+ ptr_store_vreg_memory_24 += vl;
+ ptr_store_vreg_memory_28 += vl;
+ ptr_store_vreg_memory_3 += vl;
+ ptr_store_vreg_memory_30 += vl;
+ ptr_store_vreg_memory_31 += vl;
+ ptr_store_vreg_memory_33 += vl;
+ ptr_store_vreg_memory_36 += vl;
+ ptr_store_vreg_memory_38 += vl;
+ ptr_store_vreg_memory_40 += vl;
+ ptr_store_vreg_memory_41 += vl;
+ ptr_store_vreg_memory_42 += vl;
+ ptr_store_vreg_memory_47 += vl;
+ ptr_store_vreg_memory_5 += vl;
+ ptr_store_vreg_memory_60 += vl;
+ ptr_store_vreg_memory_61 += vl;
+ ptr_store_vreg_memory_63 += vl;
+ ptr_store_vreg_memory_7 += vl;
+ ptr_store_vreg_memory_72 += vl;
+ ptr_store_vreg_memory_8 += vl;
+ ptr_store_vreg_memory_9 += vl;
+ }
+ return 0;
+}
+
+/* { dg-final { scan-assembler-not "e64,mf4" } } */