RISC-V: Implement vec_set and vec_extract.

Message ID fcd153fb-4e70-a772-14b1-730490e35611@gmail.com
State Committed
Delegated to: Jeff Law
Series RISC-V: Implement vec_set and vec_extract.

Checks

Context Check Description
linaro-tcwg-bot/tcwg_gcc_build--master-arm success Testing passed
linaro-tcwg-bot/tcwg_gcc_check--master-arm success Testing passed
linaro-tcwg-bot/tcwg_gcc_check--master-aarch64 success Testing passed
linaro-tcwg-bot/tcwg_gcc_build--master-aarch64 success Testing passed

Commit Message

Robin Dapp June 12, 2023, 2:55 p.m. UTC
  Hi,

this implements the vec_set and vec_extract patterns for integer and
floating-point data types.  For vec_set we broadcast the insert value to
a vector register and then perform a vslideup with effective length 1 to
the requested index.

vec_extract is done by sliding down the requested element to index 0
and v(f)mv.[xf].s to a scalar register.
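
For a four-element int vector and index 2 this would expand to
roughly the following sequences (register allocation and the exact
vsetvli form are illustrative):

  # vec_set, index 2
  vsetivli      zero,3,e32,m1,tu,ma   # VL = index + 1, tail undisturbed
  vmv.v.x       v2,a0                 # broadcast the scalar
  vslideup.vi   v1,v2,2               # insert it at index 2

  # vec_extract, index 2
  vsetivli      zero,4,e32,m1,ta,ma
  vslidedown.vi v1,v1,2               # requested element now at index 0
  vmv.x.s       a0,v1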

The patch does not include vector-vector extraction which
will be done at a later time.

The vec_set tests required a vector calling convention/ABI because
a vector is being returned.  I'm currently experimenting with adding
preliminary vector ABI support locally and still finishing some tests
after discussing with Juzhe.  Consequently, I would not push this
before ABI support is upstream.
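
For reference, the vec_set tests boil down to sketches like the
following (types as in the tests); it is the vector return value
that requires the ABI support:

  #include <stdint-gcc.h>

  typedef int32_t vnx4si __attribute__ ((vector_size (16)));

  vnx4si
  vec_set_vnx4si_2 (vnx4si v, int32_t s)
  {
    v[2] = s;   /* expands via the new vec_set pattern */
    return v;   /* returning v needs a vector calling convention */
  }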

Regards
 Robin

gcc/ChangeLog:

	* config/riscv/autovec.md (vec_set<mode>): Implement.
	(vec_extract<mode><vel>): Implement.
	* config/riscv/riscv-protos.h (enum insn_type): Add slide insn.
	(emit_vlmax_slide_insn): Declare.
	(emit_nonvlmax_slide_tu_insn): Declare.
	(emit_scalar_move_insn): Export.
	(emit_nonvlmax_integer_move_insn): Export.
	* config/riscv/riscv-v.cc (emit_vlmax_slide_insn): New function.
	(emit_nonvlmax_slide_tu_insn): New function.
	(emit_scalar_move_insn): Export.
	(emit_nonvlmax_integer_move_insn): Export.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-1.c: New test.
	* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-2.c: New test.
	* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-3.c: New test.
	* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-4.c: New test.
	* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-run.c: New test.
	* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-1.c: New test.
	* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-2.c: New test.
	* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-3.c: New test.
	* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-4.c: New test.
	* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-run.c: New test.
---
 gcc/config/riscv/autovec.md                   |  79 ++++++
 gcc/config/riscv/riscv-protos.h               |   5 +
 gcc/config/riscv/riscv-v.cc                   |  62 ++++-
 .../rvv/autovec/vls-vlmax/vec_extract-1.c     |  49 ++++
 .../rvv/autovec/vls-vlmax/vec_extract-2.c     |  58 +++++
 .../rvv/autovec/vls-vlmax/vec_extract-3.c     |  59 +++++
 .../rvv/autovec/vls-vlmax/vec_extract-4.c     |  60 +++++
 .../rvv/autovec/vls-vlmax/vec_extract-run.c   | 230 ++++++++++++++++++
 .../riscv/rvv/autovec/vls-vlmax/vec_set-1.c   |  52 ++++
 .../riscv/rvv/autovec/vls-vlmax/vec_set-2.c   |  62 +++++
 .../riscv/rvv/autovec/vls-vlmax/vec_set-3.c   |  63 +++++
 .../riscv/rvv/autovec/vls-vlmax/vec_set-4.c   |  64 +++++
 .../riscv/rvv/autovec/vls-vlmax/vec_set-run.c | 230 ++++++++++++++++++
 13 files changed, 1071 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-run.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-run.c
  

Comments

钟居哲 June 12, 2023, 3:13 p.m. UTC | #1
+  /* If the slide offset fits into 5 bits we can
+     use the immediate variant instead of the register variant.
+     The expander's operand[2] is ops[3] here. */
+  if (!satisfies_constraint_K (ops[3]))
+    ops[3] = force_reg (Pmode, ops[3]);

I don't think we need this. maybe_expand_insn should be able to handle this.
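
For reference, the middle end already legitimizes input operands
roughly as follows (paraphrased and simplified from
maybe_legitimize_operand in optabs.cc, not the verbatim source):

  static bool
  legitimize_input_operand (enum insn_code icode, unsigned int opno,
                            expand_operand *op)
  {
    if (insn_operand_matches (icode, opno, op->value))
      return true;
    /* An operand that fails the predicate, e.g. a slide offset too
       large for the 5-bit immediate form, is copied into a register
       and re-checked.  */
    op->value = copy_to_mode_reg (op->mode, op->value);
    return insn_operand_matches (icode, opno, op->value);
  }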


juzhe.zhong@rivai.ai
 
Robin Dapp June 12, 2023, 3:26 p.m. UTC | #2
> +  /* If the slide offset fits into 5 bits we can
> +     use the immediate variant instead of the register variant.
> +     The expander's operand[2] is ops[3] here. */
> +  if (!satisfies_constraint_K (ops[3]))
> +    ops[3] = force_reg (Pmode, ops[3]);
> 
> I don't think we need this. maybe_expand_insn should be able to handle this.

Yes, removed it locally and retested, clean.

Regards
 Robin
  
Jeff Law June 12, 2023, 7:16 p.m. UTC | #3
On 6/12/23 08:55, Robin Dapp wrote:
> Hi,
> 
> this implements the vec_set and vec_extract patterns for integer and
> floating-point data types.  For vec_set we broadcast the insert value to
> a vector register and then perform a vslideup with effective length 1 to
> the requested index.
> 
> vec_extract is done by sliding down the requested element to index 0
> and v(f)mv.[xf].s to a scalar register.
> 
> The patch does not include vector-vector extraction which
> will be done at a later time.
> 
> The vec_set tests required a vector calling convention/ABI because
> a vector is being returned.  I'm currently experimenting with adding
> preliminary vector ABI support locally and still finishing some tests
> after discussing with Juzhe.  Consequently, I would not push this
> before ABI support is upstream.
I'm not sure how fast the vector ABI stuff is going to move.  So I'd be 
a bit leery of tying this work to the ABI effort, particularly if it's 
just to test.

Could you use asms to force vec_set/vec_extract to cross register files?

Jeff
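
For illustration, an asm-based variant of the vec_set tests could
look like the following sketch (assuming the "vr" vector-register
constraint is usable in inline asm here):

  #include <stdint-gcc.h>

  typedef int32_t vnx4si __attribute__ ((vector_size (16)));

  void
  vec_set_vnx4si_2 (vnx4si *vp, int32_t s)
  {
    vnx4si v = *vp;
    v[2] = s;
    /* Consume the result in a vector register so no vector value
       needs to cross the call boundary.  */
    asm volatile ("" :: "vr" (v));
    *vp = v;
  }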
  
钟居哲 June 13, 2023, 6:49 a.m. UTC | #4
I suggest we implement the vector calling convention even though it is not ratified yet.
We could allow it to be enabled only when --param=riscv-autovec-preference=fixed-vlmax.
Without a calling convention for fixed-vlmax we run into issues such as
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110119.



juzhe.zhong@rivai.ai
 
  
Robin Dapp June 13, 2023, 6:50 a.m. UTC | #5
> I suggest we implement the vector calling convention even though it is not ratified yet.
> We could allow it to be enabled only when --param=riscv-autovec-preference=fixed-vlmax.
> Without a calling convention for fixed-vlmax we run into issues such as
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110119.

Let's discuss this in the patchwork sync meeting later.

Regards
 Robin
  
Jeff Law June 13, 2023, 2:10 p.m. UTC | #6
On 6/13/23 00:50, Robin Dapp wrote:
>> I suggest we implement the vector calling convention even though it is not ratified yet.
>> We could allow it to be enabled only when --param=riscv-autovec-preference=fixed-vlmax.
>> Without a calling convention for fixed-vlmax we run into issues such as
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110119.
>> if we don't have calling convention for fixed-vlmax.
> 
> Let's discuss this in the patchwork sync meeting later.
In fact I'd ask y'all start with this since my contribution would be 
minimal and I'll be in the car for the first ~30 minutes.

jeff
  

Patch

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index b7070099f29..9cfa48f94b5 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -640,3 +640,82 @@  (define_expand "select_vl<mode>"
   riscv_vector::expand_select_vl (operands);
   DONE;
 })
+
+;; -------------------------------------------------------------------------
+;; ---- [INT,FP] Insert a vector element.
+;; -------------------------------------------------------------------------
+
+(define_expand "vec_set<mode>"
+  [(match_operand:V	0 "register_operand")
+   (match_operand:<VEL> 1 "register_operand")
+   (match_operand	2 "immediate_operand")]
+  "TARGET_VECTOR"
+{
+  /* If we set the first element, emit an v(f)mv.s.[xf].  */
+  if (operands[2] == const0_rtx)
+    {
+      rtx ops[] = {operands[0], riscv_vector::gen_scalar_move_mask (<VM>mode),
+		   RVV_VUNDEF (<MODE>mode), operands[1]};
+      riscv_vector::emit_scalar_move_insn
+	  (code_for_pred_broadcast (<MODE>mode), ops);
+    }
+  else
+    {
+      /* Move the desired value into a vector register and insert
+	 it at the proper position using vslideup with an
+	 "effective length" of 1 i.e. a VL 1 past the offset.  */
+
+      /* Slide offset = element index.  */
+      int offset = INTVAL (operands[2]);
+
+      /* Only insert one element, i.e. VL = offset + 1.  */
+      rtx length = gen_reg_rtx (Pmode);
+      emit_move_insn (length, GEN_INT (offset + 1));
+
+      /* Move operands[1] into a vector register via vmv.v.x using the same
+	 VL we need for the slide.  */
+      rtx tmp = gen_reg_rtx (<MODE>mode);
+      rtx ops1[] = {tmp, operands[1]};
+      riscv_vector::emit_nonvlmax_integer_move_insn
+	(code_for_pred_broadcast (<MODE>mode), ops1, length);
+
+      /* Slide exactly one element up leaving the tail elements
+	 unchanged.  */
+      rtx ops2[] = {operands[0], operands[0], tmp, operands[2]};
+      riscv_vector::emit_nonvlmax_slide_tu_insn
+	(code_for_pred_slide (UNSPEC_VSLIDEUP, <MODE>mode), ops2, length);
+    }
+  DONE;
+})
+
+;; -------------------------------------------------------------------------
+;; ---- [INT,FP] Extract a vector element.
+;; -------------------------------------------------------------------------
+(define_expand "vec_extract<mode><vel>"
+  [(set (match_operand:<VEL>	  0 "register_operand")
+     (vec_select:<VEL>
+       (match_operand:V		  1 "register_operand")
+       (parallel
+	 [(match_operand	  2 "nonmemory_operand")])))]
+  "TARGET_VECTOR"
+{
+  /* Element extraction can be done by sliding down the requested element
+     to index 0 and then v(f)mv.[xf].s it to a scalar register.  */
+
+  /* When extracting any other than the first element we need to slide
+     it down.  */
+  rtx tmp = NULL_RTX;
+  if (operands[2] != const0_rtx)
+    {
+      /* Emit the slide down to index 0 in a new vector.  */
+      tmp = gen_reg_rtx (<MODE>mode);
+      rtx ops[] = {tmp, RVV_VUNDEF (<MODE>mode), operands[1], operands[2]};
+      riscv_vector::emit_vlmax_slide_insn
+	(code_for_pred_slide (UNSPEC_VSLIDEDOWN, <MODE>mode), ops);
+    }
+
+  /* Emit v(f)mv.[xf].s.  */
+  emit_insn (gen_pred_extract_first (<MODE>mode, operands[0],
+				     tmp ? tmp : operands[1]));
+  DONE;
+})
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 6db3a46c682..7b327047ad5 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -146,6 +146,7 @@  enum insn_type
   RVV_TERNOP = 5,
   RVV_WIDEN_TERNOP = 4,
   RVV_SCALAR_MOV_OP = 4, /* +1 for VUNDEF according to vector.md.  */
+  RVV_SLIDE_OP = 4,      /* Dest, VUNDEF, source and offset.  */
 };
 enum vlmul_type
 {
@@ -186,10 +187,14 @@  void emit_hard_vlmax_vsetvl (machine_mode, rtx);
 void emit_vlmax_insn (unsigned, int, rtx *, rtx = 0);
 void emit_vlmax_ternary_insn (unsigned, int, rtx *, rtx = 0);
 void emit_nonvlmax_insn (unsigned, int, rtx *, rtx);
+void emit_vlmax_slide_insn (unsigned, rtx *);
+void emit_nonvlmax_slide_tu_insn (unsigned, rtx *, rtx);
 void emit_vlmax_merge_insn (unsigned, int, rtx *);
 void emit_vlmax_cmp_insn (unsigned, rtx *);
 void emit_vlmax_cmp_mu_insn (unsigned, rtx *);
 void emit_vlmax_masked_mu_insn (unsigned, int, rtx *);
+void emit_scalar_move_insn (unsigned, rtx *);
+void emit_nonvlmax_integer_move_insn (unsigned, rtx *, rtx);
 enum vlmul_type get_vlmul (machine_mode);
 unsigned int get_ratio (machine_mode);
 unsigned int get_nf (machine_mode);
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index e1b85a5af91..0ecf338eba8 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -685,6 +685,64 @@  emit_nonvlmax_insn (unsigned icode, int op_num, rtx *ops, rtx avl)
   e.emit_insn ((enum insn_code) icode, ops);
 }
 
+/* This function emits a {VLMAX, TAIL_ANY, MASK_ANY} vsetvli followed
+   by a vslide insn (with real merge operand).  */
+void
+emit_vlmax_slide_insn (unsigned icode, rtx *ops)
+{
+  machine_mode dest_mode = GET_MODE (ops[0]);
+  machine_mode mask_mode = get_mask_mode (dest_mode).require ();
+  insn_expander<RVV_INSN_OPERANDS_MAX> e (RVV_SLIDE_OP,
+					  /* HAS_DEST_P */ true,
+					  /* FULLY_UNMASKED_P */ true,
+					  /* USE_REAL_MERGE_P */ true,
+					  /* HAS_AVL_P */ true,
+					  /* VLMAX_P */ true,
+					  dest_mode,
+					  mask_mode);
+
+  e.set_policy (TAIL_ANY);
+  e.set_policy (MASK_ANY);
+
+  /* If the slide offset fits into 5 bits we can
+     use the immediate variant instead of the register variant.
+     The expander's operand[2] is ops[3] here. */
+  if (!satisfies_constraint_K (ops[3]))
+    ops[3] = force_reg (Pmode, ops[3]);
+
+  e.emit_insn ((enum insn_code) icode, ops);
+}
+
+/* This function emits a {NONVLMAX, TAIL_UNDISTURBED, MASK_ANY} vsetvli
+   followed by a vslide insn (with real merge operand).  */
+void
+emit_nonvlmax_slide_tu_insn (unsigned icode, rtx *ops, rtx avl)
+{
+  machine_mode dest_mode = GET_MODE (ops[0]);
+  machine_mode mask_mode = get_mask_mode (dest_mode).require ();
+  insn_expander<RVV_INSN_OPERANDS_MAX> e (RVV_SLIDE_OP,
+					  /* HAS_DEST_P */ true,
+					  /* FULLY_UNMASKED_P */ true,
+					  /* USE_REAL_MERGE_P */ true,
+					  /* HAS_AVL_P */ true,
+					  /* VLMAX_P */ false,
+					  dest_mode,
+					  mask_mode);
+
+  e.set_policy (TAIL_UNDISTURBED);
+  e.set_policy (MASK_ANY);
+  e.set_vl (avl);
+
+  /* If the slide offset fits into the 5-bit unsigned immediate range
+     (constraint K) we can use the immediate variant of the slide
+     instruction, otherwise force the offset into a register.
+     The expander's operands[2] is ops[3] here.  */
+  if (!satisfies_constraint_K (ops[3]))
+    ops[3] = force_reg (Pmode, ops[3]);
+
+  e.emit_insn ((enum insn_code) icode, ops);
+}
+
 /* This function emits merge instruction.  */
 void
 emit_vlmax_merge_insn (unsigned icode, int op_num, rtx *ops)
@@ -758,7 +816,7 @@  emit_vlmax_masked_mu_insn (unsigned icode, int op_num, rtx *ops)
 
 /* Emit vmv.s.x instruction.  */
 
-static void
+void
 emit_scalar_move_insn (unsigned icode, rtx *ops)
 {
   machine_mode dest_mode = GET_MODE (ops[0]);
@@ -788,7 +846,7 @@  emit_vlmax_integer_move_insn (unsigned icode, rtx *ops, rtx vl)
 
 /* Emit vmv.v.x instruction with nonvlmax.  */
 
-static void
+void
 emit_nonvlmax_integer_move_insn (unsigned icode, rtx *ops, rtx avl)
 {
   emit_nonvlmax_insn (icode, riscv_vector::RVV_UNOP, ops, avl);
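
As a usage sketch of the new helpers (variable names hypothetical, not the
patch's exact vec_set code): with RVV_SLIDE_OP's operand order of dest,
merge, source and offset, a tail-undisturbed insert at index IDX would look
roughly like

    /* Slide the broadcast value up to element IDX.  With AVL = IDX + 1
       and tail-undisturbed policy only that element is written; the
       remaining elements come from the merge operand ORIG_VEC.  */
    rtx ops[] = {dest, orig_vec, broadcast, GEN_INT (idx)};
    riscv_vector::emit_nonvlmax_slide_tu_insn
      (code_for_pred_slide (UNSPEC_VSLIDEUP, mode), ops, GEN_INT (idx + 1));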
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-1.c
new file mode 100644
index 00000000000..b631fdb9cc6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-1.c
@@ -0,0 +1,49 @@ 
+/* { dg-do compile } */
+/* { dg-additional-options "-std=c99 -march=rv64gcv" } */
+
+#include <stdint-gcc.h>
+
+typedef int64_t vnx2di __attribute__((vector_size (16)));
+typedef int32_t vnx4si __attribute__((vector_size (16)));
+typedef int16_t vnx8hi __attribute__((vector_size (16)));
+typedef int8_t vnx16qi __attribute__((vector_size (16)));
+typedef double vnx2df __attribute__((vector_size (16)));
+typedef float vnx4sf __attribute__((vector_size (16)));
+
+#define VEC_EXTRACT(S,V,IDX)			\
+  S						\
+  __attribute__((noipa))			\
+  vec_extract_##V##_##IDX (V v)			\
+  {						\
+    return v[IDX];				\
+  }
+
+#define TEST_ALL1(T)				\
+  T (int64_t, vnx2di, 0)			\
+  T (int64_t, vnx2di, 1)			\
+  T (int32_t, vnx4si, 0)			\
+  T (int32_t, vnx4si, 1)			\
+  T (int32_t, vnx4si, 3)			\
+  T (int16_t, vnx8hi, 0)			\
+  T (int16_t, vnx8hi, 2)			\
+  T (int16_t, vnx8hi, 6)			\
+  T (int8_t, vnx16qi, 0)			\
+  T (int8_t, vnx16qi, 1)			\
+  T (int8_t, vnx16qi, 7)			\
+  T (int8_t, vnx16qi, 11)			\
+  T (int8_t, vnx16qi, 15)			\
+  T (float, vnx4sf, 0)				\
+  T (float, vnx4sf, 1)				\
+  T (float, vnx4sf, 3)				\
+  T (double, vnx2df, 0)				\
+  T (double, vnx2df, 1)				\
+
+TEST_ALL1 (VEC_EXTRACT)
+
+/* { dg-final { scan-assembler-times {vset[i]*vli\s+[a-z0-9]+,\s*[a-z0-9]+,\s*e[1-8]+,\s*m1,\s*ta,\s*ma} 18 } } */
+
+/* { dg-final { scan-assembler-times {\tvslidedown.vi} 12 } } */
+
+/* { dg-final { scan-assembler-times {\tvfmv.f.s} 5 } } */
+/* { dg-final { scan-assembler-times {\tvmv.x.s} 13 } } */
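
The counts are consistent with the 18 functions above: one vsetvli each,
no slide for the six index-0 extracts (18 - 6 = 12 vslidedown.vi), and the
scalar moves split into 13 integer (vmv.x.s) and 5 floating-point
(vfmv.f.s) cases.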
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-2.c
new file mode 100644
index 00000000000..0a93752bd4b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-2.c
@@ -0,0 +1,58 @@ 
+/* { dg-do compile } */
+/* { dg-additional-options "-std=c99 -march=rv64gcv" } */
+
+#include <stdint-gcc.h>
+
+typedef int64_t vnx4di __attribute__((vector_size (32)));
+typedef int32_t vnx8si __attribute__((vector_size (32)));
+typedef int16_t vnx16hi __attribute__((vector_size (32)));
+typedef int8_t vnx32qi __attribute__((vector_size (32)));
+typedef double vnx4df __attribute__((vector_size (32)));
+typedef float vnx8sf __attribute__((vector_size (32)));
+
+#define VEC_EXTRACT(S,V,IDX)			\
+  S						\
+  __attribute__((noipa))			\
+  vec_extract_##V##_##IDX (V v)			\
+  {						\
+    return v[IDX];				\
+  }
+
+#define TEST_ALL2(T)				\
+  T (float, vnx8sf, 0)				\
+  T (float, vnx8sf, 1)				\
+  T (float, vnx8sf, 3)				\
+  T (float, vnx8sf, 4)				\
+  T (float, vnx8sf, 7)				\
+  T (double, vnx4df, 0)				\
+  T (double, vnx4df, 1)				\
+  T (double, vnx4df, 2)				\
+  T (double, vnx4df, 3)				\
+  T (int64_t, vnx4di, 0)			\
+  T (int64_t, vnx4di, 1)			\
+  T (int64_t, vnx4di, 2)			\
+  T (int64_t, vnx4di, 3)			\
+  T (int32_t, vnx8si, 0)			\
+  T (int32_t, vnx8si, 1)			\
+  T (int32_t, vnx8si, 3)			\
+  T (int32_t, vnx8si, 4)			\
+  T (int32_t, vnx8si, 7)			\
+  T (int16_t, vnx16hi, 0)			\
+  T (int16_t, vnx16hi, 1)			\
+  T (int16_t, vnx16hi, 7)			\
+  T (int16_t, vnx16hi, 8)			\
+  T (int16_t, vnx16hi, 15)			\
+  T (int8_t, vnx32qi, 0)			\
+  T (int8_t, vnx32qi, 1)			\
+  T (int8_t, vnx32qi, 15)			\
+  T (int8_t, vnx32qi, 16)			\
+  T (int8_t, vnx32qi, 31)			\
+
+TEST_ALL2 (VEC_EXTRACT)
+
+/* { dg-final { scan-assembler-times {vset[i]*vli\s+[a-z0-9]+,\s*[a-z0-9]+,\s*e[1-8]+,\s*m2,\s*ta,\s*ma} 28 } } */
+
+/* { dg-final { scan-assembler-times {\tvslidedown.vi} 22 } } */
+
+/* { dg-final { scan-assembler-times {\tvfmv.f.s} 9 } } */
+/* { dg-final { scan-assembler-times {\tvmv.x.s} 19 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-3.c
new file mode 100644
index 00000000000..24c39168578
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-3.c
@@ -0,0 +1,59 @@ 
+/* { dg-do compile } */
+/* { dg-additional-options "-std=c99 -march=rv64gcv" } */
+
+#include <stdint-gcc.h>
+
+typedef int64_t vnx8di __attribute__((vector_size (64)));
+typedef int32_t vnx16si __attribute__((vector_size (64)));
+typedef int16_t vnx32hi __attribute__((vector_size (64)));
+typedef int8_t vnx64qi __attribute__((vector_size (64)));
+typedef double vnx8df __attribute__((vector_size (64)));
+typedef float vnx16sf __attribute__((vector_size (64)));
+
+#define VEC_EXTRACT(S,V,IDX)			\
+  S						\
+  __attribute__((noipa))			\
+  vec_extract_##V##_##IDX (V v)			\
+  {						\
+    return v[IDX];				\
+  }
+
+#define TEST_ALL3(T)				\
+  T (float, vnx16sf, 0)				\
+  T (float, vnx16sf, 2)				\
+  T (float, vnx16sf, 6)				\
+  T (float, vnx16sf, 8)				\
+  T (float, vnx16sf, 14)			\
+  T (double, vnx8df, 0)				\
+  T (double, vnx8df, 2)				\
+  T (double, vnx8df, 4)				\
+  T (double, vnx8df, 6)				\
+  T (int64_t, vnx8di, 0)			\
+  T (int64_t, vnx8di, 2)			\
+  T (int64_t, vnx8di, 4)			\
+  T (int64_t, vnx8di, 6)			\
+  T (int32_t, vnx16si, 0)			\
+  T (int32_t, vnx16si, 2)			\
+  T (int32_t, vnx16si, 6)			\
+  T (int32_t, vnx16si, 8)			\
+  T (int32_t, vnx16si, 14)			\
+  T (int16_t, vnx32hi, 0)			\
+  T (int16_t, vnx32hi, 2)			\
+  T (int16_t, vnx32hi, 14)			\
+  T (int16_t, vnx32hi, 16)			\
+  T (int16_t, vnx32hi, 30)			\
+  T (int8_t, vnx64qi, 0)			\
+  T (int8_t, vnx64qi, 2)			\
+  T (int8_t, vnx64qi, 30)			\
+  T (int8_t, vnx64qi, 32)			\
+  T (int8_t, vnx64qi, 63)			\
+
+TEST_ALL3 (VEC_EXTRACT)
+
+/* { dg-final { scan-assembler-times {vset[i]*vli\s+[a-z0-9]+,\s*[a-z0-9]+,\s*e[1-8]+,\s*m4,\s*ta,\s*ma} 28 } } */
+
+/* { dg-final { scan-assembler-times {\tvslidedown.vi} 20 } } */
+/* { dg-final { scan-assembler-times {\tvslidedown.vx} 2 } } */
+
+/* { dg-final { scan-assembler-times {\tvfmv.f.s} 9 } } */
+/* { dg-final { scan-assembler-times {\tvmv.x.s} 19 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-4.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-4.c
new file mode 100644
index 00000000000..e3d29cab628
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-4.c
@@ -0,0 +1,60 @@ 
+/* { dg-do compile } */
+/* { dg-additional-options "-std=c99 -march=rv64gcv" } */
+
+#include <stdint-gcc.h>
+
+typedef int64_t vnx16di __attribute__((vector_size (128)));
+typedef int32_t vnx32si __attribute__((vector_size (128)));
+typedef int16_t vnx64hi __attribute__((vector_size (128)));
+typedef int8_t vnx128qi __attribute__((vector_size (128)));
+typedef double vnx16df __attribute__((vector_size (128)));
+typedef float vnx32sf __attribute__((vector_size (128)));
+
+#define VEC_EXTRACT(S,V,IDX)			\
+  S						\
+  __attribute__((noipa))			\
+  vec_extract_##V##_##IDX (V v)			\
+  {						\
+    return v[IDX];				\
+  }
+
+#define TEST_ALL4(T)				\
+  T (float, vnx32sf, 0)				\
+  T (float, vnx32sf, 3)				\
+  T (float, vnx32sf, 12)			\
+  T (float, vnx32sf, 17)			\
+  T (float, vnx32sf, 14)			\
+  T (double, vnx16df, 0)			\
+  T (double, vnx16df, 4)			\
+  T (double, vnx16df, 8)			\
+  T (double, vnx16df, 12)			\
+  T (int64_t, vnx16di, 0)			\
+  T (int64_t, vnx16di, 4)			\
+  T (int64_t, vnx16di, 8)			\
+  T (int64_t, vnx16di, 12)			\
+  T (int32_t, vnx32si, 0)			\
+  T (int32_t, vnx32si, 4)			\
+  T (int32_t, vnx32si, 12)			\
+  T (int32_t, vnx32si, 16)			\
+  T (int32_t, vnx32si, 28)			\
+  T (int16_t, vnx64hi, 0)			\
+  T (int16_t, vnx64hi, 4)			\
+  T (int16_t, vnx64hi, 28)			\
+  T (int16_t, vnx64hi, 32)			\
+  T (int16_t, vnx64hi, 60)			\
+  T (int8_t, vnx128qi, 0)			\
+  T (int8_t, vnx128qi, 4)			\
+  T (int8_t, vnx128qi, 30)			\
+  T (int8_t, vnx128qi, 60)			\
+  T (int8_t, vnx128qi, 64)			\
+  T (int8_t, vnx128qi, 127)			\
+
+TEST_ALL4 (VEC_EXTRACT)
+
+/* { dg-final { scan-assembler-times {vset[i]*vli\s+[a-z0-9]+,\s*[a-z0-9]+,\s*e[1-8]+,\s*m8,\s*ta,\s*ma} 29 } } */
+
+/* { dg-final { scan-assembler-times {\tvslidedown.vi} 18 } } */
+/* { dg-final { scan-assembler-times {\tvslidedown.vx} 5 } } */
+
+/* { dg-final { scan-assembler-times {\tvfmv.f.s} 9 } } */
+/* { dg-final { scan-assembler-times {\tvmv.x.s} 20 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-run.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-run.c
new file mode 100644
index 00000000000..534eb19f613
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-run.c
@@ -0,0 +1,230 @@ 
+/* { dg-do run } */
+/* { dg-additional-options "-std=c99 -march=rv64gcv" } */
+
+#include <stdlib.h>
+#include <assert.h>
+
+#include "vec_extract-1.c"
+#include "vec_extract-2.c"
+#include "vec_extract-3.c"
+#include "vec_extract-4.c"
+
+#define CHECK(S, V, IDX)				\
+void check_##V##_##IDX ()				\
+  {							\
+    V v;						\
+    for (int i = 0; i < sizeof (V) / sizeof (S); i++)	\
+      v[i] = i;						\
+    S res = vec_extract_##V##_##IDX (v);		\
+    assert (res == v[IDX]);				\
+  }
+
+#define CHECK_ALL(T)					\
+  T (int64_t, vnx2di, 0)				\
+  T (int64_t, vnx2di, 1)				\
+  T (int32_t, vnx4si, 0)				\
+  T (int32_t, vnx4si, 1)				\
+  T (int32_t, vnx4si, 3)				\
+  T (int16_t, vnx8hi, 0)				\
+  T (int16_t, vnx8hi, 2)				\
+  T (int16_t, vnx8hi, 6)				\
+  T (int8_t, vnx16qi, 0)				\
+  T (int8_t, vnx16qi, 1)				\
+  T (int8_t, vnx16qi, 7)				\
+  T (int8_t, vnx16qi, 11)				\
+  T (int8_t, vnx16qi, 15)				\
+  T (float, vnx8sf, 0)					\
+  T (float, vnx8sf, 1)					\
+  T (float, vnx8sf, 3)					\
+  T (float, vnx8sf, 4)					\
+  T (float, vnx8sf, 7)					\
+  T (double, vnx4df, 0)					\
+  T (double, vnx4df, 1)					\
+  T (double, vnx4df, 2)					\
+  T (double, vnx4df, 3)					\
+  T (int64_t, vnx4di, 0)				\
+  T (int64_t, vnx4di, 1)				\
+  T (int64_t, vnx4di, 2)				\
+  T (int64_t, vnx4di, 3)				\
+  T (int32_t, vnx8si, 0)				\
+  T (int32_t, vnx8si, 1)				\
+  T (int32_t, vnx8si, 3)				\
+  T (int32_t, vnx8si, 4)				\
+  T (int32_t, vnx8si, 7)				\
+  T (int16_t, vnx16hi, 0)				\
+  T (int16_t, vnx16hi, 1)				\
+  T (int16_t, vnx16hi, 7)				\
+  T (int16_t, vnx16hi, 8)				\
+  T (int16_t, vnx16hi, 15)				\
+  T (int8_t, vnx32qi, 0)				\
+  T (int8_t, vnx32qi, 1)				\
+  T (int8_t, vnx32qi, 15)				\
+  T (int8_t, vnx32qi, 16)				\
+  T (int8_t, vnx32qi, 31)				\
+  T (float, vnx16sf, 0)					\
+  T (float, vnx16sf, 2)					\
+  T (float, vnx16sf, 6)					\
+  T (float, vnx16sf, 8)					\
+  T (float, vnx16sf, 14)				\
+  T (double, vnx8df, 0)					\
+  T (double, vnx8df, 2)					\
+  T (double, vnx8df, 4)					\
+  T (double, vnx8df, 6)					\
+  T (int64_t, vnx8di, 0)				\
+  T (int64_t, vnx8di, 2)				\
+  T (int64_t, vnx8di, 4)				\
+  T (int64_t, vnx8di, 6)				\
+  T (int32_t, vnx16si, 0)				\
+  T (int32_t, vnx16si, 2)				\
+  T (int32_t, vnx16si, 6)				\
+  T (int32_t, vnx16si, 8)				\
+  T (int32_t, vnx16si, 14)				\
+  T (int16_t, vnx32hi, 0)				\
+  T (int16_t, vnx32hi, 2)				\
+  T (int16_t, vnx32hi, 14)				\
+  T (int16_t, vnx32hi, 16)				\
+  T (int16_t, vnx32hi, 30)				\
+  T (int8_t, vnx64qi, 0)				\
+  T (int8_t, vnx64qi, 2)				\
+  T (int8_t, vnx64qi, 30)				\
+  T (int8_t, vnx64qi, 32)				\
+  T (int8_t, vnx64qi, 63)				\
+  T (float, vnx32sf, 0)					\
+  T (float, vnx32sf, 3)					\
+  T (float, vnx32sf, 12)				\
+  T (float, vnx32sf, 17)				\
+  T (float, vnx32sf, 14)				\
+  T (double, vnx16df, 0)				\
+  T (double, vnx16df, 4)				\
+  T (double, vnx16df, 8)				\
+  T (double, vnx16df, 12)				\
+  T (int64_t, vnx16di, 0)				\
+  T (int64_t, vnx16di, 4)				\
+  T (int64_t, vnx16di, 8)				\
+  T (int64_t, vnx16di, 12)				\
+  T (int32_t, vnx32si, 0)				\
+  T (int32_t, vnx32si, 4)				\
+  T (int32_t, vnx32si, 12)				\
+  T (int32_t, vnx32si, 16)				\
+  T (int32_t, vnx32si, 28)				\
+  T (int16_t, vnx64hi, 0)				\
+  T (int16_t, vnx64hi, 4)				\
+  T (int16_t, vnx64hi, 28)				\
+  T (int16_t, vnx64hi, 32)				\
+  T (int16_t, vnx64hi, 60)				\
+  T (int8_t, vnx128qi, 0)				\
+  T (int8_t, vnx128qi, 4)				\
+  T (int8_t, vnx128qi, 30)				\
+  T (int8_t, vnx128qi, 60)				\
+  T (int8_t, vnx128qi, 64)				\
+  T (int8_t, vnx128qi, 127)				\
+
+CHECK_ALL (CHECK)
+
+#define RUN(S, V, IDX)					\
+  check_##V##_##IDX ();
+
+#define RUN_ALL(T)					\
+  T (int64_t, vnx2di, 0)				\
+  T (int64_t, vnx2di, 1)				\
+  T (int32_t, vnx4si, 0)				\
+  T (int32_t, vnx4si, 1)				\
+  T (int32_t, vnx4si, 3)				\
+  T (int16_t, vnx8hi, 0)				\
+  T (int16_t, vnx8hi, 2)				\
+  T (int16_t, vnx8hi, 6)				\
+  T (int8_t, vnx16qi, 0)				\
+  T (int8_t, vnx16qi, 1)				\
+  T (int8_t, vnx16qi, 7)				\
+  T (int8_t, vnx16qi, 11)				\
+  T (int8_t, vnx16qi, 15)				\
+  T (float, vnx8sf, 0)					\
+  T (float, vnx8sf, 1)					\
+  T (float, vnx8sf, 3)					\
+  T (float, vnx8sf, 4)					\
+  T (float, vnx8sf, 7)					\
+  T (double, vnx4df, 0)					\
+  T (double, vnx4df, 1)					\
+  T (double, vnx4df, 2)					\
+  T (double, vnx4df, 3)					\
+  T (int64_t, vnx4di, 0)				\
+  T (int64_t, vnx4di, 1)				\
+  T (int64_t, vnx4di, 2)				\
+  T (int64_t, vnx4di, 3)				\
+  T (int32_t, vnx8si, 0)				\
+  T (int32_t, vnx8si, 1)				\
+  T (int32_t, vnx8si, 3)				\
+  T (int32_t, vnx8si, 4)				\
+  T (int32_t, vnx8si, 7)				\
+  T (int16_t, vnx16hi, 0)				\
+  T (int16_t, vnx16hi, 1)				\
+  T (int16_t, vnx16hi, 7)				\
+  T (int16_t, vnx16hi, 8)				\
+  T (int16_t, vnx16hi, 15)				\
+  T (int8_t, vnx32qi, 0)				\
+  T (int8_t, vnx32qi, 1)				\
+  T (int8_t, vnx32qi, 15)				\
+  T (int8_t, vnx32qi, 16)				\
+  T (int8_t, vnx32qi, 31)				\
+  T (float, vnx16sf, 0)					\
+  T (float, vnx16sf, 2)					\
+  T (float, vnx16sf, 6)					\
+  T (float, vnx16sf, 8)					\
+  T (float, vnx16sf, 14)				\
+  T (double, vnx8df, 0)					\
+  T (double, vnx8df, 2)					\
+  T (double, vnx8df, 4)					\
+  T (double, vnx8df, 6)					\
+  T (int64_t, vnx8di, 0)				\
+  T (int64_t, vnx8di, 2)				\
+  T (int64_t, vnx8di, 4)				\
+  T (int64_t, vnx8di, 6)				\
+  T (int32_t, vnx16si, 0)				\
+  T (int32_t, vnx16si, 2)				\
+  T (int32_t, vnx16si, 6)				\
+  T (int32_t, vnx16si, 8)				\
+  T (int32_t, vnx16si, 14)				\
+  T (int16_t, vnx32hi, 0)				\
+  T (int16_t, vnx32hi, 2)				\
+  T (int16_t, vnx32hi, 14)				\
+  T (int16_t, vnx32hi, 16)				\
+  T (int16_t, vnx32hi, 30)				\
+  T (int8_t, vnx64qi, 0)				\
+  T (int8_t, vnx64qi, 2)				\
+  T (int8_t, vnx64qi, 30)				\
+  T (int8_t, vnx64qi, 32)				\
+  T (int8_t, vnx64qi, 63)				\
+  T (float, vnx32sf, 0)					\
+  T (float, vnx32sf, 3)					\
+  T (float, vnx32sf, 12)				\
+  T (float, vnx32sf, 17)				\
+  T (float, vnx32sf, 14)				\
+  T (double, vnx16df, 0)				\
+  T (double, vnx16df, 4)				\
+  T (double, vnx16df, 8)				\
+  T (double, vnx16df, 12)				\
+  T (int64_t, vnx16di, 0)				\
+  T (int64_t, vnx16di, 4)				\
+  T (int64_t, vnx16di, 8)				\
+  T (int64_t, vnx16di, 12)				\
+  T (int32_t, vnx32si, 0)				\
+  T (int32_t, vnx32si, 4)				\
+  T (int32_t, vnx32si, 12)				\
+  T (int32_t, vnx32si, 16)				\
+  T (int32_t, vnx32si, 28)				\
+  T (int16_t, vnx64hi, 0)				\
+  T (int16_t, vnx64hi, 4)				\
+  T (int16_t, vnx64hi, 28)				\
+  T (int16_t, vnx64hi, 32)				\
+  T (int16_t, vnx64hi, 60)				\
+  T (int8_t, vnx128qi, 0)				\
+  T (int8_t, vnx128qi, 4)				\
+  T (int8_t, vnx128qi, 30)				\
+  T (int8_t, vnx128qi, 60)				\
+  T (int8_t, vnx128qi, 64)				\
+  T (int8_t, vnx128qi, 127)				\
+
+int main ()
+{
+  RUN_ALL (RUN);
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-1.c
new file mode 100644
index 00000000000..7acab5a6918
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-1.c
@@ -0,0 +1,52 @@ 
+/* { dg-do compile } */
+/* { dg-additional-options "-std=c99 -march=rv64gcv" } */
+
+#include <stdint-gcc.h>
+
+typedef int64_t vnx2di __attribute__((vector_size (16)));
+typedef int32_t vnx4si __attribute__((vector_size (16)));
+typedef int16_t vnx8hi __attribute__((vector_size (16)));
+typedef int8_t vnx16qi __attribute__((vector_size (16)));
+typedef double vnx2df __attribute__((vector_size (16)));
+typedef float vnx4sf __attribute__((vector_size (16)));
+
+#define VEC_SET(S,V,IDX)			\
+  V						\
+  __attribute__((noipa))			\
+  vec_set_##V##_##IDX (V v, S s)		\
+  {						\
+    v[IDX] = s;					\
+    return v;					\
+  }
+
+#define TEST_ALL1(T)				\
+  T (int64_t, vnx2di, 0)			\
+  T (int64_t, vnx2di, 1)			\
+  T (int32_t, vnx4si, 0)			\
+  T (int32_t, vnx4si, 1)			\
+  T (int32_t, vnx4si, 3)			\
+  T (int16_t, vnx8hi, 0)			\
+  T (int16_t, vnx8hi, 2)			\
+  T (int16_t, vnx8hi, 6)			\
+  T (int8_t, vnx16qi, 0)			\
+  T (int8_t, vnx16qi, 1)			\
+  T (int8_t, vnx16qi, 7)			\
+  T (int8_t, vnx16qi, 11)			\
+  T (int8_t, vnx16qi, 15)			\
+  T (float, vnx4sf, 0)				\
+  T (float, vnx4sf, 1)				\
+  T (float, vnx4sf, 3)				\
+  T (double, vnx2df, 0)				\
+  T (double, vnx2df, 1)				\
+
+TEST_ALL1 (VEC_SET)
+
+/* { dg-final { scan-assembler-times {vset[i]*vli\s+[a-z0-9,]+,\s*e[1-8]+,\s*m1,\s*ta,\s*ma} 6 } } */
+/* { dg-final { scan-assembler-times {vset[i]*vli\s+[a-z0-9,]+,\s*e[1-8]+,\s*m1,\s*tu,\s*ma} 12 } } */
+
+/* { dg-final { scan-assembler-times {\tvmv.v.x} 9 } } */
+/* { dg-final { scan-assembler-times {\tvfmv.v.f} 3 } } */
+/* { dg-final { scan-assembler-times {\tvslideup.vi} 12 } } */
+
+/* { dg-final { scan-assembler-times {\tvfmv.s.f} 2 } } */
+/* { dg-final { scan-assembler-times {\tvmv.s.x} 4 } } */
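
The two vsetvli flavors reflect the two paths: index 0 goes through a
plain scalar move (the 4 vmv.s.x and 2 vfmv.s.f matches), while a nonzero
index broadcasts the value and slides it up tail-undisturbed.  A sketch of
the expected code for one of the functions above (illustrative register
allocation; hypothetical asm in the comment):

    typedef long long v2di __attribute__ ((vector_size (16)));

    v2di
    set_1 (v2di v, long long x)   /* like vec_set_vnx2di_1 */
    {
      v[1] = x;      /* vsetivli     zero,2,e64,m1,tu,ma
                        vmv.v.x      v9,a0
                        vslideup.vi  v8,v9,1  */
      return v;
    }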
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-2.c
new file mode 100644
index 00000000000..6d29fc7354e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-2.c
@@ -0,0 +1,62 @@ 
+/* { dg-do compile } */
+/* { dg-additional-options "-std=c99 -march=rv64gcv" } */
+
+#include <stdint-gcc.h>
+
+typedef int64_t vnx4di __attribute__((vector_size (32)));
+typedef int32_t vnx8si __attribute__((vector_size (32)));
+typedef int16_t vnx16hi __attribute__((vector_size (32)));
+typedef int8_t vnx32qi __attribute__((vector_size (32)));
+typedef double vnx4df __attribute__((vector_size (32)));
+typedef float vnx8sf __attribute__((vector_size (32)));
+
+#define VEC_SET(S,V,IDX)			\
+  V						\
+  __attribute__((noipa))			\
+  vec_set_##V##_##IDX (V v, S s)		\
+  {						\
+    v[IDX] = s;					\
+    return v;					\
+  }
+
+#define TEST_ALL2(T)				\
+  T (float, vnx8sf, 0)				\
+  T (float, vnx8sf, 1)				\
+  T (float, vnx8sf, 3)				\
+  T (float, vnx8sf, 4)				\
+  T (float, vnx8sf, 7)				\
+  T (double, vnx4df, 0)				\
+  T (double, vnx4df, 1)				\
+  T (double, vnx4df, 2)				\
+  T (double, vnx4df, 3)				\
+  T (int64_t, vnx4di, 0)			\
+  T (int64_t, vnx4di, 1)			\
+  T (int64_t, vnx4di, 2)			\
+  T (int64_t, vnx4di, 3)			\
+  T (int32_t, vnx8si, 0)			\
+  T (int32_t, vnx8si, 1)			\
+  T (int32_t, vnx8si, 3)			\
+  T (int32_t, vnx8si, 4)			\
+  T (int32_t, vnx8si, 7)			\
+  T (int16_t, vnx16hi, 0)			\
+  T (int16_t, vnx16hi, 1)			\
+  T (int16_t, vnx16hi, 7)			\
+  T (int16_t, vnx16hi, 8)			\
+  T (int16_t, vnx16hi, 15)			\
+  T (int8_t, vnx32qi, 0)			\
+  T (int8_t, vnx32qi, 1)			\
+  T (int8_t, vnx32qi, 15)			\
+  T (int8_t, vnx32qi, 16)			\
+  T (int8_t, vnx32qi, 31)			\
+
+TEST_ALL2 (VEC_SET)
+
+/* { dg-final { scan-assembler-times {vset[i]*vli\s+[a-z0-9,]+,\s*e[1-8]+,\s*m2,\s*ta,\s*ma} 6 } } */
+/* { dg-final { scan-assembler-times {vset[i]*vli\s+[a-z0-9,]+,\s*e[1-8]+,\s*m2,\s*tu,\s*ma} 22 } } */
+
+/* { dg-final { scan-assembler-times {\tvmv.v.x} 15 } } */
+/* { dg-final { scan-assembler-times {\tvfmv.v.f} 7 } } */
+/* { dg-final { scan-assembler-times {\tvslideup.vi} 22 } } */
+
+/* { dg-final { scan-assembler-times {\tvfmv.s.f} 2 } } */
+/* { dg-final { scan-assembler-times {\tvmv.s.x} 4 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-3.c
new file mode 100644
index 00000000000..a5df294f71b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-3.c
@@ -0,0 +1,63 @@ 
+/* { dg-do compile } */
+/* { dg-additional-options "-std=c99 -march=rv64gcv" } */
+
+#include <stdint-gcc.h>
+
+typedef int64_t vnx8di __attribute__((vector_size (64)));
+typedef int32_t vnx16si __attribute__((vector_size (64)));
+typedef int16_t vnx32hi __attribute__((vector_size (64)));
+typedef int8_t vnx64qi __attribute__((vector_size (64)));
+typedef double vnx8df __attribute__((vector_size (64)));
+typedef float vnx16sf __attribute__((vector_size (64)));
+
+#define VEC_SET(S,V,IDX)			\
+  V						\
+  __attribute__((noipa))			\
+  vec_set_##V##_##IDX (V v, S s)		\
+  {						\
+    v[IDX] = s;					\
+    return v;					\
+  }
+
+#define TEST_ALL3(T)				\
+  T (float, vnx16sf, 0)				\
+  T (float, vnx16sf, 2)				\
+  T (float, vnx16sf, 6)				\
+  T (float, vnx16sf, 8)				\
+  T (float, vnx16sf, 14)			\
+  T (double, vnx8df, 0)				\
+  T (double, vnx8df, 2)				\
+  T (double, vnx8df, 4)				\
+  T (double, vnx8df, 6)				\
+  T (int64_t, vnx8di, 0)			\
+  T (int64_t, vnx8di, 2)			\
+  T (int64_t, vnx8di, 4)			\
+  T (int64_t, vnx8di, 6)			\
+  T (int32_t, vnx16si, 0)			\
+  T (int32_t, vnx16si, 2)			\
+  T (int32_t, vnx16si, 6)			\
+  T (int32_t, vnx16si, 8)			\
+  T (int32_t, vnx16si, 14)			\
+  T (int16_t, vnx32hi, 0)			\
+  T (int16_t, vnx32hi, 2)			\
+  T (int16_t, vnx32hi, 14)			\
+  T (int16_t, vnx32hi, 16)			\
+  T (int16_t, vnx32hi, 30)			\
+  T (int8_t, vnx64qi, 0)			\
+  T (int8_t, vnx64qi, 2)			\
+  T (int8_t, vnx64qi, 30)			\
+  T (int8_t, vnx64qi, 32)			\
+  T (int8_t, vnx64qi, 63)			\
+
+TEST_ALL3 (VEC_SET)
+
+/* { dg-final { scan-assembler-times {vset[i]*vli\s+[a-z0-9,]+,\s*e[1-8]+,\s*m4,\s*ta,\s*ma} 6 } } */
+/* { dg-final { scan-assembler-times {vset[i]*vli\s+[a-z0-9,]+,\s*e[1-8]+,\s*m4,\s*tu,\s*ma} 22 } } */
+
+/* { dg-final { scan-assembler-times {\tvmv.v.x} 15 } } */
+/* { dg-final { scan-assembler-times {\tvfmv.v.f} 7 } } */
+/* { dg-final { scan-assembler-times {\tvslideup.vi} 20 } } */
+/* { dg-final { scan-assembler-times {\tvslideup.vx} 2 } } */
+
+/* { dg-final { scan-assembler-times {\tvfmv.s.f} 2 } } */
+/* { dg-final { scan-assembler-times {\tvmv.s.x} 4 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-4.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-4.c
new file mode 100644
index 00000000000..4d14c7d6ee7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-4.c
@@ -0,0 +1,64 @@ 
+/* { dg-do compile } */
+/* { dg-additional-options "-std=c99 -march=rv64gcv" } */
+
+#include <stdint-gcc.h>
+
+typedef int64_t vnx16di __attribute__((vector_size (128)));
+typedef int32_t vnx32si __attribute__((vector_size (128)));
+typedef int16_t vnx64hi __attribute__((vector_size (128)));
+typedef int8_t vnx128qi __attribute__((vector_size (128)));
+typedef double vnx16df __attribute__((vector_size (128)));
+typedef float vnx32sf __attribute__((vector_size (128)));
+
+#define VEC_SET(S,V,IDX)			\
+  V						\
+  __attribute__((noipa))			\
+  vec_set_##V##_##IDX (V v, S s)		\
+  {						\
+    v[IDX] = s;					\
+    return v;					\
+  }
+
+#define TEST_ALL4(T)				\
+  T (float, vnx32sf, 0)				\
+  T (float, vnx32sf, 3)				\
+  T (float, vnx32sf, 12)			\
+  T (float, vnx32sf, 17)			\
+  T (float, vnx32sf, 14)			\
+  T (double, vnx16df, 0)			\
+  T (double, vnx16df, 4)			\
+  T (double, vnx16df, 8)			\
+  T (double, vnx16df, 12)			\
+  T (int64_t, vnx16di, 0)			\
+  T (int64_t, vnx16di, 4)			\
+  T (int64_t, vnx16di, 8)			\
+  T (int64_t, vnx16di, 12)			\
+  T (int32_t, vnx32si, 0)			\
+  T (int32_t, vnx32si, 4)			\
+  T (int32_t, vnx32si, 12)			\
+  T (int32_t, vnx32si, 16)			\
+  T (int32_t, vnx32si, 28)			\
+  T (int16_t, vnx64hi, 0)			\
+  T (int16_t, vnx64hi, 4)			\
+  T (int16_t, vnx64hi, 28)			\
+  T (int16_t, vnx64hi, 32)			\
+  T (int16_t, vnx64hi, 60)			\
+  T (int8_t, vnx128qi, 0)			\
+  T (int8_t, vnx128qi, 4)			\
+  T (int8_t, vnx128qi, 30)			\
+  T (int8_t, vnx128qi, 60)			\
+  T (int8_t, vnx128qi, 64)			\
+  T (int8_t, vnx128qi, 127)			\
+
+TEST_ALL4 (VEC_SET)
+
+/* { dg-final { scan-assembler-times {vset[i]*vli\s+[a-z0-9,]+,\s*e[1-8]+,\s*m8,\s*ta,\s*ma} 6 } } */
+/* { dg-final { scan-assembler-times {vset[i]*vli\s+[a-z0-9,]+,\s*e[1-8]+,\s*m8,\s*tu,\s*ma} 23 } } */
+
+/* { dg-final { scan-assembler-times {\tvmv.v.x} 16 } } */
+/* { dg-final { scan-assembler-times {\tvfmv.v.f} 7 } } */
+/* { dg-final { scan-assembler-times {\tvslideup.vi} 18 } } */
+/* { dg-final { scan-assembler-times {\tvslideup.vx} 5 } } */
+
+/* { dg-final { scan-assembler-times {\tvfmv.s.f} 2 } } */
+/* { dg-final { scan-assembler-times {\tvmv.s.x} 4 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-run.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-run.c
new file mode 100644
index 00000000000..8500cc7b029
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-run.c
@@ -0,0 +1,230 @@ 
+/* { dg-do run } */
+/* { dg-additional-options "-std=c99 -march=rv64gcv" } */
+
+#include <assert.h>
+
+#include "vec_set-1.c"
+#include "vec_set-2.c"
+#include "vec_set-3.c"
+#include "vec_set-4.c"
+
+#define CHECK(S, V, IDX)				\
+void check_##V##_##IDX ()				\
+  {							\
+    V v;						\
+    for (int i = 0; i < sizeof (V) / sizeof (S); i++)	\
+      v[i] = i;						\
+    V res = vec_set_##V##_##IDX (v, 77);		\
+    for (int i = 0; i < sizeof (V) / sizeof (S); i++)	\
+      assert (res[i] == (i == IDX ? 77 : i));		\
+  }
+
+#define CHECK_ALL(T)					\
+  T (int64_t, vnx2di, 0)				\
+  T (int64_t, vnx2di, 1)				\
+  T (int32_t, vnx4si, 0)				\
+  T (int32_t, vnx4si, 1)				\
+  T (int32_t, vnx4si, 3)				\
+  T (int16_t, vnx8hi, 0)				\
+  T (int16_t, vnx8hi, 2)				\
+  T (int16_t, vnx8hi, 6)				\
+  T (int8_t, vnx16qi, 0)				\
+  T (int8_t, vnx16qi, 1)				\
+  T (int8_t, vnx16qi, 7)				\
+  T (int8_t, vnx16qi, 11)				\
+  T (int8_t, vnx16qi, 15)				\
+  T (float, vnx8sf, 0)					\
+  T (float, vnx8sf, 1)					\
+  T (float, vnx8sf, 3)					\
+  T (float, vnx8sf, 4)					\
+  T (float, vnx8sf, 7)					\
+  T (double, vnx4df, 0)					\
+  T (double, vnx4df, 1)					\
+  T (double, vnx4df, 2)					\
+  T (double, vnx4df, 3)					\
+  T (int64_t, vnx4di, 0)				\
+  T (int64_t, vnx4di, 1)				\
+  T (int64_t, vnx4di, 2)				\
+  T (int64_t, vnx4di, 3)				\
+  T (int32_t, vnx8si, 0)				\
+  T (int32_t, vnx8si, 1)				\
+  T (int32_t, vnx8si, 3)				\
+  T (int32_t, vnx8si, 4)				\
+  T (int32_t, vnx8si, 7)				\
+  T (int16_t, vnx16hi, 0)				\
+  T (int16_t, vnx16hi, 1)				\
+  T (int16_t, vnx16hi, 7)				\
+  T (int16_t, vnx16hi, 8)				\
+  T (int16_t, vnx16hi, 15)				\
+  T (int8_t, vnx32qi, 0)				\
+  T (int8_t, vnx32qi, 1)				\
+  T (int8_t, vnx32qi, 15)				\
+  T (int8_t, vnx32qi, 16)				\
+  T (int8_t, vnx32qi, 31)				\
+  T (float, vnx16sf, 0)					\
+  T (float, vnx16sf, 2)					\
+  T (float, vnx16sf, 6)					\
+  T (float, vnx16sf, 8)					\
+  T (float, vnx16sf, 14)				\
+  T (double, vnx8df, 0)					\
+  T (double, vnx8df, 2)					\
+  T (double, vnx8df, 4)					\
+  T (double, vnx8df, 6)					\
+  T (int64_t, vnx8di, 0)				\
+  T (int64_t, vnx8di, 2)				\
+  T (int64_t, vnx8di, 4)				\
+  T (int64_t, vnx8di, 6)				\
+  T (int32_t, vnx16si, 0)				\
+  T (int32_t, vnx16si, 2)				\
+  T (int32_t, vnx16si, 6)				\
+  T (int32_t, vnx16si, 8)				\
+  T (int32_t, vnx16si, 14)				\
+  T (int16_t, vnx32hi, 0)				\
+  T (int16_t, vnx32hi, 2)				\
+  T (int16_t, vnx32hi, 14)				\
+  T (int16_t, vnx32hi, 16)				\
+  T (int16_t, vnx32hi, 30)				\
+  T (int8_t, vnx64qi, 0)				\
+  T (int8_t, vnx64qi, 2)				\
+  T (int8_t, vnx64qi, 30)				\
+  T (int8_t, vnx64qi, 32)				\
+  T (int8_t, vnx64qi, 63)				\
+  T (float, vnx32sf, 0)					\
+  T (float, vnx32sf, 3)					\
+  T (float, vnx32sf, 12)				\
+  T (float, vnx32sf, 17)				\
+  T (float, vnx32sf, 14)				\
+  T (double, vnx16df, 0)				\
+  T (double, vnx16df, 4)				\
+  T (double, vnx16df, 8)				\
+  T (double, vnx16df, 12)				\
+  T (int64_t, vnx16di, 0)				\
+  T (int64_t, vnx16di, 4)				\
+  T (int64_t, vnx16di, 8)				\
+  T (int64_t, vnx16di, 12)				\
+  T (int32_t, vnx32si, 0)				\
+  T (int32_t, vnx32si, 4)				\
+  T (int32_t, vnx32si, 12)				\
+  T (int32_t, vnx32si, 16)				\
+  T (int32_t, vnx32si, 28)				\
+  T (int16_t, vnx64hi, 0)				\
+  T (int16_t, vnx64hi, 4)				\
+  T (int16_t, vnx64hi, 28)				\
+  T (int16_t, vnx64hi, 32)				\
+  T (int16_t, vnx64hi, 60)				\
+  T (int8_t, vnx128qi, 0)				\
+  T (int8_t, vnx128qi, 4)				\
+  T (int8_t, vnx128qi, 30)				\
+  T (int8_t, vnx128qi, 60)				\
+  T (int8_t, vnx128qi, 64)				\
+  T (int8_t, vnx128qi, 127)				\
+
+CHECK_ALL (CHECK)
+
+#define RUN(S, V, IDX)					\
+  check_##V##_##IDX ();
+
+#define RUN_ALL(T)					\
+  T (int64_t, vnx2di, 0)				\
+  T (int64_t, vnx2di, 1)				\
+  T (int32_t, vnx4si, 0)				\
+  T (int32_t, vnx4si, 1)				\
+  T (int32_t, vnx4si, 3)				\
+  T (int16_t, vnx8hi, 0)				\
+  T (int16_t, vnx8hi, 2)				\
+  T (int16_t, vnx8hi, 6)				\
+  T (int8_t, vnx16qi, 0)				\
+  T (int8_t, vnx16qi, 1)				\
+  T (int8_t, vnx16qi, 7)				\
+  T (int8_t, vnx16qi, 11)				\
+  T (int8_t, vnx16qi, 15)				\
+  T (float, vnx8sf, 0)					\
+  T (float, vnx8sf, 1)					\
+  T (float, vnx8sf, 3)					\
+  T (float, vnx8sf, 4)					\
+  T (float, vnx8sf, 7)					\
+  T (double, vnx4df, 0)					\
+  T (double, vnx4df, 1)					\
+  T (double, vnx4df, 2)					\
+  T (double, vnx4df, 3)					\
+  T (int64_t, vnx4di, 0)				\
+  T (int64_t, vnx4di, 1)				\
+  T (int64_t, vnx4di, 2)				\
+  T (int64_t, vnx4di, 3)				\
+  T (int32_t, vnx8si, 0)				\
+  T (int32_t, vnx8si, 1)				\
+  T (int32_t, vnx8si, 3)				\
+  T (int32_t, vnx8si, 4)				\
+  T (int32_t, vnx8si, 7)				\
+  T (int16_t, vnx16hi, 0)				\
+  T (int16_t, vnx16hi, 1)				\
+  T (int16_t, vnx16hi, 7)				\
+  T (int16_t, vnx16hi, 8)				\
+  T (int16_t, vnx16hi, 15)				\
+  T (int8_t, vnx32qi, 0)				\
+  T (int8_t, vnx32qi, 1)				\
+  T (int8_t, vnx32qi, 15)				\
+  T (int8_t, vnx32qi, 16)				\
+  T (int8_t, vnx32qi, 31)				\
+  T (float, vnx16sf, 0)					\
+  T (float, vnx16sf, 2)					\
+  T (float, vnx16sf, 6)					\
+  T (float, vnx16sf, 8)					\
+  T (float, vnx16sf, 14)				\
+  T (double, vnx8df, 0)					\
+  T (double, vnx8df, 2)					\
+  T (double, vnx8df, 4)					\
+  T (double, vnx8df, 6)					\
+  T (int64_t, vnx8di, 0)				\
+  T (int64_t, vnx8di, 2)				\
+  T (int64_t, vnx8di, 4)				\
+  T (int64_t, vnx8di, 6)				\
+  T (int32_t, vnx16si, 0)				\
+  T (int32_t, vnx16si, 2)				\
+  T (int32_t, vnx16si, 6)				\
+  T (int32_t, vnx16si, 8)				\
+  T (int32_t, vnx16si, 14)				\
+  T (int16_t, vnx32hi, 0)				\
+  T (int16_t, vnx32hi, 2)				\
+  T (int16_t, vnx32hi, 14)				\
+  T (int16_t, vnx32hi, 16)				\
+  T (int16_t, vnx32hi, 30)				\
+  T (int8_t, vnx64qi, 0)				\
+  T (int8_t, vnx64qi, 2)				\
+  T (int8_t, vnx64qi, 30)				\
+  T (int8_t, vnx64qi, 32)				\
+  T (int8_t, vnx64qi, 63)				\
+  T (float, vnx32sf, 0)					\
+  T (float, vnx32sf, 3)					\
+  T (float, vnx32sf, 12)				\
+  T (float, vnx32sf, 17)				\
+  T (float, vnx32sf, 14)				\
+  T (double, vnx16df, 0)				\
+  T (double, vnx16df, 4)				\
+  T (double, vnx16df, 8)				\
+  T (double, vnx16df, 12)				\
+  T (int64_t, vnx16di, 0)				\
+  T (int64_t, vnx16di, 4)				\
+  T (int64_t, vnx16di, 8)				\
+  T (int64_t, vnx16di, 12)				\
+  T (int32_t, vnx32si, 0)				\
+  T (int32_t, vnx32si, 4)				\
+  T (int32_t, vnx32si, 12)				\
+  T (int32_t, vnx32si, 16)				\
+  T (int32_t, vnx32si, 28)				\
+  T (int16_t, vnx64hi, 0)				\
+  T (int16_t, vnx64hi, 4)				\
+  T (int16_t, vnx64hi, 28)				\
+  T (int16_t, vnx64hi, 32)				\
+  T (int16_t, vnx64hi, 60)				\
+  T (int8_t, vnx128qi, 0)				\
+  T (int8_t, vnx128qi, 4)				\
+  T (int8_t, vnx128qi, 30)				\
+  T (int8_t, vnx128qi, 60)				\
+  T (int8_t, vnx128qi, 64)				\
+  T (int8_t, vnx128qi, 127)				\
+
+int main ()
+{
+  RUN_ALL (RUN);
+}