aarch64: suppress duplication into sub-64-bit AdvSIMD vectors [PR125538]
Checks
| Context | Check | Description |
|---|---|---|
| linaro-tcwg-bot/tcwg_gcc_build--master-arm | success | Build passed |
| linaro-tcwg-bot/tcwg_gcc_check--master-arm | success | Test passed |
| linaro-tcwg-bot/tcwg_gcc_build--master-aarch64 | success | Build passed |
| linaro-tcwg-bot/tcwg_simplebootstrap_build--master-aarch64-bootstrap | success | Build passed |
| linaro-tcwg-bot/tcwg_simplebootstrap_build--master-arm-bootstrap | success | Build passed |
| linaro-tcwg-bot/tcwg_gcc_check--master-aarch64 | success | Test passed |
Commit Message
As we don't have RTL support for duplicating values into partial AdvSIMD
vector modes, any expression like (vec_duplicate:V4QI (reg:QI)) is going
to be malformed. The ICE reported in PR125538 occurred because in
r17-897-g4ddae2a94a032d we started generating such expressions when doing
a splat of the most common element at aarch64.cc:25876.
To address the problem, this patch introduces the
aarch64_gen_vec_duplicate () wrapper, which handles the case of a
sub-64-bit destination mode by duplicating the source value into 64 bits
and wrapping that into a SUBREG expression. The alternative here would be
to add some more vec_duplicate RTL patterns, but that would lead to some
code churn in aarch64-simd.md and break a long-standing invariant for no
obvious benefit.
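
The approach described above can be modelled outside the compiler. The following is a minimal sketch in plain C, not GCC code: the helper names container_lanes, dup_v8qi and lowpart_v4qi are hypothetical, and byte arrays stand in for RTL registers. It assumes only what the message states, namely that the value is duplicated into a 64-bit container and the narrower vector is then read from its low part.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model: number of lanes in the 64-bit container for a
   given element width, i.e. 64 divided by the element size in bits.  */
static unsigned
container_lanes (unsigned elem_bits)
{
  assert (elem_bits == 8 || elem_bits == 16);
  return 64 / elem_bits;
}

/* Splat an 8-bit value into all 8 lanes of a 64-bit container; this
   stands in for a vec_duplicate into the full 64-bit vector mode.  */
static void
dup_v8qi (uint8_t out[8], uint8_t x)
{
  memset (out, x, 8);
}

/* Read 4 lanes back out of the container; this stands in for taking
   the sub-64-bit SUBREG of the wider register.  */
static void
lowpart_v4qi (uint8_t out[4], const uint8_t in[8])
{
  memcpy (out, in, 4);
}
```

Because every lane of the container holds the same value, the 4 lanes read back are the same as a direct 4-lane duplicate, regardless of which half of the container is taken.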
I've added the reduced testcase from the PR, as well as appended some
similar tests to vec_init_5.c and vec-init-23.c.
Regtested and bootstrapped on aarch64-linux-gnu.
PR target/125538
gcc/ChangeLog:
* config/aarch64/aarch64-protos.h (aarch64_gen_vec_duplicate):
Declare new function.
* config/aarch64/aarch64.cc (aarch64_gen_vec_duplicate): Define
it.
(aarch64_expand_vector_init_fallback): Use
aarch64_gen_vec_duplicate () instead of gen_vec_duplicate ().
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sve/vec_init_5.c: Add new 8/16-bit testcases.
* gcc.target/aarch64/vec-init-23.c: Likewise.
* gcc.target/aarch64/pr125538.c: New test.
---
gcc/config/aarch64/aarch64-protos.h | 1 +
gcc/config/aarch64/aarch64.cc | 36 ++++++++--
gcc/testsuite/gcc.target/aarch64/pr125538.c | 20 ++++++
.../gcc.target/aarch64/sve/vec_init_5.c | 69 ++++++++++++++++++
.../gcc.target/aarch64/vec-init-23.c | 71 ++++++++++++++++++-
5 files changed, 191 insertions(+), 6 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/aarch64/pr125538.c
Comments
On Wed, Jun 3, 2026 at 4:56 AM Artemiy Volkov <artemiy.volkov@arm.com> wrote:
>
> As we don't have RTL support for duplicating values into partial AdvSIMD
> vector modes, any expression like (vec_duplicate:V4QI (reg:QI)) is going
> to be malformed. The ICE reported in PR125538 occurred because in
> r17-897-g4ddae2a94a032d we started generating such expressions when doing
> a splat of the most common element at aarch64.cc:25876.
>
> To address the problem, this patch introduces the
> aarch64_gen_vec_duplicate () wrapper, which handles the case of a
> sub-64-bit destination mode by duplicating the source value into 64 bits
> and wrapping that into a SUBREG expression. The alternative here would be
> to add some more vec_duplicate RTL patterns, but that would lead to some
> code churn in aarch64-simd.md and break a long-standing invariant for no
> obvious benefit.
I am not so sure there is no obvious benefit.
Take:
```
#define vect4 __attribute__((vector_size(4)))
void f(vect4 signed char *a, signed char b)
{
*a = (vect4 signed char){b,b,b,b};
}
void f1(signed char *a, signed char b)
{
a[0] = b;
a[1] = b;
a[2] = b;
a[3] = b;
}
```
These could benefit from having a vec_dup of V4QI.
And it would be a good step toward having V4QI/V2HI as more than just a
container for initializations.
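
To illustrate why f and f1 are both candidates for a single 4-byte duplicate, here is a rough sketch in plain C. The names store4_bytewise and store4_splat are hypothetical, and the multiply-by-0x01010101 splat is only a scalar stand-in for a V4QI vec_dup; it makes no claim about the code GCC emits for either function today.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reference form: four separate byte stores, as in f1.  */
static void
store4_bytewise (signed char *a, signed char b)
{
  assert (a != NULL);
  a[0] = b;
  a[1] = b;
  a[2] = b;
  a[3] = b;
}

/* Splat form: replicate the byte across 32 bits, then store once.
   Scalar stand-in for a 4-lane byte duplicate plus a 4-byte store.  */
static void
store4_splat (signed char *a, signed char b)
{
  assert (a != NULL);
  uint32_t v = (uint32_t) (uint8_t) b * UINT32_C (0x01010101);
  memcpy (a, &v, sizeof v);
}
```

Both helpers leave the same four bytes in memory for any value of b, and since all four bytes are equal the result does not depend on endianness.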
Thanks,
Andrea
> diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
> index 513b556398f..3e679f6d36a 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -1014,6 +1014,7 @@ rtx aarch64_mask_from_zextract_ops (rtx, rtx);
> rtx aarch64_return_addr_rtx (void);
> rtx aarch64_return_addr (int, rtx);
> rtx aarch64_simd_gen_const_vector_dup (machine_mode, HOST_WIDE_INT);
> +rtx aarch64_gen_vec_duplicate (machine_mode, rtx);
> rtx aarch64_gen_shareable_zero (machine_mode);
> bool aarch64_split_simd_shift_p (rtx_insn *);
> bool aarch64_simd_mem_operand_p (rtx);
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 5a859e12b1a..4c7173c162a 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -25039,6 +25039,30 @@ aarch64_gen_shareable_zero (machine_mode mode)
> return lowpart_subreg (mode, reg, GET_MODE (reg));
> }
>
> +/* Duplicate value X into a vector of type MODE. In case MODE is a
> + sub-64-bit mode and the result isn't a const_vector, duplicate into a
> + full register and take a SUBREG of that. */
> +
> +rtx
> +aarch64_gen_vec_duplicate (machine_mode mode, rtx x)
> +{
> + gcc_assert (VECTOR_MODE_P (mode));
> +
> + if (!aarch64_advsimd_sub_dword_mode_p (mode))
> + return gen_vec_duplicate (mode, x);
> +
> + if (valid_for_const_vector_p (mode, x))
> + return gen_const_vec_duplicate (mode, x);
> +
> + machine_mode dup_mode = mode_for_vector (GET_MODE_INNER (mode),
> + 64 / GET_MODE_BITSIZE (GET_MODE_INNER (mode)))
> + .require ();
> +
> + rtx reg = gen_reg_rtx (dup_mode);
> + aarch64_emit_move (reg, gen_rtx_VEC_DUPLICATE (dup_mode, x));
> + return lowpart_subreg (mode, reg, dup_mode);
> +}
> +
> /* INSN is some form of extension or shift that can be split into a
> permutation involving a shared zero. Return true if we should
> perform such a split.
> @@ -25699,7 +25723,7 @@ aarch64_expand_vector_init_fallback (rtx target, rtx vals)
> 2 * GET_MODE_SIZE (narrow_mode)));
> if (rtx_equal_p (v0, v1))
> aarch64_emit_move (target,
> - gen_vec_duplicate (mode,
> + aarch64_gen_vec_duplicate (mode,
> force_reg (narrow_mode, v0)));
> else
> emit_insn (gen_aarch64_vec_concat (narrow_mode, target,
> @@ -25733,7 +25757,7 @@ aarch64_expand_vector_init_fallback (rtx target, rtx vals)
> if (all_same)
> {
> rtx x = force_reg (inner_mode, v0);
> - aarch64_emit_move (target, gen_vec_duplicate (mode, x));
> + aarch64_emit_move (target, aarch64_gen_vec_duplicate (mode, x));
> return;
> }
>
> @@ -25769,7 +25793,8 @@ aarch64_expand_vector_init_fallback (rtx target, rtx vals)
> RTVEC_ELT (new_vals, i) = XVECEXP (vals, 0, i);
> aarch64_expand_vector_init (new_target,
> gen_rtx_PARALLEL (subv_mode, new_vals));
> - aarch64_emit_move (target, gen_vec_duplicate (mode, new_target));
> + aarch64_emit_move (target,
> + aarch64_gen_vec_duplicate (mode, new_target));
> return;
> }
> }
> @@ -25862,7 +25887,8 @@ aarch64_expand_vector_init_fallback (rtx target, rtx vals)
> if (const_elem)
> {
> maxelement = const_elem_pos;
> - aarch64_emit_move (target, gen_vec_duplicate (mode, const_elem));
> + aarch64_emit_move (target,
> + aarch64_gen_vec_duplicate (mode, const_elem));
> }
> else
> {
> @@ -25873,7 +25899,7 @@ aarch64_expand_vector_init_fallback (rtx target, rtx vals)
> else
> {
> rtx x = force_reg (inner_mode, XVECEXP (vals, 0, maxelement));
> - aarch64_emit_move (target, gen_vec_duplicate (mode, x));
> + aarch64_emit_move (target, aarch64_gen_vec_duplicate (mode, x));
> }
>
> /* Insert the rest. */
> diff --git a/gcc/testsuite/gcc.target/aarch64/pr125538.c b/gcc/testsuite/gcc.target/aarch64/pr125538.c
> new file mode 100644
> index 00000000000..f0cdcd58dfb
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/pr125538.c
> @@ -0,0 +1,20 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -march=armv9.5-a" } */
> +/* { dg-final { check-function-bodies "**" "" "" } } */
> +
> +#define vect16 __attribute__((vector_size(16)))
> +
> +vect16 char fff(char _292, char _145, char _231)
> +{
> + return (vect16 char) {_292, _145, _145, _231, _292, _145, _145, _231, _292, _145, _145, _231, _292, _145, _145, _231};
> +}
> +
> +/*
> +** fff:
> +** bfi w0, w1, 8, 8
> +** bfi w1, w2, 8, 8
> +** dup v31\.4h, w0
> +** dup v0\.4h, w1
> +** zip1 v0\.16b, v31\.16b, v0\.16b
> +** ret
> +*/
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/vec_init_5.c b/gcc/testsuite/gcc.target/aarch64/sve/vec_init_5.c
> index 99e04aac265..112a0eafc7a 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/vec_init_5.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/vec_init_5.c
> @@ -152,6 +152,27 @@
> ** ret
> */
>
> +/*
> +** test_int8_14:
> +** bfi w0, w1, 8, 8
> +** bfi w1, w2, 8, 8
> +** dup v31\.4h, w0
> +** dup v30\.4h, w1
> +** zip1 v31\.16b, v31\.16b, v30\.16b
> +** dup z0\.q, z31.q\[0\]
> +** ret
> +*/
> +
> +/*
> +** test_int8_15:
> +** bfi w0, w2, 8, 8
> +** dup v30\.8b, w1
> +** dup v31\.4h, w0
> +** zip1 v31\.16b, v31\.16b, v30\.16b
> +** dup z0\.q, z31\.q\[0\]
> +** ret
> +*/
> +
> /*
> ** test_float16_1:
> ** fcvt h0, s0
> @@ -236,6 +257,33 @@
> ** ret
> */
>
> +/*
> +** test_float16_9:
> +** fcvt h1, s1
> +** fcvt h2, s2
> +** fcvt h0, s0
> +** uzp1 v0\.4h, v0\.4h, v1\.4h
> +** uzp1 v1\.4h, v1\.4h, v2\.4h
> +** dup v0\.2s, v0\.s\[0\]
> +** dup v1\.2s, v1\.s\[0\]
> +** zip1 v0\.8h, v0\.8h, v1\.8h
> +** dup z0\.q, z0.q\[0\]
> +** ret
> +*/
> +
> +/*
> +** test_float16_10:
> +** fcvt h2, s2
> +** fcvt h0, s0
> +** fcvt h1, s1
> +** uzp1 v0\.4h, v0\.4h, v2\.4h
> +** dup v1\.4h, v1\.h\[0\]
> +** dup v0\.2s, v0\.s\[0\]
> +** zip1 v0\.8h, v0\.8h, v1\.8h
> +** dup z0\.q, z0.q\[0\]
> +** ret
> +*/
> +
> /*
> ** test_int16_1:
> ** mov z0\.h, w0
> @@ -310,6 +358,27 @@
> ** ret
> */
>
> +/*
> +** test_int16_9:
> +** bfi w0, w1, 16, 16
> +** bfi w1, w2, 16, 16
> +** dup v31\.2s, w0
> +** dup v30\.2s, w1
> +** zip1 v31\.8h, v31\.8h, v30\.8h
> +** dup z0\.q, z31\.q\[0\]
> +** ret
> +*/
> +
> +/*
> +** test_int16_10:
> +** bfi w0, w2, 16, 16
> +** dup v30\.4h, w1
> +** dup v31\.2s, w0
> +** zip1 v31\.8h, v31\.8h, v30\.8h
> +** dup z0\.q, z31\.q\[0\]
> +** ret
> +*/
> +
> /*
> ** test_float32_1:
> ** mov z0\.s, s0
> diff --git a/gcc/testsuite/gcc.target/aarch64/vec-init-23.c b/gcc/testsuite/gcc.target/aarch64/vec-init-23.c
> index 8c154f3680d..4721b068366 100644
> --- a/gcc/testsuite/gcc.target/aarch64/vec-init-23.c
> +++ b/gcc/testsuite/gcc.target/aarch64/vec-init-23.c
> @@ -41,7 +41,11 @@
> TESTCASE (TYPE, ETYPE, T, 8, 12, x16, x0, x1, 0, 1, x2, x3, 2, 3,\
> x0, x1, 0, 1, x2, x3, 2, 3)\
> TESTCASE (TYPE, ETYPE, T, 8, 13, x16, 0, 1, x0, x1, 2, 3, x2, x3,\
> - 0, 1, x0, x1, 2, 3, x2, x3)
> + 0, 1, x0, x1, 2, 3, x2, x3) \
> + TESTCASE (TYPE, ETYPE, T, 8, 14, x16, x0, x1, x1, x2, x0, x1, x1, x2,\
> + x0, x1, x1, x2, x0, x1, x1, x2) \
> + TESTCASE (TYPE, ETYPE, T, 8, 15, x16, x0, x1, x2, x1, x0, x1, x2, x1,\
> + x0, x1, x2, x1, x0, x1, x2, x1)
>
> #define TEST_16(TYPE, ETYPE, T)\
> TESTCASE (TYPE, ETYPE, T, 16, 1, x8, x0, x0, x0, x0, x0, x0, x0, x0)\
> @@ -52,6 +56,8 @@
> TESTCASE (TYPE, ETYPE, T, 16, 6, x8, x0, x1, 0, 1, x0, x1, 0, 1)\
> TESTCASE (TYPE, ETYPE, T, 16, 7, x8, 0, 1, x0, x1, 0, 1, x0, x1)\
> TESTCASE (TYPE, ETYPE, T, 16, 8, x8, 0, x0, 1, x1, 0, x0, 1, x1)\
> + TESTCASE (TYPE, ETYPE, T, 16, 9, x8, x0, x1, x1, x2, x0, x1, x1, x2)\
> + TESTCASE (TYPE, ETYPE, T, 16, 10, x8, x0, x1, x2, x1, x0, x1, x2, x1)
>
> #define TEST_32(TYPE, ETYPE, T)\
> TESTCASE (TYPE, ETYPE, T, 32, 1, x4, x0, x0, x0, x0)\
> @@ -205,6 +211,25 @@ TEST_64(int, int64_t, s)
> ** ret
> */
>
> +/*
> +** test_int8_14:
> +** bfi w0, w1, 8, 8
> +** bfi w1, w2, 8, 8
> +** dup v31\.4h, w0
> +** dup v0\.4h, w1
> +** zip1 v0\.16b, v31\.16b, v0\.16b
> +** ret
> +*/
> +
> +/*
> +** test_int8_15:
> +** bfi w0, w2, 8, 8
> +** dup v0.8b, w1
> +** dup v31.4h, w0
> +** zip1 v0.16b, v31.16b, v0.16b
> +** ret
> +*/
> +
> /*
> ** test_float16_1:
> ** fcvt h0, s0
> @@ -286,6 +311,31 @@ TEST_64(int, int64_t, s)
> ** ret
> */
>
> +/*
> +** test_float16_9:
> +** fcvt h1, s1
> +** fcvt h2, s2
> +** fcvt h0, s0
> +** uzp1 v0\.4h, v0\.4h, v1\.4h
> +** uzp1 v1\.4h, v1\.4h, v2\.4h
> +** dup v0\.2s, v0\.s\[0\]
> +** dup v1\.2s, v1\.s\[0\]
> +** zip1 v0\.8h, v0\.8h, v1\.8h
> +** ret
> +*/
> +
> +/*
> +** test_float16_10:
> +** fcvt h2, s2
> +** fcvt h0, s0
> +** fcvt h1, s1
> +** uzp1 v0\.4h, v0\.4h, v2\.4h
> +** dup v1\.4h, v1\.h\[0\]
> +** dup v0\.2s, v0\.s\[0\]
> +** zip1 v0\.8h, v0\.8h, v1\.8h
> +** ret
> +*/
> +
> /*
> ** test_int16_1:
> ** dup v0\.8h, w0
> @@ -356,6 +406,25 @@ TEST_64(int, int64_t, s)
> ** ret
> */
>
> +/*
> +** test_int16_9:
> +** bfi w0, w1, 16, 16
> +** bfi w1, w2, 16, 16
> +** dup v31\.2s, w0
> +** dup v0\.2s, w1
> +** zip1 v0\.8h, v31\.8h, v0\.8h
> +** ret
> +*/
> +
> +/*
> +** test_int16_10:
> +** bfi w0, w2, 16, 16
> +** dup v0\.4h, w1
> +** dup v31\.2s, w0
> +** zip1 v0\.8h, v31\.8h, v0\.8h
> +** ret
> +*/
> +
> /*
> ** test_float32_1:
> ** dup v0\.4s, v0\.s\[0\]
> --
> 2.43.0
>
> -----Original Message-----
> From: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
> Sent: 03 June 2026 20:58
> To: Artemiy Volkov <Artemiy.Volkov@arm.com>
> Cc: gcc-patches@gcc.gnu.org; Tamar Christina <Tamar.Christina@arm.com>;
> Wilco Dijkstra <Wilco.Dijkstra@arm.com>; Richard Earnshaw
> <Richard.Earnshaw@arm.com>; ktkachov@nvidia.com; Alice Carlotti
> <Alice.Carlotti@arm.com>; Alex Coplan <Alex.Coplan@arm.com>
> Subject: Re: [PATCH] aarch64: suppress duplication into sub-64-bit AdvSIMD
> vectors [PR125538]
I was busy writing an elaborate response to this but Pinski beat me to it.
I would also rather have you just add the RTL patterns for vec_duplicate
of the partial modes. I don't think they really require that much churn.
Whether you take a subreg or just use a 128-bit/64-bit vector dup, the semantics
remain the same: as long as we access the register in its intended mode,
the result is sound.
So I think you should really just define the vec_duplicates; you only really need
2 new RTL patterns.
Additionally, I'd not want to lose the property that gen_vec_duplicate just works.
Requiring a target-specific function is always annoying. It works better if we can
just support the generic abstractions.
Thanks,
Tamar
On Wed, Jun 03, 2026 at 09:23:54PM +0100, Tamar Christina wrote:
> I was busy writing an elaborate response to this but Pinski beat me to it.
>
> I would also rather have you just add the RTL patterns for vec_duplicate
> of the partial modes. I don't think they really require that much churn.
>
> Whether you take a subreg or just use a 128-bit/64-bit vector dup, the semantics
> remain the same: as long as we access the register in its intended mode,
> the result is sound.
>
> So I think you should really just define the vec_duplicates; you only really need
> 2 new RTL patterns.
>
> Additionally, I'd not want to lose the property that gen_vec_duplicate just works.
> Requiring a target-specific function is always annoying. It works better if we can
> just support the generic abstractions.
>
> Thanks,
> Tamar
Thank you both, it looks like my intuition was completely wrong on that
one.
Will post the alternative solution shortly.
Kind regards,
Artemiy
>
> >
> > Thanks,
> > Andrea
> >
> >
> >
> > >
> > > I've added the reduced testcase from the PR, as well as appended some
> > > similar tests to vec_init_5.c and vec-init-23.c.
> > >
> > > Regtested and bootstrapped on aarch64-linux-gnu.
> > >
> > > PR target/125538
> > >
> > > gcc/ChangeLog:
> > >
> > > * config/aarch64/aarch64-protos.h (aarch64_gen_vec_duplicate):
> > > Declare new function.
> > > * config/aarch64/aarch64.cc (aarch64_gen_vec_duplicate): Define
> > > it.
> > > (aarch64_expand_vector_init_fallback): Use
> > > aarch64_gen_vec_duplicate () instead of gen_vec_duplicate ().
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > * gcc.target/aarch64/sve/vec_init_5.c: Add new 8/16-bit testcases.
> > > * gcc.target/aarch64/vec-init-23.c: Likewise.
> > > * gcc.target/aarch64/pr125538.c: New test.
> > > ---
> > > gcc/config/aarch64/aarch64-protos.h | 1 +
> > > gcc/config/aarch64/aarch64.cc | 36 ++++++++--
> > > gcc/testsuite/gcc.target/aarch64/pr125538.c | 20 ++++++
> > > .../gcc.target/aarch64/sve/vec_init_5.c | 69 ++++++++++++++++++
> > > .../gcc.target/aarch64/vec-init-23.c | 71 ++++++++++++++++++-
> > > 5 files changed, 191 insertions(+), 6 deletions(-)
> > > create mode 100644 gcc/testsuite/gcc.target/aarch64/pr125538.c
> > >
> > > diff --git a/gcc/config/aarch64/aarch64-protos.h
> > b/gcc/config/aarch64/aarch64-protos.h
> > > index 513b556398f..3e679f6d36a 100644
> > > --- a/gcc/config/aarch64/aarch64-protos.h
> > > +++ b/gcc/config/aarch64/aarch64-protos.h
> > > @@ -1014,6 +1014,7 @@ rtx aarch64_mask_from_zextract_ops (rtx, rtx);
> > > rtx aarch64_return_addr_rtx (void);
> > > rtx aarch64_return_addr (int, rtx);
> > > rtx aarch64_simd_gen_const_vector_dup (machine_mode,
> > HOST_WIDE_INT);
> > > +rtx aarch64_gen_vec_duplicate (machine_mode, rtx);
> > > rtx aarch64_gen_shareable_zero (machine_mode);
> > > bool aarch64_split_simd_shift_p (rtx_insn *);
> > > bool aarch64_simd_mem_operand_p (rtx);
> > > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> > > index 5a859e12b1a..4c7173c162a 100644
> > > --- a/gcc/config/aarch64/aarch64.cc
> > > +++ b/gcc/config/aarch64/aarch64.cc
> > > @@ -25039,6 +25039,30 @@ aarch64_gen_shareable_zero (machine_mode mode)
> > > return lowpart_subreg (mode, reg, GET_MODE (reg));
> > > }
> > >
> > > +/* Duplicate value X into a vector of type MODE. In case MODE is a
> > > + sub-64-bit mode and the result isn't a const_vector, duplicate into a
> > > + full register and take a SUBREG of that. */
> > > +
> > > +rtx
> > > +aarch64_gen_vec_duplicate (machine_mode mode, rtx x)
> > > +{
> > > + gcc_assert (VECTOR_MODE_P (mode));
> > > +
> > > + if (!aarch64_advsimd_sub_dword_mode_p (mode))
> > > + return gen_vec_duplicate (mode, x);
> > > +
> > > + if (valid_for_const_vector_p (mode, x))
> > > + return gen_const_vec_duplicate (mode, x);
> > > +
> > > + machine_mode dup_mode = mode_for_vector (GET_MODE_INNER (mode),
> > > + 64 / GET_MODE_BITSIZE (GET_MODE_INNER (mode)))
> > > + .require ();
> > > +
> > > + rtx reg = gen_reg_rtx (dup_mode);
> > > + aarch64_emit_move (reg, gen_rtx_VEC_DUPLICATE (dup_mode, x));
> > > + return lowpart_subreg (mode, reg, dup_mode);
> > > +}
> > > +
> > > /* INSN is some form of extension or shift that can be split into a
> > > permutation involving a shared zero. Return true if we should
> > > perform such a split.
> > > @@ -25699,7 +25723,7 @@ aarch64_expand_vector_init_fallback (rtx target, rtx vals)
> > > 2 * GET_MODE_SIZE (narrow_mode)));
> > > if (rtx_equal_p (v0, v1))
> > > aarch64_emit_move (target,
> > > - gen_vec_duplicate (mode,
> > > + aarch64_gen_vec_duplicate (mode,
> > > force_reg (narrow_mode, v0)));
> > > else
> > > emit_insn (gen_aarch64_vec_concat (narrow_mode, target,
> > > @@ -25733,7 +25757,7 @@ aarch64_expand_vector_init_fallback (rtx target, rtx vals)
> > > if (all_same)
> > > {
> > > rtx x = force_reg (inner_mode, v0);
> > > - aarch64_emit_move (target, gen_vec_duplicate (mode, x));
> > > + aarch64_emit_move (target, aarch64_gen_vec_duplicate (mode, x));
> > > return;
> > > }
> > >
> > > @@ -25769,7 +25793,8 @@ aarch64_expand_vector_init_fallback (rtx target, rtx vals)
> > > RTVEC_ELT (new_vals, i) = XVECEXP (vals, 0, i);
> > > aarch64_expand_vector_init (new_target,
> > > gen_rtx_PARALLEL (subv_mode, new_vals));
> > > - aarch64_emit_move (target, gen_vec_duplicate (mode, new_target));
> > > + aarch64_emit_move (target,
> > > + aarch64_gen_vec_duplicate (mode, new_target));
> > > return;
> > > }
> > > }
> > > @@ -25862,7 +25887,8 @@ aarch64_expand_vector_init_fallback (rtx target, rtx vals)
> > > if (const_elem)
> > > {
> > > maxelement = const_elem_pos;
> > > - aarch64_emit_move (target, gen_vec_duplicate (mode, const_elem));
> > > + aarch64_emit_move (target,
> > > + aarch64_gen_vec_duplicate (mode, const_elem));
> > > }
> > > else
> > > {
> > > @@ -25873,7 +25899,7 @@ aarch64_expand_vector_init_fallback (rtx target, rtx vals)
> > > else
> > > {
> > > rtx x = force_reg (inner_mode, XVECEXP (vals, 0, maxelement));
> > > - aarch64_emit_move (target, gen_vec_duplicate (mode, x));
> > > + aarch64_emit_move (target, aarch64_gen_vec_duplicate (mode, x));
> > > }
> > >
> > > /* Insert the rest. */
> > > diff --git a/gcc/testsuite/gcc.target/aarch64/pr125538.c b/gcc/testsuite/gcc.target/aarch64/pr125538.c
> > > new file mode 100644
> > > index 00000000000..f0cdcd58dfb
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/aarch64/pr125538.c
> > > @@ -0,0 +1,20 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-O2 -march=armv9.5-a" } */
> > > +/* { dg-final { check-function-bodies "**" "" "" } } */
> > > +
> > > +#define vect16 __attribute__((vector_size(16)))
> > > +
> > > +vect16 char fff(char _292, char _145, char _231)
> > > +{
> > > + return (vect16 char) {_292, _145, _145, _231, _292, _145, _145, _231, _292, _145, _145, _231, _292, _145, _145, _231};
> > > +}
> > > +
> > > +/*
> > > +** fff:
> > > +** bfi w0, w1, 8, 8
> > > +** bfi w1, w2, 8, 8
> > > +** dup v31\.4h, w0
> > > +** dup v0\.4h, w1
> > > +** zip1 v0\.16b, v31\.16b, v0\.16b
> > > +** ret
> > > +*/
> > > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/vec_init_5.c b/gcc/testsuite/gcc.target/aarch64/sve/vec_init_5.c
> > > index 99e04aac265..112a0eafc7a 100644
> > > --- a/gcc/testsuite/gcc.target/aarch64/sve/vec_init_5.c
> > > +++ b/gcc/testsuite/gcc.target/aarch64/sve/vec_init_5.c
> > > @@ -152,6 +152,27 @@
> > > ** ret
> > > */
> > >
> > > +/*
> > > +** test_int8_14:
> > > +** bfi w0, w1, 8, 8
> > > +** bfi w1, w2, 8, 8
> > > +** dup v31\.4h, w0
> > > +** dup v30\.4h, w1
> > > +** zip1 v31\.16b, v31\.16b, v30\.16b
> > > +** dup z0\.q, z31\.q\[0\]
> > > +** ret
> > > +*/
> > > +
> > > +/*
> > > +** test_int8_15:
> > > +** bfi w0, w2, 8, 8
> > > +** dup v30\.8b, w1
> > > +** dup v31\.4h, w0
> > > +** zip1 v31\.16b, v31\.16b, v30\.16b
> > > +** dup z0\.q, z31\.q\[0\]
> > > +** ret
> > > +*/
> > > +
> > > /*
> > > ** test_float16_1:
> > > ** fcvt h0, s0
> > > @@ -236,6 +257,33 @@
> > > ** ret
> > > */
> > >
> > > +/*
> > > +** test_float16_9:
> > > +** fcvt h1, s1
> > > +** fcvt h2, s2
> > > +** fcvt h0, s0
> > > +** uzp1 v0\.4h, v0\.4h, v1\.4h
> > > +** uzp1 v1\.4h, v1\.4h, v2\.4h
> > > +** dup v0\.2s, v0\.s\[0\]
> > > +** dup v1\.2s, v1\.s\[0\]
> > > +** zip1 v0\.8h, v0\.8h, v1\.8h
> > > +** dup z0\.q, z0\.q\[0\]
> > > +** ret
> > > +*/
> > > +
> > > +/*
> > > +** test_float16_10:
> > > +** fcvt h2, s2
> > > +** fcvt h0, s0
> > > +** fcvt h1, s1
> > > +** uzp1 v0\.4h, v0\.4h, v2\.4h
> > > +** dup v1\.4h, v1\.h\[0\]
> > > +** dup v0\.2s, v0\.s\[0\]
> > > +** zip1 v0\.8h, v0\.8h, v1\.8h
> > > +** dup z0\.q, z0\.q\[0\]
> > > +** ret
> > > +*/
> > > +
> > > /*
> > > ** test_int16_1:
> > > ** mov z0\.h, w0
> > > @@ -310,6 +358,27 @@
> > > ** ret
> > > */
> > >
> > > +/*
> > > +** test_int16_9:
> > > +** bfi w0, w1, 16, 16
> > > +** bfi w1, w2, 16, 16
> > > +** dup v31\.2s, w0
> > > +** dup v30\.2s, w1
> > > +** zip1 v31\.8h, v31\.8h, v30\.8h
> > > +** dup z0\.q, z31\.q\[0\]
> > > +** ret
> > > +*/
> > > +
> > > +/*
> > > +** test_int16_10:
> > > +** bfi w0, w2, 16, 16
> > > +** dup v30\.4h, w1
> > > +** dup v31\.2s, w0
> > > +** zip1 v31\.8h, v31\.8h, v30\.8h
> > > +** dup z0\.q, z31\.q\[0\]
> > > +** ret
> > > +*/
> > > +
> > > /*
> > > ** test_float32_1:
> > > ** mov z0\.s, s0
> > > diff --git a/gcc/testsuite/gcc.target/aarch64/vec-init-23.c b/gcc/testsuite/gcc.target/aarch64/vec-init-23.c
> > > index 8c154f3680d..4721b068366 100644
> > > --- a/gcc/testsuite/gcc.target/aarch64/vec-init-23.c
> > > +++ b/gcc/testsuite/gcc.target/aarch64/vec-init-23.c
> > > @@ -41,7 +41,11 @@
> > > TESTCASE (TYPE, ETYPE, T, 8, 12, x16, x0, x1, 0, 1, x2, x3, 2, 3,\
> > > x0, x1, 0, 1, x2, x3, 2, 3)\
> > > TESTCASE (TYPE, ETYPE, T, 8, 13, x16, 0, 1, x0, x1, 2, 3, x2, x3,\
> > > - 0, 1, x0, x1, 2, 3, x2, x3)
> > > + 0, 1, x0, x1, 2, 3, x2, x3) \
> > > + TESTCASE (TYPE, ETYPE, T, 8, 14, x16, x0, x1, x1, x2, x0, x1, x1, x2,\
> > > + x0, x1, x1, x2, x0, x1, x1, x2) \
> > > + TESTCASE (TYPE, ETYPE, T, 8, 15, x16, x0, x1, x2, x1, x0, x1, x2, x1,\
> > > + x0, x1, x2, x1, x0, x1, x2, x1)
> > >
> > > #define TEST_16(TYPE, ETYPE, T)\
> > > TESTCASE (TYPE, ETYPE, T, 16, 1, x8, x0, x0, x0, x0, x0, x0, x0, x0)\
> > > @@ -52,6 +56,8 @@
> > > TESTCASE (TYPE, ETYPE, T, 16, 6, x8, x0, x1, 0, 1, x0, x1, 0, 1)\
> > > TESTCASE (TYPE, ETYPE, T, 16, 7, x8, 0, 1, x0, x1, 0, 1, x0, x1)\
> > > TESTCASE (TYPE, ETYPE, T, 16, 8, x8, 0, x0, 1, x1, 0, x0, 1, x1)\
> > > + TESTCASE (TYPE, ETYPE, T, 16, 9, x8, x0, x1, x1, x2, x0, x1, x1, x2)\
> > > + TESTCASE (TYPE, ETYPE, T, 16, 10, x8, x0, x1, x2, x1, x0, x1, x2, x1)
> > >
> > > #define TEST_32(TYPE, ETYPE, T)\
> > > TESTCASE (TYPE, ETYPE, T, 32, 1, x4, x0, x0, x0, x0)\
> > > @@ -205,6 +211,25 @@ TEST_64(int, int64_t, s)
> > > ** ret
> > > */
> > >
> > > +/*
> > > +** test_int8_14:
> > > +** bfi w0, w1, 8, 8
> > > +** bfi w1, w2, 8, 8
> > > +** dup v31\.4h, w0
> > > +** dup v0\.4h, w1
> > > +** zip1 v0\.16b, v31\.16b, v0\.16b
> > > +** ret
> > > +*/
> > > +
> > > +/*
> > > +** test_int8_15:
> > > +** bfi w0, w2, 8, 8
> > > +** dup v0\.8b, w1
> > > +** dup v31\.4h, w0
> > > +** zip1 v0\.16b, v31\.16b, v0\.16b
> > > +** ret
> > > +*/
> > > +
> > > /*
> > > ** test_float16_1:
> > > ** fcvt h0, s0
> > > @@ -286,6 +311,31 @@ TEST_64(int, int64_t, s)
> > > ** ret
> > > */
> > >
> > > +/*
> > > +** test_float16_9:
> > > +** fcvt h1, s1
> > > +** fcvt h2, s2
> > > +** fcvt h0, s0
> > > +** uzp1 v0\.4h, v0\.4h, v1\.4h
> > > +** uzp1 v1\.4h, v1\.4h, v2\.4h
> > > +** dup v0\.2s, v0\.s\[0\]
> > > +** dup v1\.2s, v1\.s\[0\]
> > > +** zip1 v0\.8h, v0\.8h, v1\.8h
> > > +** ret
> > > +*/
> > > +
> > > +/*
> > > +** test_float16_10:
> > > +** fcvt h2, s2
> > > +** fcvt h0, s0
> > > +** fcvt h1, s1
> > > +** uzp1 v0\.4h, v0\.4h, v2\.4h
> > > +** dup v1\.4h, v1\.h\[0\]
> > > +** dup v0\.2s, v0\.s\[0\]
> > > +** zip1 v0\.8h, v0\.8h, v1\.8h
> > > +** ret
> > > +*/
> > > +
> > > /*
> > > ** test_int16_1:
> > > ** dup v0\.8h, w0
> > > @@ -356,6 +406,25 @@ TEST_64(int, int64_t, s)
> > > ** ret
> > > */
> > >
> > > +/*
> > > +** test_int16_9:
> > > +** bfi w0, w1, 16, 16
> > > +** bfi w1, w2, 16, 16
> > > +** dup v31\.2s, w0
> > > +** dup v0\.2s, w1
> > > +** zip1 v0\.8h, v31\.8h, v0\.8h
> > > +** ret
> > > +*/
> > > +
> > > +/*
> > > +** test_int16_10:
> > > +** bfi w0, w2, 16, 16
> > > +** dup v0\.4h, w1
> > > +** dup v31\.2s, w0
> > > +** zip1 v0\.8h, v31\.8h, v0\.8h
> > > +** ret
> > > +*/
> > > +
> > > /*
> > > ** test_float32_1:
> > > ** dup v0\.4s, v0\.s\[0\]
> > > --
> > > 2.43.0
> > >
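As a rough illustration of the fallback path in the quoted aarch64_gen_vec_duplicate (), the sketch below models the idea in plain C on 64-bit integers: splat the element across a full 64-bit container, then keep only the low part. It is not GCC code. The names splat64, lowpart and dup_partial are hypothetical, and treating a 64-bit vector register as a uint64_t is an assumption made purely for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model, not GCC code.  Splat the low ELT_BITS bits of X
   across a 64-bit container, roughly what (vec_duplicate:V8QI ...) or
   (vec_duplicate:V4HI ...) describes.  */
static uint64_t
splat64 (uint64_t x, unsigned elt_bits)
{
  /* ELT_BITS must be a non-zero divisor of 64.  */
  assert (elt_bits != 0 && elt_bits <= 64 && 64 % elt_bits == 0);

  /* Same lane count as in the patch: 64 / element width.  */
  unsigned nlanes = 64 / elt_bits;
  uint64_t mask = elt_bits == 64 ? ~UINT64_C (0)
				 : (UINT64_C (1) << elt_bits) - 1;
  uint64_t res = 0;
  for (unsigned i = 0; i < nlanes; i++)
    res |= (x & mask) << (i * elt_bits);
  return res;
}

/* Model of taking the low part of the full register: keep only the
   low PART_BITS bits of FULL.  */
static uint64_t
lowpart (uint64_t full, unsigned part_bits)
{
  return part_bits >= 64 ? full
			 : full & ((UINT64_C (1) << part_bits) - 1);
}

/* Model of the wrapper for a sub-64-bit destination: duplicate into
   64 bits first, then narrow.  */
static uint64_t
dup_partial (uint64_t x, unsigned elt_bits, unsigned part_bits)
{
  return lowpart (splat64 (x, elt_bits), part_bits);
}
```

For instance, dup_partial (0xAB, 8, 32) models a V4QI splat and yields 0xABABABAB. The full-width duplicate is the only operation that needs a "real" pattern; the narrowing step is a reinterpretation of the same bits.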
@@ -1014,6 +1014,7 @@ rtx aarch64_mask_from_zextract_ops (rtx, rtx);
rtx aarch64_return_addr_rtx (void);
rtx aarch64_return_addr (int, rtx);
rtx aarch64_simd_gen_const_vector_dup (machine_mode, HOST_WIDE_INT);
+rtx aarch64_gen_vec_duplicate (machine_mode, rtx);
rtx aarch64_gen_shareable_zero (machine_mode);
bool aarch64_split_simd_shift_p (rtx_insn *);
bool aarch64_simd_mem_operand_p (rtx);
@@ -25039,6 +25039,30 @@ aarch64_gen_shareable_zero (machine_mode mode)
return lowpart_subreg (mode, reg, GET_MODE (reg));
}
+/* Duplicate value X into a vector of type MODE. In case MODE is a
+ sub-64-bit mode and the result isn't a const_vector, duplicate into a
+ full register and take a SUBREG of that. */
+
+rtx
+aarch64_gen_vec_duplicate (machine_mode mode, rtx x)
+{
+ gcc_assert (VECTOR_MODE_P (mode));
+
+ if (!aarch64_advsimd_sub_dword_mode_p (mode))
+ return gen_vec_duplicate (mode, x);
+
+ if (valid_for_const_vector_p (mode, x))
+ return gen_const_vec_duplicate (mode, x);
+
+ machine_mode dup_mode = mode_for_vector (GET_MODE_INNER (mode),
+ 64 / GET_MODE_BITSIZE (GET_MODE_INNER (mode)))
+ .require ();
+
+ rtx reg = gen_reg_rtx (dup_mode);
+ aarch64_emit_move (reg, gen_rtx_VEC_DUPLICATE (dup_mode, x));
+ return lowpart_subreg (mode, reg, dup_mode);
+}
+
/* INSN is some form of extension or shift that can be split into a
permutation involving a shared zero. Return true if we should
perform such a split.
@@ -25699,7 +25723,7 @@ aarch64_expand_vector_init_fallback (rtx target, rtx vals)
2 * GET_MODE_SIZE (narrow_mode)));
if (rtx_equal_p (v0, v1))
aarch64_emit_move (target,
- gen_vec_duplicate (mode,
+ aarch64_gen_vec_duplicate (mode,
force_reg (narrow_mode, v0)));
else
emit_insn (gen_aarch64_vec_concat (narrow_mode, target,
@@ -25733,7 +25757,7 @@ aarch64_expand_vector_init_fallback (rtx target, rtx vals)
if (all_same)
{
rtx x = force_reg (inner_mode, v0);
- aarch64_emit_move (target, gen_vec_duplicate (mode, x));
+ aarch64_emit_move (target, aarch64_gen_vec_duplicate (mode, x));
return;
}
@@ -25769,7 +25793,8 @@ aarch64_expand_vector_init_fallback (rtx target, rtx vals)
RTVEC_ELT (new_vals, i) = XVECEXP (vals, 0, i);
aarch64_expand_vector_init (new_target,
gen_rtx_PARALLEL (subv_mode, new_vals));
- aarch64_emit_move (target, gen_vec_duplicate (mode, new_target));
+ aarch64_emit_move (target,
+ aarch64_gen_vec_duplicate (mode, new_target));
return;
}
}
@@ -25862,7 +25887,8 @@ aarch64_expand_vector_init_fallback (rtx target, rtx vals)
if (const_elem)
{
maxelement = const_elem_pos;
- aarch64_emit_move (target, gen_vec_duplicate (mode, const_elem));
+ aarch64_emit_move (target,
+ aarch64_gen_vec_duplicate (mode, const_elem));
}
else
{
@@ -25873,7 +25899,7 @@ aarch64_expand_vector_init_fallback (rtx target, rtx vals)
else
{
rtx x = force_reg (inner_mode, XVECEXP (vals, 0, maxelement));
- aarch64_emit_move (target, gen_vec_duplicate (mode, x));
+ aarch64_emit_move (target, aarch64_gen_vec_duplicate (mode, x));
}
/* Insert the rest. */
new file mode 100644
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv9.5-a" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+#define vect16 __attribute__((vector_size(16)))
+
+vect16 char fff(char _292, char _145, char _231)
+{
+ return (vect16 char) {_292, _145, _145, _231, _292, _145, _145, _231, _292, _145, _145, _231, _292, _145, _145, _231};
+}
+
+/*
+** fff:
+** bfi w0, w1, 8, 8
+** bfi w1, w2, 8, 8
+** dup v31\.4h, w0
+** dup v0\.4h, w1
+** zip1 v0\.16b, v31\.16b, v0\.16b
+** ret
+*/
@@ -152,6 +152,27 @@
** ret
*/
+/*
+** test_int8_14:
+** bfi w0, w1, 8, 8
+** bfi w1, w2, 8, 8
+** dup v31\.4h, w0
+** dup v30\.4h, w1
+** zip1 v31\.16b, v31\.16b, v30\.16b
+** dup z0\.q, z31\.q\[0\]
+** ret
+*/
+
+/*
+** test_int8_15:
+** bfi w0, w2, 8, 8
+** dup v30\.8b, w1
+** dup v31\.4h, w0
+** zip1 v31\.16b, v31\.16b, v30\.16b
+** dup z0\.q, z31\.q\[0\]
+** ret
+*/
+
/*
** test_float16_1:
** fcvt h0, s0
@@ -236,6 +257,33 @@
** ret
*/
+/*
+** test_float16_9:
+** fcvt h1, s1
+** fcvt h2, s2
+** fcvt h0, s0
+** uzp1 v0\.4h, v0\.4h, v1\.4h
+** uzp1 v1\.4h, v1\.4h, v2\.4h
+** dup v0\.2s, v0\.s\[0\]
+** dup v1\.2s, v1\.s\[0\]
+** zip1 v0\.8h, v0\.8h, v1\.8h
+** dup z0\.q, z0\.q\[0\]
+** ret
+*/
+
+/*
+** test_float16_10:
+** fcvt h2, s2
+** fcvt h0, s0
+** fcvt h1, s1
+** uzp1 v0\.4h, v0\.4h, v2\.4h
+** dup v1\.4h, v1\.h\[0\]
+** dup v0\.2s, v0\.s\[0\]
+** zip1 v0\.8h, v0\.8h, v1\.8h
+** dup z0\.q, z0\.q\[0\]
+** ret
+*/
+
/*
** test_int16_1:
** mov z0\.h, w0
@@ -310,6 +358,27 @@
** ret
*/
+/*
+** test_int16_9:
+** bfi w0, w1, 16, 16
+** bfi w1, w2, 16, 16
+** dup v31\.2s, w0
+** dup v30\.2s, w1
+** zip1 v31\.8h, v31\.8h, v30\.8h
+** dup z0\.q, z31\.q\[0\]
+** ret
+*/
+
+/*
+** test_int16_10:
+** bfi w0, w2, 16, 16
+** dup v30\.4h, w1
+** dup v31\.2s, w0
+** zip1 v31\.8h, v31\.8h, v30\.8h
+** dup z0\.q, z31\.q\[0\]
+** ret
+*/
+
/*
** test_float32_1:
** mov z0\.s, s0
@@ -41,7 +41,11 @@
TESTCASE (TYPE, ETYPE, T, 8, 12, x16, x0, x1, 0, 1, x2, x3, 2, 3,\
x0, x1, 0, 1, x2, x3, 2, 3)\
TESTCASE (TYPE, ETYPE, T, 8, 13, x16, 0, 1, x0, x1, 2, 3, x2, x3,\
- 0, 1, x0, x1, 2, 3, x2, x3)
+ 0, 1, x0, x1, 2, 3, x2, x3) \
+ TESTCASE (TYPE, ETYPE, T, 8, 14, x16, x0, x1, x1, x2, x0, x1, x1, x2,\
+ x0, x1, x1, x2, x0, x1, x1, x2) \
+ TESTCASE (TYPE, ETYPE, T, 8, 15, x16, x0, x1, x2, x1, x0, x1, x2, x1,\
+ x0, x1, x2, x1, x0, x1, x2, x1)
#define TEST_16(TYPE, ETYPE, T)\
TESTCASE (TYPE, ETYPE, T, 16, 1, x8, x0, x0, x0, x0, x0, x0, x0, x0)\
@@ -52,6 +56,8 @@
TESTCASE (TYPE, ETYPE, T, 16, 6, x8, x0, x1, 0, 1, x0, x1, 0, 1)\
TESTCASE (TYPE, ETYPE, T, 16, 7, x8, 0, 1, x0, x1, 0, 1, x0, x1)\
TESTCASE (TYPE, ETYPE, T, 16, 8, x8, 0, x0, 1, x1, 0, x0, 1, x1)\
+ TESTCASE (TYPE, ETYPE, T, 16, 9, x8, x0, x1, x1, x2, x0, x1, x1, x2)\
+ TESTCASE (TYPE, ETYPE, T, 16, 10, x8, x0, x1, x2, x1, x0, x1, x2, x1)
#define TEST_32(TYPE, ETYPE, T)\
TESTCASE (TYPE, ETYPE, T, 32, 1, x4, x0, x0, x0, x0)\
@@ -205,6 +211,25 @@ TEST_64(int, int64_t, s)
** ret
*/
+/*
+** test_int8_14:
+** bfi w0, w1, 8, 8
+** bfi w1, w2, 8, 8
+** dup v31\.4h, w0
+** dup v0\.4h, w1
+** zip1 v0\.16b, v31\.16b, v0\.16b
+** ret
+*/
+
+/*
+** test_int8_15:
+** bfi w0, w2, 8, 8
+** dup v0\.8b, w1
+** dup v31\.4h, w0
+** zip1 v0\.16b, v31\.16b, v0\.16b
+** ret
+*/
+
/*
** test_float16_1:
** fcvt h0, s0
@@ -286,6 +311,31 @@ TEST_64(int, int64_t, s)
** ret
*/
+/*
+** test_float16_9:
+** fcvt h1, s1
+** fcvt h2, s2
+** fcvt h0, s0
+** uzp1 v0\.4h, v0\.4h, v1\.4h
+** uzp1 v1\.4h, v1\.4h, v2\.4h
+** dup v0\.2s, v0\.s\[0\]
+** dup v1\.2s, v1\.s\[0\]
+** zip1 v0\.8h, v0\.8h, v1\.8h
+** ret
+*/
+
+/*
+** test_float16_10:
+** fcvt h2, s2
+** fcvt h0, s0
+** fcvt h1, s1
+** uzp1 v0\.4h, v0\.4h, v2\.4h
+** dup v1\.4h, v1\.h\[0\]
+** dup v0\.2s, v0\.s\[0\]
+** zip1 v0\.8h, v0\.8h, v1\.8h
+** ret
+*/
+
/*
** test_int16_1:
** dup v0\.8h, w0
@@ -356,6 +406,25 @@ TEST_64(int, int64_t, s)
** ret
*/
+/*
+** test_int16_9:
+** bfi w0, w1, 16, 16
+** bfi w1, w2, 16, 16
+** dup v31\.2s, w0
+** dup v0\.2s, w1
+** zip1 v0\.8h, v31\.8h, v0\.8h
+** ret
+*/
+
+/*
+** test_int16_10:
+** bfi w0, w2, 16, 16
+** dup v0\.4h, w1
+** dup v31\.2s, w0
+** zip1 v0\.8h, v31\.8h, v0\.8h
+** ret
+*/
+
/*
** test_float32_1:
** dup v0\.4s, v0\.s\[0\]
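The new pr125538.c test is compile-only and checks the generated assembly. A possible runtime companion, sketched below under the assumption that the compiler supports GNU vector extensions (GCC or Clang), checks that the same initializer pattern produces the expected lanes. The names fff_model and check_pattern are hypothetical and are not part of the patch.

```c
#include <assert.h>

/* 16-byte vector of char, using the GNU vector extension.  */
typedef char v16qi __attribute__ ((vector_size (16)));

/* Same lane pattern as fff () in pr125538.c: {a, b, b, c} repeated
   four times.  */
v16qi
fff_model (char a, char b, char c)
{
  return (v16qi) {a, b, b, c, a, b, b, c, a, b, b, c, a, b, b, c};
}

/* Return 1 if every lane of V follows the repeating {a, b, b, c}
   pattern, otherwise 0.  */
int
check_pattern (v16qi v, char a, char b, char c)
{
  const char expect[4] = {a, b, b, c};
  for (int i = 0; i < 16; i++)
    if (v[i] != expect[i % 4])
      return 0;
  return 1;
}
```

A caller would verify a result with check_pattern (fff_model (1, 2, 3), 1, 2, 3). This only checks the values, so it complements the check-function-bodies tests rather than replacing them.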