From patchwork Fri Aug  5 12:53:07 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: "Andre Vieira (lists)" <Andre.SimoesDiasVieira@arm.com>
X-Patchwork-Id: 56559
Return-Path: <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>
X-Original-To: patchwork@sourceware.org
Delivered-To: patchwork@sourceware.org
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id DEB493857C58
	for <patchwork@sourceware.org>; Fri,  5 Aug 2022 12:53:43 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org DEB493857C58
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1659704023;
	bh=pQOjoOiP09kxV643OhLiGH4qegREcqiUiK8X8XmQfBQ=;
	h=Date:Subject:To:References:In-Reply-To:List-Id:List-Unsubscribe:
	 List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:Cc:
	 From;
	b=HIEkvaav7yyEI0l3pL13Q2QqNY740xWRrnC0sZggHuHf6IED8nNo4SoJm6/iTVU+r
	 pw1uUm1NbzbRR/ictSDNCnyPtb5aohE6DfSa2odxkJe6vdtRVnbSr/g1f4sEjvqs3t
	 9RT7BogR8lCCO7wvLqMvTgL4NKQG6yBzHMNqLPBk=
X-Original-To: gcc-patches@gcc.gnu.org
Delivered-To: gcc-patches@gcc.gnu.org
Received: from foss.arm.com (foss.arm.com [217.140.110.172])
 by sourceware.org (Postfix) with ESMTP id 3207E3858C53
 for <gcc-patches@gcc.gnu.org>; Fri,  5 Aug 2022 12:53:15 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 3207E3858C53
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14])
 by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 596C7106F;
 Fri,  5 Aug 2022 05:53:15 -0700 (PDT)
Received: from [10.57.14.36] (unknown [10.57.14.36])
 by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id E25473F73B;
 Fri,  5 Aug 2022 05:53:13 -0700 (PDT)
Message-ID: <6bdb70e9-8c02-2c91-9ec3-33004a67c3ed@arm.com>
Date: Fri, 5 Aug 2022 13:53:07 +0100
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
 Thunderbird/91.11.0
Subject: [PATCH 1/4] aarch64: encourage use of GPR input for SIMD inserts
Content-Language: en-US
To: gcc-patches@gcc.gnu.org
References: <95d2de77-5b68-6d0b-ac99-ac1ca28835e2@arm.com>
In-Reply-To: <95d2de77-5b68-6d0b-ac99-ac1ca28835e2@arm.com>
X-Spam-Status: No, score=-22.5 required=5.0 tests=BAYES_00, BODY_8BITS,
 GIT_PATCH_0, KAM_DMARC_NONE, KAM_DMARC_STATUS, KAM_LAZY_DOMAIN_SECURITY,
 SPF_HELO_NONE, SPF_NONE, TXREP,
 T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-Patchwork-Original-From: "Andre Vieira \(lists\) via Gcc-patches"
 <gcc-patches@gcc.gnu.org>
From: "Andre Vieira (lists)" <Andre.SimoesDiasVieira@arm.com>
Reply-To: "Andre Vieira \(lists\)" <andre.simoesdiasvieira@arm.com>
Cc: Richard Sandiford <richard.sandiford@arm.com>
Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org
Sender: "Gcc-patches"
 <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>

Hi,

This enables and makes it more likely the compiler is able to use GPR 
input for SIMD inserts. I believe this is some outdated hack we used to 
prevent costly GPR<->SIMD register file swaps. This patch is required 
for better codegen in situations like the test case 'int8_3' in the next 
patch in this series.

Bootstrapped and regression tested together with the next patch on 
aarch64-none-linux-gnu.

gcc/ChangeLog:

2022-08-05  Andre Vieira  <andre.simoesdiasvieira@arm.com>

         * config/aarch64/aarch64-simd.md (aarch64_simd_vec_set<mode>): 
Remove '?' modifier.

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 587a45d77721e1b39accbad7dbeca4d741eccb10..51eab5a872ade7b70268676346e8be7c9c6c8e3a 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1038,7 +1038,7 @@
   [(set (match_operand:VALL_F16 0 "register_operand" "=w,w,w")
 	(vec_merge:VALL_F16
 	    (vec_duplicate:VALL_F16
-		(match_operand:<VEL> 1 "aarch64_simd_nonimmediate_operand" "w,?r,Utv"))
+		(match_operand:<VEL> 1 "aarch64_simd_nonimmediate_operand" "w,r,Utv"))
 	    (match_operand:VALL_F16 3 "register_operand" "0,0,0")
 	    (match_operand:SI 2 "immediate_operand" "i,i,i")))]
   "TARGET_SIMD"

From patchwork Fri Aug  5 12:55:02 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: "Andre Vieira (lists)" <Andre.SimoesDiasVieira@arm.com>
X-Patchwork-Id: 56560
Return-Path: <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>
X-Original-To: patchwork@sourceware.org
Delivered-To: patchwork@sourceware.org
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id 1AE553858028
	for <patchwork@sourceware.org>; Fri,  5 Aug 2022 12:55:40 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 1AE553858028
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1659704140;
	bh=B+kbS9OP+XO/2R0NXPswln9W/MU+vpYo1eBnAr1o4Uw=;
	h=Date:Subject:To:References:In-Reply-To:List-Id:List-Unsubscribe:
	 List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:Cc:
	 From;
	b=caxRIMxo3HlNTg5LppyFzlQGc3q9k5tBnaxfQ3BbsjDoymZkKtQNugeftxQnJQCl9
	 ZII9UjQFUuC9aH0xlUA8q0z5JtzX8O2VF9alxvARHgcIWPEYBo4soudi9totF3PKXm
	 90uH0ZhrkHDVoht2dSiQaOSzqR0H6b43kFm/unVc=
X-Original-To: gcc-patches@gcc.gnu.org
Delivered-To: gcc-patches@gcc.gnu.org
Received: from foss.arm.com (foss.arm.com [217.140.110.172])
 by sourceware.org (Postfix) with ESMTP id B0A873858C53
 for <gcc-patches@gcc.gnu.org>; Fri,  5 Aug 2022 12:55:09 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org B0A873858C53
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14])
 by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id B620B106F;
 Fri,  5 Aug 2022 05:55:09 -0700 (PDT)
Received: from [10.57.14.36] (unknown [10.57.14.36])
 by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 209933F73B;
 Fri,  5 Aug 2022 05:55:07 -0700 (PDT)
Message-ID: <317d0d74-e7e1-05e8-45d3-98bbc929a922@arm.com>
Date: Fri, 5 Aug 2022 13:55:02 +0100
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
 Thunderbird/91.11.0
Subject: [PATCH 2/4]aarch64: Change aarch64_expand_vector_init to use
 rtx_vector_builder
Content-Language: en-US
To: gcc-patches@gcc.gnu.org
References: <95d2de77-5b68-6d0b-ac99-ac1ca28835e2@arm.com>
In-Reply-To: <95d2de77-5b68-6d0b-ac99-ac1ca28835e2@arm.com>
X-Spam-Status: No, score=-22.1 required=5.0 tests=BAYES_00, BODY_8BITS,
 GIT_PATCH_0, KAM_DMARC_NONE, KAM_DMARC_STATUS, KAM_LAZY_DOMAIN_SECURITY,
 KAM_LOTSOFHASH, KAM_SHORT, SPF_HELO_NONE, SPF_NONE, TXREP,
 T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-Patchwork-Original-From: "Andre Vieira \(lists\) via Gcc-patches"
 <gcc-patches@gcc.gnu.org>
From: "Andre Vieira (lists)" <Andre.SimoesDiasVieira@arm.com>
Reply-To: "Andre Vieira \(lists\)" <andre.simoesdiasvieira@arm.com>
Cc: Richard Sandiford <richard.sandiford@arm.com>
Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org
Sender: "Gcc-patches"
 <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>

Hi,

This patch changes aarch64_expand_vector_init to use rtx_vector_builder,
exploiting it's internal pattern detection to find 'dup' patterns.

Bootstrapped and regression tested on aarch64-none-linux-gnu.

Is this OK for trunk or should we wait for the rest of the series?

gcc/ChangeLog:
2022-08-05  Andre Vieira  <andre.simoesdiasvieira@arm.com>

         * config/aarch64/aarch64.cc (aarch64_vec_duplicate): New.
          (aarch64_expand_vector_init): Make the existing variant construct
          a rtx_vector_builder from the list of elements and use this to 
detect
          duplicate patterns.

gcc/testesuite/ChangeLog:
2022-08-05  Andre Vieira  <andre.simoesdiasvieira@arm.com>

         * gcc.target/aarch64/ldp_stp_16.c: Modify to reflect code change.

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 4b486aeea90ea2afb9cdd96a4dbe15c5bb2abd7a..a08043e18d609e258ebfe033875201163d129aba 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -305,6 +305,7 @@ static machine_mode aarch64_simd_container_mode (scalar_mode, poly_int64);
 static bool aarch64_print_address_internal (FILE*, machine_mode, rtx,
 					    aarch64_addr_query_type);
 static HOST_WIDE_INT aarch64_clamp_to_uimm12_shift (HOST_WIDE_INT val);
+static void aarch64_expand_vector_init (rtx, rtx_vector_builder&);
 
 /* The processor for which instructions should be scheduled.  */
 enum aarch64_processor aarch64_tune = cortexa53;
@@ -21804,55 +21805,96 @@ aarch64_simd_make_constant (rtx vals)
     return NULL_RTX;
 }
 
+static void
+aarch64_vec_duplicate (rtx target, machine_mode mode, machine_mode element_mode,
+		       int narrow_n_elts)
+{
+  poly_uint64 size = narrow_n_elts * GET_MODE_BITSIZE (element_mode);
+  scalar_mode i_mode = int_mode_for_size (size, 0).require ();
+  machine_mode o_mode;
+  if (aarch64_sve_mode_p (mode))
+    o_mode = aarch64_full_sve_mode (i_mode).require ();
+  else
+    o_mode
+      = aarch64_simd_container_mode (i_mode,
+				     GET_MODE_BITSIZE (mode));
+  rtx input = simplify_gen_subreg (i_mode, target, mode, 0);
+  rtx output = simplify_gen_subreg (o_mode, target, mode, 0);
+  aarch64_emit_move (output, gen_vec_duplicate (o_mode, input));
+}
+
+
 /* Expand a vector initialisation sequence, such that TARGET is
    initialised to contain VALS.  */
 
 void
 aarch64_expand_vector_init (rtx target, rtx vals)
 {
-  machine_mode mode = GET_MODE (target);
-  scalar_mode inner_mode = GET_MODE_INNER (mode);
   /* The number of vector elements.  */
   int n_elts = XVECLEN (vals, 0);
-  /* The number of vector elements which are not constant.  */
-  int n_var = 0;
-  rtx any_const = NULL_RTX;
+  machine_mode mode = GET_MODE (target);
+  scalar_mode inner_mode = GET_MODE_INNER (mode);
   /* The first element of vals.  */
   rtx v0 = XVECEXP (vals, 0, 0);
-  bool all_same = true;
 
   /* This is a special vec_init<M><N> where N is not an element mode but a
      vector mode with half the elements of M.  We expect to find two entries
      of mode N in VALS and we must put their concatentation into TARGET.  */
-  if (XVECLEN (vals, 0) == 2 && VECTOR_MODE_P (GET_MODE (XVECEXP (vals, 0, 0))))
+  if (n_elts == 2
+      && VECTOR_MODE_P (GET_MODE (v0)))
     {
-      machine_mode narrow_mode = GET_MODE (XVECEXP (vals, 0, 0));
+      machine_mode narrow_mode = GET_MODE (v0);
       gcc_assert (GET_MODE_INNER (narrow_mode) == inner_mode
 		  && known_eq (GET_MODE_SIZE (mode),
 			       2 * GET_MODE_SIZE (narrow_mode)));
-      emit_insn (gen_aarch64_vec_concat (narrow_mode, target,
-					 XVECEXP (vals, 0, 0),
+      emit_insn (gen_aarch64_vec_concat (narrow_mode, target, v0,
 					 XVECEXP (vals, 0, 1)));
      return;
    }
 
-  /* Count the number of variable elements to initialise.  */
+  rtx_vector_builder builder (mode, n_elts, 1);
   for (int i = 0; i < n_elts; ++i)
+    builder.quick_push (XVECEXP (vals, 0, i));
+  builder.finalize ();
+
+  aarch64_expand_vector_init (target, builder);
+}
+
+static void
+aarch64_expand_vector_init (rtx target, rtx_vector_builder &v)
+{
+  machine_mode mode = GET_MODE (target);
+  scalar_mode inner_mode = GET_MODE_INNER (mode);
+  /* The number of vector elements which are not constant.  */
+  unsigned n_var = 0;
+  rtx any_const = NULL_RTX;
+  /* The first element of vals.  */
+  rtx v0 = v.elt (0);
+  /* Get the number of elements to insert into an Advanced SIMD vector.
+     If we have more than one element per pattern then we use the constant
+     number of elements in a full vector.
+     If we only have one element per pattern we use the number of patterns as
+     this may be lower than the number of elements in a full vector, which
+     means they repeat and we should use a duplicate of the smaller vector.  */
+  unsigned n_elts
+    = v.nelts_per_pattern () == 1 ? v.npatterns ()
+				  : v.full_nelts ().coeffs[0];
+
+  /* Count the number of variable elements to initialise.  */
+  for (unsigned i = 0; i < n_elts ; ++i)
     {
-      rtx x = XVECEXP (vals, 0, i);
+      rtx x = v.elt (i);
       if (!(CONST_INT_P (x) || CONST_DOUBLE_P (x)))
 	++n_var;
       else
 	any_const = x;
-
-      all_same &= rtx_equal_p (x, v0);
     }
 
   /* No variable elements, hand off to aarch64_simd_make_constant which knows
      how best to handle this.  */
   if (n_var == 0)
     {
-      rtx constant = aarch64_simd_make_constant (vals);
+      rtx constant = aarch64_simd_make_constant (v.build ());
       if (constant != NULL_RTX)
 	{
 	  emit_move_insn (target, constant);
@@ -21861,7 +21903,7 @@ aarch64_expand_vector_init (rtx target, rtx vals)
     }
 
   /* Splat a single non-constant element if we can.  */
-  if (all_same)
+  if (n_elts == 1)
     {
       rtx x = copy_to_mode_reg (inner_mode, v0);
       aarch64_emit_move (target, gen_vec_duplicate (mode, x));
@@ -21879,14 +21921,15 @@ aarch64_expand_vector_init (rtx target, rtx vals)
      and matches[X][1] with the count of duplicate elements (if X is the
      earliest element which has duplicates).  */
 
-  if (n_var == n_elts && n_elts <= 16)
+  if (n_var == n_elts)
     {
-      int matches[16][2] = {0};
-      for (int i = 0; i < n_elts; i++)
+      gcc_assert (n_elts <= 16);
+      unsigned matches[16][2] = {0};
+      for (unsigned i = 0; i < n_elts; i++)
 	{
-	  for (int j = 0; j <= i; j++)
+	  for (unsigned j = 0; j <= i; j++)
 	    {
-	      if (rtx_equal_p (XVECEXP (vals, 0, i), XVECEXP (vals, 0, j)))
+	      if (rtx_equal_p (v.elt (i), v.elt (j)))
 		{
 		  matches[i][0] = j;
 		  matches[j][1]++;
@@ -21894,9 +21937,9 @@ aarch64_expand_vector_init (rtx target, rtx vals)
 		}
 	    }
 	}
-      int maxelement = 0;
-      int maxv = 0;
-      for (int i = 0; i < n_elts; i++)
+      unsigned maxelement = 0;
+      unsigned maxv = 0;
+      for (unsigned i = 0; i < n_elts; i++)
 	if (matches[i][1] > maxv)
 	  {
 	    maxelement = i;
@@ -21915,8 +21958,8 @@ aarch64_expand_vector_init (rtx target, rtx vals)
 		  || inner_mode == E_DFmode))
 
 	    {
-	      rtx x0 = XVECEXP (vals, 0, 0);
-	      rtx x1 = XVECEXP (vals, 0, 1);
+	      rtx x0 = v.elt (0);
+	      rtx x1 = v.elt (1);
 	      /* Combine can pick up this case, but handling it directly
 		 here leaves clearer RTL.
 
@@ -21939,24 +21982,26 @@ aarch64_expand_vector_init (rtx target, rtx vals)
 	     vector register.  For big-endian we want that position to hold
 	     the last element of VALS.  */
 	  maxelement = BYTES_BIG_ENDIAN ? n_elts - 1 : 0;
-	  rtx x = copy_to_mode_reg (inner_mode, XVECEXP (vals, 0, maxelement));
+	  rtx x = copy_to_mode_reg (inner_mode, v.elt (maxelement));
 	  aarch64_emit_move (target, lowpart_subreg (mode, x, inner_mode));
 	}
       else
 	{
-	  rtx x = copy_to_mode_reg (inner_mode, XVECEXP (vals, 0, maxelement));
+	  rtx x = copy_to_mode_reg (inner_mode, v.elt (maxelement));
 	  aarch64_emit_move (target, gen_vec_duplicate (mode, x));
 	}
 
       /* Insert the rest.  */
-      for (int i = 0; i < n_elts; i++)
+      for (unsigned i = 0; i < n_elts; i++)
 	{
-	  rtx x = XVECEXP (vals, 0, i);
+	  rtx x = v.elt (i);
 	  if (matches[i][0] == maxelement)
 	    continue;
 	  x = copy_to_mode_reg (inner_mode, x);
 	  emit_insn (GEN_FCN (icode) (target, x, GEN_INT (i)));
 	}
+	if (!known_eq (v.full_nelts (), n_elts))
+	  aarch64_vec_duplicate (target, mode, GET_MODE (v0), n_elts);
       return;
     }
 
@@ -21965,19 +22010,19 @@ aarch64_expand_vector_init (rtx target, rtx vals)
      can.  */
   if (n_var != n_elts)
     {
-      rtx copy = copy_rtx (vals);
+      rtx copy = v.build ();
 
       /* Load constant part of vector.  We really don't care what goes into the
 	 parts we will overwrite, but we're more likely to be able to load the
 	 constant efficiently if it has fewer, larger, repeating parts
 	 (see aarch64_simd_valid_immediate).  */
-      for (int i = 0; i < n_elts; i++)
+      for (unsigned i = 0; i < n_elts; i++)
 	{
-	  rtx x = XVECEXP (vals, 0, i);
+	  rtx x = XVECEXP (copy, 0, i);
 	  if (CONST_INT_P (x) || CONST_DOUBLE_P (x))
 	    continue;
 	  rtx subst = any_const;
-	  for (int bit = n_elts / 2; bit > 0; bit /= 2)
+	  for (unsigned bit = n_elts / 2; bit > 0; bit /= 2)
 	    {
 	      /* Look in the copied vector, as more elements are const.  */
 	      rtx test = XVECEXP (copy, 0, i ^ bit);
@@ -21989,18 +22034,21 @@ aarch64_expand_vector_init (rtx target, rtx vals)
 	    }
 	  XVECEXP (copy, 0, i) = subst;
 	}
+      gcc_assert (GET_MODE (target) == GET_MODE (copy));
       aarch64_expand_vector_init (target, copy);
     }
 
   /* Insert the variable lanes directly.  */
-  for (int i = 0; i < n_elts; i++)
+  for (unsigned i = 0; i < n_elts; i++)
     {
-      rtx x = XVECEXP (vals, 0, i);
+      rtx x = v.elt (i);
       if (CONST_INT_P (x) || CONST_DOUBLE_P (x))
 	continue;
       x = copy_to_mode_reg (inner_mode, x);
       emit_insn (GEN_FCN (icode) (target, x, GEN_INT (i)));
     }
+  if (!known_eq (v.full_nelts (), n_elts))
+    aarch64_vec_duplicate (target, mode, inner_mode, n_elts);
 }
 
 /* Emit RTL corresponding to:
diff --git a/gcc/testsuite/gcc.target/aarch64/ldp_stp_16.c b/gcc/testsuite/gcc.target/aarch64/ldp_stp_16.c
index 8ab117c4dcd7a731abc7e1b039e1faf0dfa09a5d..b307d2791824dd9c30200931452b2636708b5035 100644
--- a/gcc/testsuite/gcc.target/aarch64/ldp_stp_16.c
+++ b/gcc/testsuite/gcc.target/aarch64/ldp_stp_16.c
@@ -96,8 +96,8 @@ CONS2_FN (4, float);
 
 /*
 ** cons2_8_float:
-**	dup	v([0-9]+)\.4s, .*
-**	...
+**	ins	v0\.s\[1\], v1\.s\[0\]
+**	dup	v([0-9]+)\.2d, v0\.d\[0\]
 **	stp	q\1, q\1, \[x0\]
 **	stp	q\1, q\1, \[x0, #?32\]
 **	ret
diff --git a/gcc/testsuite/gcc.target/aarch64/vect_init.c b/gcc/testsuite/gcc.target/aarch64/vect_init.c
new file mode 100644
index 0000000000000000000000000000000000000000..546e44e96f4db60d289b4bc0ebfecbe18c81b4cc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vect_init.c
@@ -0,0 +1,144 @@
+#include <arm_neon.h>
+
+/*
+** int32_0:
+**	fmov	s0, w0
+**	ins	v0.s\[1\], w1
+**	dup	v0.2d, v0.d\[0\]
+**	ret
+*/
+
+int32x4_t int32_0 (int a, int b)
+{
+  int32x4_t v = {a, b, a, b};
+  return v;
+}
+/*
+** int32_1:
+**	dup	v0.4s, w0
+**	ret
+*/
+
+int32x4_t int32_1 (int a)
+{
+  int32x4_t v = {a, a, a, a};
+  return v;
+}
+
+/*
+** int16_0:
+**	sxth	w0, w0
+**	fmov	s0, w0
+**	ins	v0.h\[1\], w1
+**	ins	v0.h\[2\], w2
+**	ins	v0.h\[3\], w3
+**	dup	v0.2d, v0.d\[0\]
+**	ret
+*/
+
+int16x8_t int16_0 (int16_t a, int16_t b, int16_t c, int16_t d)
+{
+  int16x8_t v = {a, b, c, d,
+		 a, b, c, d};
+  return v;
+}
+
+/*
+** int16_1:
+**	sxth	w0, w0
+**	fmov	s0, w0
+**	ins	v0.h\[1\], w1
+**	dup	v0.4s, v0.s\[0\]
+**	ret
+*/
+
+int16x8_t int16_1 (int16_t a, int16_t b)
+{
+  int16x8_t v = {a, b, a, b,
+		 a, b, a, b};
+  return v;
+}
+
+/*
+** int16_2:
+**	dup	v0.8h, w0
+**	ret
+*/
+
+int16x8_t int16_2 (int16_t a)
+{
+  int16x8_t v = {a, a, a, a,
+		 a, a, a, a};
+  return v;
+}
+
+/*
+** int8_0:
+**	sxtb	w0, w0
+**	fmov	s0, w0
+**	ins	v0.b\[1\], w1
+**	ins	v0.b\[2\], w2
+**	ins	v0.b\[3\], w3
+**	ins	v0.b\[4\], w4
+**	ins	v0.b\[5\], w5
+**	ins	v0.b\[6\], w6
+**	ins	v0.b\[7\], w7
+**	dup	v0.2d, v0.d\[0\]
+**	ret
+*/
+
+int8x16_t int8_0 (int8_t a, int8_t b, int8_t c, int8_t d, int8_t e, int8_t f,
+		   int8_t g, int8_t h)
+{
+  int8x16_t v = {a, b, c, d, e, f, g, h,
+                 a, b, c, d, e, f, g, h};
+  return v;
+}
+
+/*
+** int8_1:
+**	sxtb	w0, w0
+**	fmov	s0, w0
+**	ins	v0.b\[1\], w1
+**	ins	v0.b\[2\], w2
+**	ins	v0.b\[3\], w3
+**	dup	v0.4s, v0.s\[0\]
+**	ret
+*/
+
+int8x16_t int8_1 (int8_t a, int8_t b, int8_t c, int8_t d)
+{
+  int8x16_t v = {a, b, c, d, a, b, c, d,
+                 a, b, c, d, a, b, c, d};
+  return v;
+}
+
+/*
+** int8_2:
+**	sxtb	w0, w0
+**	fmov	s0, w0
+**	ins	v0.b\[1\], w1
+**	dup	v0.8h, v0.h\[0\]
+**	ret
+*/
+
+int8x16_t int8_2 (int8_t a, int8_t b)
+{
+  int8x16_t v = {a, b, a, b, a, b, a, b,
+                 a, b, a, b, a, b, a, b};
+  return v;
+}
+
+/*
+** int8_3:
+**	dup	v0.16b, w0
+**	ret
+*/
+
+int8x16_t int8_3 (int8_t a)
+{
+  int8x16_t v = {a, a, a, a, a, a, a, a,
+                 a, a, a, a, a, a, a, a};
+  return v;
+}
+

From patchwork Fri Aug  5 12:56:48 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Andre Vieira (lists)" <Andre.SimoesDiasVieira@arm.com>
X-Patchwork-Id: 56561
Return-Path: <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>
X-Original-To: patchwork@sourceware.org
Delivered-To: patchwork@sourceware.org
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id 19F903858028
	for <patchwork@sourceware.org>; Fri,  5 Aug 2022 12:57:27 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 19F903858028
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1659704247;
	bh=/GYSMh8YUQHJWjKO4P4xa5kndn91rhpd2XxuGubsQb0=;
	h=Date:Subject:To:References:In-Reply-To:List-Id:List-Unsubscribe:
	 List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:Cc:
	 From;
	b=wRKlgdM5aiFBkH73N3Wmj5uglil0kFFYUtPPNuD5ROI3aICJjmbqpZi+6q/YzFzmy
	 JLRrnE9+s879BOz2G3BXn81SFAr1xxajIAqDyfwBB7KqDYOTf7rK3BHn130q6y5iZJ
	 VWaaiT/uv1cAJYRlU4nmIexY/KJx1fV3I6h0unYE=
X-Original-To: gcc-patches@gcc.gnu.org
Delivered-To: gcc-patches@gcc.gnu.org
Received: from foss.arm.com (foss.arm.com [217.140.110.172])
 by sourceware.org (Postfix) with ESMTP id BFEC6385840B
 for <gcc-patches@gcc.gnu.org>; Fri,  5 Aug 2022 12:56:58 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org BFEC6385840B
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14])
 by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 10D76113E;
 Fri,  5 Aug 2022 05:56:59 -0700 (PDT)
Received: from [10.57.14.36] (unknown [10.57.14.36])
 by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id A45E03F73B;
 Fri,  5 Aug 2022 05:56:57 -0700 (PDT)
Message-ID: <69a0dc52-4125-1d25-fa2b-4acf6cc3b80f@arm.com>
Date: Fri, 5 Aug 2022 13:56:48 +0100
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
 Thunderbird/91.11.0
Subject: [PATCH 3/4] match.pd: Teach forwprop to handle VLA VEC_PERM_EXPRs
 with VLS CONSTRUCTORs as arguments
Content-Language: en-US
To: gcc-patches@gcc.gnu.org
References: <95d2de77-5b68-6d0b-ac99-ac1ca28835e2@arm.com>
In-Reply-To: <95d2de77-5b68-6d0b-ac99-ac1ca28835e2@arm.com>
X-Spam-Status: No, score=-23.5 required=5.0 tests=BAYES_00, GIT_PATCH_0,
 KAM_DMARC_NONE, KAM_DMARC_STATUS, KAM_LAZY_DOMAIN_SECURITY, SPF_HELO_NONE,
 SPF_NONE, TXREP,
 T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-Patchwork-Original-From: "Andre Vieira \(lists\) via Gcc-patches"
 <gcc-patches@gcc.gnu.org>
From: "Andre Vieira (lists)" <Andre.SimoesDiasVieira@arm.com>
Reply-To: "Andre Vieira \(lists\)" <andre.simoesdiasvieira@arm.com>
Cc: Richard Sandiford <richard.sandiford@arm.com>
Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org
Sender: "Gcc-patches"
 <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>

Hi,

This patch is part of the WIP patch that follows in this series. It's 
goal is to teach forwprop to handle VLA VEC_PERM_EXPRs with VLS 
CONSTRUCTORs as arguments as preparation for the 'VLA constructor' hook 
approach.

Kind Regards,
Andre

diff --git a/gcc/match.pd b/gcc/match.pd
index 9736393061aac61d4d53aaad6cf6b2c97a7d4679..3c3c0c6a88b35a6e42c506f6c4603680fe6e4318 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -7852,14 +7852,24 @@ and,
     if (!tree_to_vec_perm_builder (&builder, op2))
       return NULL_TREE;
 
+    /* FIXME: disable folding of a VEC_PERM_EXPR with a VLA mask and VLS
+       CONSTRUCTORS, since that would yield a VLA CONSTRUCTOR which we
+       currently do not support.  */
+    if (!TYPE_VECTOR_SUBPARTS (type).is_constant ()
+	&& (TYPE_VECTOR_SUBPARTS (TREE_TYPE (op0)).is_constant ()
+	    || TYPE_VECTOR_SUBPARTS (TREE_TYPE (op1)).is_constant ()))
+      return NULL_TREE;
+
     /* Create a vec_perm_indices for the integer vector.  */
     poly_uint64 nelts = TYPE_VECTOR_SUBPARTS (type);
     bool single_arg = (op0 == op1);
     vec_perm_indices sel (builder, single_arg ? 1 : 2, nelts);
   }
-  (if (sel.series_p (0, 1, 0, 1))
+  (if (sel.series_p (0, 1, 0, 1)
+       && useless_type_conversion_p (type, TREE_TYPE (op0)))
    { op0; }
-   (if (sel.series_p (0, 1, nelts, 1))
+   (if (sel.series_p (0, 1, nelts, 1)
+	&& useless_type_conversion_p (type, TREE_TYPE (op1)))
     { op1; }
     (with
      {
diff --git a/gcc/tree-ssa-forwprop.cc b/gcc/tree-ssa-forwprop.cc
index fdc4bc8909d2763876550e53277ff2b3dcca796a..cda91c21c476ea8611e12c593bfa64e1d71dd29e 100644
--- a/gcc/tree-ssa-forwprop.cc
+++ b/gcc/tree-ssa-forwprop.cc
@@ -2661,7 +2661,7 @@ simplify_permutation (gimple_stmt_iterator *gsi)
 
       /* Shuffle of a constructor.  */
       bool ret = false;
-      tree res_type = TREE_TYPE (arg0);
+      tree res_type = TREE_TYPE (gimple_get_lhs (stmt));
       tree opt = fold_ternary (VEC_PERM_EXPR, res_type, arg0, arg1, op2);
       if (!opt
 	  || (TREE_CODE (opt) != CONSTRUCTOR && TREE_CODE (opt) != VECTOR_CST))

From patchwork Fri Aug  5 12:58:16 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Andre Vieira (lists)" <Andre.SimoesDiasVieira@arm.com>
X-Patchwork-Id: 56562
Return-Path: <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>
X-Original-To: patchwork@sourceware.org
Delivered-To: patchwork@sourceware.org
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id 9E4883856944
	for <patchwork@sourceware.org>; Fri,  5 Aug 2022 12:58:53 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 9E4883856944
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1659704333;
	bh=D+NHOzauTXLJ+7yGEG5GhWsLu9RQThInlMfH57fVPw4=;
	h=Date:Subject:To:References:In-Reply-To:List-Id:List-Unsubscribe:
	 List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:Cc:
	 From;
	b=BOt6qlrffK7MQFM47EUmoaxHFadzXoq68AgzF1rejFFoklPVQBAXQfN7D70vbAqDt
	 6a2o18mJMsIY9WMjGZ19oTkM5KMGpTuBDV1/2b+X7edpJwUlCWFnn9Ru5roIQOfQYv
	 t0WVlIkCbeK96MEkCrCLML5kKuns7CwPpBuOGiIA=
X-Original-To: gcc-patches@gcc.gnu.org
Delivered-To: gcc-patches@gcc.gnu.org
Received: from foss.arm.com (foss.arm.com [217.140.110.172])
 by sourceware.org (Postfix) with ESMTP id 4FEFC385840B
 for <gcc-patches@gcc.gnu.org>; Fri,  5 Aug 2022 12:58:23 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 4FEFC385840B
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14])
 by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id AB48E113E;
 Fri,  5 Aug 2022 05:58:23 -0700 (PDT)
Received: from [10.57.14.36] (unknown [10.57.14.36])
 by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 1796B3F73B;
 Fri,  5 Aug 2022 05:58:21 -0700 (PDT)
Message-ID: <3f90f079-8c12-2547-c925-a28779fdb267@arm.com>
Date: Fri, 5 Aug 2022 13:58:16 +0100
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
 Thunderbird/91.11.0
Subject: [PATCH 4/4][RFC] VLA Constructor
Content-Language: en-US
To: gcc-patches@gcc.gnu.org
References: <95d2de77-5b68-6d0b-ac99-ac1ca28835e2@arm.com>
In-Reply-To: <95d2de77-5b68-6d0b-ac99-ac1ca28835e2@arm.com>
X-Spam-Status: No, score=-22.6 required=5.0 tests=BAYES_00, GIT_PATCH_0,
 KAM_DMARC_NONE, KAM_DMARC_STATUS, KAM_LAZY_DOMAIN_SECURITY, KAM_LOTSOFHASH,
 KAM_SHORT, SCC_5_SHORT_WORD_LINES, SPF_HELO_NONE, SPF_NONE, TXREP,
 T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-Patchwork-Original-From: "Andre Vieira \(lists\) via Gcc-patches"
 <gcc-patches@gcc.gnu.org>
From: "Andre Vieira (lists)" <Andre.SimoesDiasVieira@arm.com>
Reply-To: "Andre Vieira \(lists\)" <andre.simoesdiasvieira@arm.com>
Cc: Richard Sandiford <richard.sandiford@arm.com>
Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org
Sender: "Gcc-patches"
 <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>

This isn't really a 'PATCH' yet, it's something I was working on but had 
to put on hold. Feel free to re-use any bits or trash all of it if you'd 
like.

diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
index 82f9eba5c397af04924bdebdc684a1d77682d3fd..08625aad7b1a8dc9c9f8c491cb13d8af0b46a946 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
@@ -842,13 +842,45 @@ public:
     for (unsigned int i = 0; i < nargs; ++i)
       {
 	tree elt = gimple_call_arg (f.call, i);
-	if (!CONSTANT_CLASS_P (elt))
-	  return NULL;
 	builder.quick_push (elt);
 	for (unsigned int j = 1; j < factor; ++j)
 	  builder.quick_push (build_zero_cst (TREE_TYPE (vec_type)));
       }
-    return gimple_build_assign (f.lhs, builder.build ());
+    builder.finalize ();
+    unsigned int n_elts
+      = builder.nelts_per_pattern () == 1 ? builder.npatterns ()
+					  : builder.full_nelts ().coeffs[0];
+
+    if (n_elts == 1)
+      return gimple_build_assign (f.lhs, build1 (VEC_DUPLICATE_EXPR, vec_type,
+						 builder.elt (0)));
+    tree list = NULL_TREE;
+    tree *pp = &list;
+    for (unsigned int i = 0; i < n_elts; ++i)
+      {
+	*pp = build_tree_list (NULL, builder.elt (i) PASS_MEM_STAT);
+	pp = &TREE_CHAIN (*pp);
+      }
+
+    poly_uint64 vec_len = TYPE_VECTOR_SUBPARTS (vec_type);
+    vec_perm_builder sel (vec_len, n_elts, 1);
+    for (unsigned int i = 0; i < n_elts; i++)
+      sel.quick_push (i);
+    vec_perm_indices indices (sel, 1, n_elts);
+
+    tree elt_type = TREE_TYPE (vec_type);
+
+    tree ctor_type = build_vector_type (elt_type, n_elts);
+    tree ctor = make_ssa_name_fn (cfun, ctor_type, 0);
+    gimple *ctor_stmt
+      = gimple_build_assign (ctor,
+			     build_constructor_from_list (ctor_type, list));
+    gsi_insert_before (f.gsi, ctor_stmt, GSI_SAME_STMT);
+
+    tree mask_type = build_vector_type (ssizetype, vec_len);
+    tree mask = vec_perm_indices_to_tree (mask_type, indices);
+    return gimple_build_assign (f.lhs, fold_build3 (VEC_PERM_EXPR, vec_type,
+						    ctor, ctor, mask));
   }
 
   rtx
diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
index bd60e65b0c3f05f1c931f03807170f3b9d699de5..dec935211e5a064239c858880a696e6ca3fe1ae2 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -2544,6 +2544,17 @@
   }
 )
 
+;; Duplicate an Advanced SIMD vector to fill an SVE vector (LE version).
+(define_insn "*aarch64_vec_duplicate_reg<mode>_le"
+  [(set (match_operand:SVE_FULL 0 "register_operand" "=w,w")
+	(vec_duplicate:SVE_FULL
+	  (match_operand:<VEL> 1 "register_operand" "w,r")))]
+  "TARGET_SVE && !BYTES_BIG_ENDIAN"
+  "@
+   mov\t%0.<Vetype>, %<vwcore>1
+   mov\t%0.<Vetype>, %<Vetype>1"
+)
+
 ;; Duplicate an Advanced SIMD vector to fill an SVE vector (BE version).
 ;; The SVE register layout puts memory lane N into (architectural)
 ;; register lane N, whereas the Advanced SIMD layout puts the memory
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index a08043e18d609e258ebfe033875201163d129aba..9b118e4101d0a5995a833769433be49321ab2151 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -6033,7 +6033,6 @@ rtx
 aarch64_expand_sve_dupq (rtx target, machine_mode mode, rtx src)
 {
   machine_mode src_mode = GET_MODE (src);
-  gcc_assert (GET_MODE_INNER (mode) == GET_MODE_INNER (src_mode));
   insn_code icode = (BYTES_BIG_ENDIAN
 		     ? code_for_aarch64_vec_duplicate_vq_be (mode)
 		     : code_for_aarch64_vec_duplicate_vq_le (mode));
@@ -21806,20 +21805,29 @@ aarch64_simd_make_constant (rtx vals)
 }
 
 static void
-aarch64_vec_duplicate (rtx target, machine_mode mode, machine_mode element_mode,
+aarch64_vec_duplicate (rtx target, rtx op, machine_mode mode, machine_mode element_mode,
 		       int narrow_n_elts)
 {
   poly_uint64 size = narrow_n_elts * GET_MODE_BITSIZE (element_mode);
-  scalar_mode i_mode = int_mode_for_size (size, 0).require ();
   machine_mode o_mode;
-  if (aarch64_sve_mode_p (mode))
-    o_mode = aarch64_full_sve_mode (i_mode).require ();
+  rtx input, output;
+  bool sve = aarch64_sve_mode_p (mode);
+  if (sve && known_eq (size, 128U))
+    {
+      o_mode = mode;
+      output = target;
+      input = op;
+    }
   else
-    o_mode
-      = aarch64_simd_container_mode (i_mode,
-				     GET_MODE_BITSIZE (mode));
-  rtx input = simplify_gen_subreg (i_mode, target, mode, 0);
-  rtx output = simplify_gen_subreg (o_mode, target, mode, 0);
+    {
+      scalar_mode i_mode = int_mode_for_size (size, 0).require ();
+      o_mode
+	= sve ? aarch64_full_sve_mode (i_mode).require ()
+	      : aarch64_simd_container_mode (i_mode,
+					     GET_MODE_BITSIZE (mode));
+      input = simplify_gen_subreg (i_mode, op, GET_MODE (op), 0);
+      output = simplify_gen_subreg (o_mode, target, mode, 0);
+    }
   aarch64_emit_move (output, gen_vec_duplicate (o_mode, input));
 }
 
@@ -21910,6 +21918,16 @@ aarch64_expand_vector_init (rtx target, rtx_vector_builder &v)
       return;
     }
 
+  /* We are constructing a VLS vector that we may later duplicate into a VLA
+     one.  Actually maybe split this into one for ASIMD and one for SVE? */
+  machine_mode real_mode = mode;
+  rtx real_target = target;
+  if (aarch64_sve_mode_p (real_mode))
+    {
+      mode = aarch64_vq_mode (GET_MODE_INNER (real_mode)).require ();
+      target = simplify_gen_subreg (mode, target, real_mode, 0);
+    }
+
   enum insn_code icode = optab_handler (vec_set_optab, mode);
   gcc_assert (icode != CODE_FOR_nothing);
 
@@ -22000,8 +22018,8 @@ aarch64_expand_vector_init (rtx target, rtx_vector_builder &v)
 	  x = copy_to_mode_reg (inner_mode, x);
 	  emit_insn (GEN_FCN (icode) (target, x, GEN_INT (i)));
 	}
-	if (!known_eq (v.full_nelts (), n_elts))
-	  aarch64_vec_duplicate (target, mode, GET_MODE (v0), n_elts);
+      if (!known_eq (v.full_nelts (), n_elts))
+	aarch64_vec_duplicate (real_target, target, real_mode, GET_MODE (v0), n_elts);
       return;
     }
 
@@ -22048,7 +22066,7 @@ aarch64_expand_vector_init (rtx target, rtx_vector_builder &v)
       emit_insn (GEN_FCN (icode) (target, x, GEN_INT (i)));
     }
   if (!known_eq (v.full_nelts (), n_elts))
-    aarch64_vec_duplicate (target, mode, inner_mode, n_elts);
+    aarch64_vec_duplicate (real_target, target, real_mode, inner_mode, n_elts);
 }
 
 /* Emit RTL corresponding to:
@@ -23947,11 +23965,7 @@ aarch64_evpc_sve_dup (struct expand_vec_perm_d *d)
   if (BYTES_BIG_ENDIAN
       || !d->one_vector_p
       || d->vec_flags != VEC_SVE_DATA
-      || d->op_vec_flags != VEC_ADVSIMD
-      || d->perm.encoding ().nelts_per_pattern () != 1
-      || !known_eq (d->perm.encoding ().npatterns (),
-		    GET_MODE_NUNITS (d->op_mode))
-      || !known_eq (GET_MODE_BITSIZE (d->op_mode), 128))
+      || d->perm.encoding ().nelts_per_pattern () != 1)
     return false;
 
   int npatterns = d->perm.encoding ().npatterns ();
@@ -23962,7 +23976,10 @@ aarch64_evpc_sve_dup (struct expand_vec_perm_d *d)
   if (d->testing_p)
     return true;
 
-  aarch64_expand_sve_dupq (d->target, GET_MODE (d->target), d->op0);
+  machine_mode mode = GET_MODE (d->target);
+  machine_mode element_mode = GET_MODE_INNER (mode);
+  aarch64_vec_duplicate (d->target, d->op0, mode, element_mode,
+			 d->perm.encoding ().npatterns ());
   return true;
 }
 
@@ -24194,6 +24211,15 @@ aarch64_vectorize_vec_perm_const (machine_mode vmode, machine_mode op_mode,
   return ret;
 }
 
+/* Implement TARGET_VECTORIZE_VLA_CONSTRUCTOR.  */
+
+static bool
+aarch64_vectorize_vla_constructor (rtx target, rtx_vector_builder &builder)
+{
+  aarch64_expand_vector_init (target, builder);
+  return true;
+}
+
 /* Generate a byte permute mask for a register of mode MODE,
    which has NUNITS units.  */
 
@@ -27667,6 +27693,10 @@ aarch64_libgcc_floating_mode_supported_p
 #define TARGET_VECTORIZE_VEC_PERM_CONST \
   aarch64_vectorize_vec_perm_const
 
+#undef TARGET_VECTORIZE_VLA_CONSTRUCTOR
+#define TARGET_VECTORIZE_VLA_CONSTRUCTOR \
+  aarch64_vectorize_vla_constructor
+
 #undef TARGET_VECTORIZE_RELATED_MODE
 #define TARGET_VECTORIZE_RELATED_MODE aarch64_vectorize_related_mode
 #undef TARGET_VECTORIZE_GET_MASK_MODE
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index b0ea39884aa3ced5c0ccc1e792088aa66997ec3b..eda3f014984f62d96d7fe0b3c0c439905375f25a 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6112,6 +6112,11 @@ instruction pattern.  There is no need for the hook to handle these two
 implementation approaches itself.
 @end deftypefn
 
+@deftypefn {Target Hook} bool TARGET_VECTORIZE_VLA_CONSTRUCTOR (rtx @var{target}, rtx_vector_builder @var{&builder})
+This hook is used to expand a vla constructor into @var{target}
+using the rtx_vector_builder @var{builder}.
+@end deftypefn
+
 @deftypefn {Target Hook} tree TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION (unsigned @var{code}, tree @var{vec_type_out}, tree @var{vec_type_in})
 This hook should return the decl of a function that implements the
 vectorized variant of the function with the @code{combined_fn} code
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index f869ddd5e5b8b7acbd8e9765fb103af24a1085b6..07f4f77877b18a23f6fd205a8dd8daf1a03c2923 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4164,6 +4164,8 @@ address;  but often a machine-dependent strategy can generate better code.
 
 @hook TARGET_VECTORIZE_VEC_PERM_CONST
 
+@hook TARGET_VECTORIZE_VLA_CONSTRUCTOR
+
 @hook TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION
 
 @hook TARGET_VECTORIZE_BUILTIN_MD_VECTORIZED_FUNCTION
diff --git a/gcc/expr.cc b/gcc/expr.cc
index f9753d48245d56039206647be8576246a3b25ed3..b9eb550cac4c68464c95cffa8da19b3984b80782 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -10264,6 +10264,44 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
 
     case VEC_PERM_EXPR:
       {
+	if (TREE_CODE (treeop2) == VECTOR_CST
+	    && targetm.vectorize.vla_constructor)
+	  {
+	    tree ctor0, ctor1;
+	    if (TREE_CODE (treeop0) == SSA_NAME
+		&& is_gimple_assign (SSA_NAME_DEF_STMT (treeop0)))
+	      ctor0 = gimple_assign_rhs1 (SSA_NAME_DEF_STMT (treeop0));
+	    else
+	      ctor0 = treeop0;
+	    if (TREE_CODE (treeop1) == SSA_NAME
+		&& is_gimple_assign (SSA_NAME_DEF_STMT (treeop1)))
+	      ctor1 = gimple_assign_rhs1 (SSA_NAME_DEF_STMT (treeop1));
+	    else
+	      ctor1 = treeop1;
+
+	    if (TREE_CODE (ctor0) == CONSTRUCTOR
+		&& TREE_CODE (ctor1) == CONSTRUCTOR)
+	      {
+
+		unsigned int nelts = vector_cst_encoded_nelts (treeop2);
+		unsigned int ctor_nelts = CONSTRUCTOR_NELTS (ctor0);
+		machine_mode mode = GET_MODE (target);
+		rtx_vector_builder builder (mode, nelts, 1);
+		for (unsigned int i = 0; i < nelts; ++i)
+		  {
+		    unsigned HOST_WIDE_INT index
+		      = tree_to_uhwi (VECTOR_CST_ENCODED_ELT (treeop2, i));
+		    tree op
+		      = index >= ctor_nelts
+			? CONSTRUCTOR_ELT (ctor1, index - ctor_nelts)->value
+			: CONSTRUCTOR_ELT (ctor0, index)->value;
+		    builder.quick_push (expand_normal (op));
+		  }
+		builder.finalize ();
+		if (targetm.vectorize.vla_constructor (target, builder))
+		  return target;
+	      }
+	  }
 	expand_operands (treeop0, treeop1, target, &op0, &op1, EXPAND_NORMAL);
 	vec_perm_builder sel;
 	if (TREE_CODE (treeop2) == VECTOR_CST
diff --git a/gcc/target.def b/gcc/target.def
index 2a7fa68f83dd15dcdd2c332e8431e6142ec7d305..3c219b6a90d9cc1a6393a3ebc24e54fcf14c6377 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -1902,6 +1902,13 @@ implementation approaches itself.",
 	const vec_perm_indices &sel),
  NULL)
 
+DEFHOOK
+(vla_constructor,
+ "This hook is used to expand a vla constructor into @var{target}\n\
+using the rtx_vector_builder @var{builder}.",
+ bool, (rtx target, rtx_vector_builder &builder),
+ NULL)
+
 /* Return true if the target supports misaligned store/load of a
    specific factor denoted in the third parameter.  The last parameter
    is true if the access is defined in a packed struct.  */
diff --git a/gcc/target.h b/gcc/target.h
index d6fa6931499d15edff3e5af3e429540d001c7058..b46b8f0d7a9c52f6efe6acf10f589703cec3bd08 100644
--- a/gcc/target.h
+++ b/gcc/target.h
@@ -262,6 +262,8 @@ enum poly_value_estimate_kind
 extern bool verify_type_context (location_t, type_context_kind, const_tree,
 				 bool = false);
 
+class rtx_vector_builder;
+
 /* The target structure.  This holds all the backend hooks.  */
 #define DEFHOOKPOD(NAME, DOC, TYPE, INIT) TYPE NAME;
 #define DEFHOOK(NAME, DOC, TYPE, PARAMS, INIT) TYPE (* NAME) PARAMS;
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_opt_1.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_opt_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..01f652931555534f43e0487766c568c72a5df686
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_opt_1.c
@@ -0,0 +1,134 @@
+/* { dg-options { "-O2" } } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+#include <arm_sve.h>
+
+/*
+** test0:
+**	ins	v0.s\[1\], v1.s\[0\]
+**	mov	z0.d, d0
+**	ret
+*/
+svfloat32_t test0(float x, float y) {
+    return svdupq_n_f32(x, y, x, y);
+}
+/*
+** test1:
+**	mov	z0.s, s0
+**	ret
+*/
+
+svfloat32_t test1(float x) {
+    return svdupq_n_f32(x, x, x, x);
+}
+
+/*
+** test2:
+**	mov	z0.s, w0
+**	ret
+*/
+
+svint32_t test2(int x) {
+    return svdupq_n_s32(x, x, x, x);
+}
+
+/*
+** test3:
+**	sxth	w0, w0
+**	fmov	d0, x0
+**	ins	v0.h\[1\], w1
+**	ins	v0.h\[2\], w2
+**	ins	v0.h\[3\], w3
+**	mov	z0.d, d0
+**	ret
+*/
+
+svint16_t test3(short a, short b, short c, short d)
+{
+    return svdupq_n_s16(a, b, c, d, a, b, c, d);
+}
+
+/*
+** test4:
+**	dup	v0.4h, w0
+**	ins	v0.h\[1\], w1
+**	ins	v0.h\[3\], w1
+**	mov	z0.d, d0
+**	ret
+*/
+
+svint16_t test4(short a, short b)
+{
+    return svdupq_n_s16(a, b, a, b, a, b, a, b);
+}
+
+/*
+** test5:
+**	mov	z0.h, w0
+**	ret
+*/
+
+svint16_t test5(short a)
+{
+    return svdupq_n_s16(a, a, a, a, a, a, a, a);
+}
+/*
+** test6:
+**	sxtb	w0, w0
+**	fmov	d0, x0
+**	ins	v0.b\[1\], w1
+**	ins	v0.b\[2\], w2
+**	ins	v0.b\[3\], w3
+**	ins	v0.b\[4\], w4
+**	ins	v0.b\[5\], w5
+**	ins	v0.b\[6\], w6
+**	ins	v0.b\[7\], w7
+**	mov	z0.d, d0
+**	ret
+*/
+
+svint8_t test6(char a, char b, char c, char d, char e, char f, char g, char h)
+{
+    return svdupq_n_s8(a, b, c, d, e, f, g, h, a, b, c, d, e, f, g, h);
+}
+
+/*
+** test7:
+**	dup	v0.8b, w0
+**	ins	v0.b\[1\], w1
+**	ins	v0.b\[2\], w2
+**	ins	v0.b\[3\], w3
+**	mov	z0.s, s0
+**	ret
+*/
+
+svint8_t test7(char a, char b, char c, char d)
+{
+    return svdupq_n_s8(a, b, c, d, a, b, c, d, a, b, c, d, a, b, c, d);
+}
+
+
+// We can do better than this
+/*
+**	sxtb	w0, w0
+**	fmov	d0, x0
+**	ins	v0.d\[1\], x1
+**	ins	v0.b\[1\], w1
+**	mov	z0.h, h0
+**	ret
+*/
+
+svint8_t test8(char a, char b)
+{
+    return svdupq_n_s8(a, b, a, b, a, b, a, b, a, b, a, b, a, b, a, b);
+}
+
+/*
+** test9:
+**	mov	z0.b, w0
+**	ret
+*/
+
+svint8_t test9(char a)
+{
+    return svdupq_n_s8(a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a);
+}
diff --git a/gcc/tree-vect-generic.cc b/gcc/tree-vect-generic.cc
index 350129555a0c71c0896c4f1003163f3b3557c11b..eaae1eefe02af3f51073310e7d17c33286b2bead 100644
--- a/gcc/tree-vect-generic.cc
+++ b/gcc/tree-vect-generic.cc
@@ -1513,6 +1513,11 @@ lower_vec_perm (gimple_stmt_iterator *gsi)
   if (!TYPE_VECTOR_SUBPARTS (vect_type).is_constant (&elements))
     return;
 
+  /* It is possible to have a VEC_PERM_EXPR with a VLA mask and a VLS
+     CONSTRUCTOR, this should return a VLA type, so we can't lower it.  */
+  if (!TYPE_VECTOR_SUBPARTS (mask_type).is_constant ())
+    return;
+
   if (TREE_CODE (mask) == SSA_NAME)
     {
       gimple *def_stmt = SSA_NAME_DEF_STMT (mask);