From patchwork Tue Jun  7 00:55:00 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Michael Meissner <meissner@linux.ibm.com>
X-Patchwork-Id: 54859
Date: Mon, 6 Jun 2022 20:55:00 -0400
To: Michael Meissner <meissner@linux.ibm.com>, gcc-patches@gcc.gnu.org,
 Segher Boessenkool <segher@kernel.crashing.org>,
 "Kewen.Lin" <linkw@linux.ibm.com>, David Edelsohn <dje.gcc@gmail.com>,
 Peter Bergner <bergner@linux.ibm.com>,
 Will Schmidt <will_schmidt@vnet.ibm.com>
Subject: [PATCH 1/3] Disable generating store vector pair.
Message-ID: <Yp6h5EF3TQea1tvz@toto.the-meissners.org>
Mail-Followup-To: Michael Meissner <meissner@linux.ibm.com>,
 gcc-patches@gcc.gnu.org,
 Segher Boessenkool <segher@kernel.crashing.org>,
 "Kewen.Lin" <linkw@linux.ibm.com>,
 David Edelsohn <dje.gcc@gmail.com>,
 Peter Bergner <bergner@linux.ibm.com>,
 Will Schmidt <will_schmidt@vnet.ibm.com>
References: <Yp6hdmJqK1oDsbzB@toto.the-meissners.org>
MIME-Version: 1.0
Content-Disposition: inline
In-Reply-To: <Yp6hdmJqK1oDsbzB@toto.the-meissners.org>
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-Patchwork-Original-From: Michael Meissner via Gcc-patches
 <gcc-patches@gcc.gnu.org>
From: Michael Meissner <meissner@linux.ibm.com>
Reply-To: Michael Meissner <meissner@linux.ibm.com>
Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org
Sender: "Gcc-patches"
 <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>

[PATCH 1/3] Disable generating store vector pair.

Testing has shown that power10 slows down in some cases when the store
vector pair instruction is generated.  This patch disables generating the
store vector pair instructions (stxvp, pstxvp, and stxvpx) unless an
undocumented switch (-mstore-vector-pair) is used.  We anticipate that
future machines may be able to generate the store vector pair instructions.

This patch does a split after reload to convert a store vector pair
instruction into a pair of store vector instructions.
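
As an illustration only (this is not code from the patch), the effect of the
split can be modeled in portable C: a 32-byte vector pair store becomes two
16-byte stores to adjacent addresses, and a 64-byte vector quad store becomes
four.  The name store_split and the memcpy-based model are hypothetical; the
real work is done by rs6000_split_multireg_move on RTL, and register ordering
details are ignored here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { VEC_BYTES = 16 };	/* Size of one vector register.  */

/* Hypothetical model of a split store: write NBYTES (32 for a vector pair,
   64 for a vector quad) as consecutive 16-byte stores.  Returns the number
   of 16-byte stores that were issued.  */
static int
store_split (unsigned char *dst, const unsigned char *src, size_t nbytes)
{
  int n = 0;

  assert (nbytes == 32 || nbytes == 64);
  for (size_t off = 0; off < nbytes; off += VEC_BYTES)
    {
      /* Each iteration stands in for one store vector instruction.  */
      memcpy (dst + off, src + off, VEC_BYTES);
      n++;
    }
  return n;
}
```

The memory contents after the split stores are the same as after a single
32-byte or 64-byte copy; only the number of store instructions changes.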

We do continue to generate the load vector pair instructions (lxvp, plxvp,
and lxvpx), since we have found that in code that heavily uses MMA, it is
still a win to generate the load vector pair instructions.
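
The resulting instruction mix for a memory-to-memory copy can be sketched as
follows.  This is a rough model under stated assumptions, not part of the
patch: copy_insn_count and struct move_cost are hypothetical names, and the
model ignores any instructions needed to set up addresses.  The counts match
the comments in the two new tests (for example, "2x lxvp, 4x stxv" for a
vector quad when store vector pair is disabled).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical record of how many load and store instructions a copy
   needs.  */
struct move_cost
{
  int loads;
  int stores;
};

/* Hypothetical helper: count the loads and stores for copying a vector pair
   (NBYTES == 32) or a vector quad (NBYTES == 64).  Loads always use 32-byte
   load vector pair instructions.  Stores use 32-byte store vector pair
   instructions only if STORE_VECTOR_PAIR is true; otherwise they use
   16-byte store vector instructions.  */
static struct move_cost
copy_insn_count (int nbytes, bool store_vector_pair)
{
  struct move_cost c;

  assert (nbytes == 32 || nbytes == 64);
  c.loads = nbytes / 32;
  c.stores = store_vector_pair ? nbytes / 32 : nbytes / 16;
  return c;
}
```

With the default setting (store vector pair disabled), a vector pair copy is
one load and two stores, and a vector quad copy is two loads and four stores.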

There are two future patches planned:

    1)	Disable block moves from generating load/store vector pair
	instructions unless the store vector pair instructions are
	being generated.

    2)	Make the built-in functions for generating store vector pair
	always generate those instructions even if store vector pair
	instructions are disabled.

I have built bootstrap compilers and run the regression tests on three
different systems:

    1)	Little endian power10 using the --with-cpu=power10 option.

    2)	Little endian power9 using the --with-cpu=power9 option.

    3)	Big endian power8 using the --with-cpu=power8 option.  On this system,
	both 64-bit and 32-bit code generation was tested.

There were no regressions in the runs except for the tests that are
modified in patch #3 in this series of patches.  Can I check this patch
into the trunk?  If there are no changes needed for the backports, can I
check this code into the active branches after a burn-in period?

2022-06-06   Michael Meissner  <meissner@linux.ibm.com>

gcc/

	* config/rs6000/mma.md (movoo): Disable generating store vector
	pair instructions unless these are enabled by the user.
	(movxo): Likewise.
	* config/rs6000/rs6000.cc (rs6000_setup_reg_addr_masks): If store
	vector pair instructions are disabled, do not allow vector pair
	addresses to be indexed.
	(rs6000_split_multireg_move): Do not split XOmode stores into two
	store vector pair instructions unless store vector pair
	instructions are enabled.
	* config/rs6000/rs6000.md (isa attribute): Add stxvp attribute.
	(enabled attribute): Disable alternative using store vector pair
	instructions unless they are enabled.
	* config/rs6000/rs6000.opt (-mstore-vector-pair): New option.

gcc/testsuite/

	* gcc.target/powerpc/p10-store-vector-pair-1.c: New test.
	* gcc.target/powerpc/p10-store-vector-pair-2.c: New test.
---
 gcc/config/rs6000/mma.md                      | 41 ++++++----
 gcc/config/rs6000/rs6000.cc                   |  9 +-
 gcc/config/rs6000/rs6000.md                   |  8 +-
 gcc/config/rs6000/rs6000.opt                  |  4 +
 .../powerpc/p10-store-vector-pair-1.c         | 82 +++++++++++++++++++
 .../powerpc/p10-store-vector-pair-2.c         | 81 ++++++++++++++++++
 6 files changed, 206 insertions(+), 19 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/p10-store-vector-pair-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/p10-store-vector-pair-2.c

diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md
index a183b6a168a..9b5f243b88d 100644
--- a/gcc/config/rs6000/mma.md
+++ b/gcc/config/rs6000/mma.md
@@ -274,26 +274,35 @@ (define_expand "movoo"
   DONE;
 })
 
+;; By default for power10, do not generate the stxvp/pstxvp/stxvpx
+;; instructions.  Instead, split these instructions into two separate store
+;; vector instructions.  We do always generate a lxvp/plxvp/lxvpx instruction.
+;; We leave in the support for generating stxvp/pstxvp/stxvpx in future
+;; machines.
 (define_insn_and_split "*movoo"
-  [(set (match_operand:OO 0 "nonimmediate_operand" "=wa,m,wa")
-	(match_operand:OO 1 "input_operand" "m,wa,wa"))]
+  [(set (match_operand:OO 0 "nonimmediate_operand" "=wa,m, o, wa")
+        (match_operand:OO 1 "input_operand"         "m, wa,wa,wa"))]
   "TARGET_MMA
    && (gpc_reg_operand (operands[0], OOmode)
        || gpc_reg_operand (operands[1], OOmode))"
   "@
    lxvp%X1 %x0,%1
    stxvp%X0 %x1,%0
+   #
    #"
   "&& reload_completed
-   && (!MEM_P (operands[0]) && !MEM_P (operands[1]))"
+   && ((MEM_P (operands[0]) && !TARGET_STORE_VECTOR_PAIR)
+       || (!MEM_P (operands[0]) && !MEM_P (operands[1])))"
   [(const_int 0)]
 {
   rs6000_split_multireg_move (operands[0], operands[1]);
   DONE;
 }
-  [(set_attr "type" "vecload,vecstore,veclogical")
-   (set_attr "size" "256")
-   (set_attr "length" "*,*,8")])
+  [(set_attr "type"               "vecload,vecstore,vecstore,veclogical")
+   (set_attr "max_prefixed_insns" "*,      *,       2,       *")
+   (set_attr "length"             "*,      *,       *,       8")
+   (set_attr "isa"                "*,      stxvp,   *,       *")
+   (set_attr "size"               "256")])
 
 
 ;; Vector quad support.  XOmode can only live in FPRs.
@@ -306,25 +315,27 @@ (define_expand "movxo"
   DONE;
 })
 
+;; By default for power10, do not generate two stxvp/pstxvp instructions.
+;; Instead, split these instructions into four separate store vector
+;; instructions.  We do always generate two lxvp/plxvp instructions.  We leave
+;; in the support for generating stxvp/pstxvp in future machines.
 (define_insn_and_split "*movxo"
-  [(set (match_operand:XO 0 "nonimmediate_operand" "=d,m,d")
-	(match_operand:XO 1 "input_operand" "m,d,d"))]
+  [(set (match_operand:XO 0 "nonimmediate_operand" "=d,m,o,d")
+	(match_operand:XO 1 "input_operand"         "m,d,d,d"))]
   "TARGET_MMA
    && (gpc_reg_operand (operands[0], XOmode)
        || gpc_reg_operand (operands[1], XOmode))"
-  "@
-   #
-   #
-   #"
+  "#"
   "&& reload_completed"
   [(const_int 0)]
 {
   rs6000_split_multireg_move (operands[0], operands[1]);
   DONE;
 }
-  [(set_attr "type" "vecload,vecstore,veclogical")
-   (set_attr "length" "*,*,16")
-   (set_attr "max_prefixed_insns" "2,2,*")])
+  [(set_attr "type"               "vecload,vecstore,vecstore,veclogical")
+   (set_attr "length"             "*,      *,       *,       16")
+   (set_attr "max_prefixed_insns" "2,      2,       4,       *")
+   (set_attr "isa"                "*,      stxvp,   *,       *")])
 
 (define_expand "vsx_assemble_pair"
   [(match_operand:OO 0 "vsx_register_operand")
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 0af2085adc0..30ed24fff30 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -2714,7 +2714,8 @@ rs6000_setup_reg_addr_masks (void)
 	  /* Vector pairs can do both indexed and offset loads if the
 	     instructions are enabled, otherwise they can only do offset loads
 	     since it will be broken into two vector moves.  Vector quads can
-	     only do offset loads.  */
+	     only do offset loads.  If stxvp is disabled, vector pairs can't
+	     use indexed addressing.  */
 	  else if ((addr_mask != 0) && TARGET_MMA
 		   && (m2 == OOmode || m2 == XOmode))
 	    {
@@ -2722,7 +2723,8 @@ rs6000_setup_reg_addr_masks (void)
 	      if (rc == RELOAD_REG_FPR || rc == RELOAD_REG_VMX)
 		{
 		  addr_mask |= RELOAD_REG_QUAD_OFFSET;
-		  if (m2 == OOmode)
+		  if (m2 == OOmode
+		      && TARGET_STORE_VECTOR_PAIR)
 		    addr_mask |= RELOAD_REG_INDEXED;
 		}
 	    }
@@ -26992,7 +26994,8 @@ rs6000_split_multireg_move (rtx dst, rtx src)
   /* If we have a vector quad register for MMA, and this is a load or store,
      see if we can use vector paired load/stores.  */
   if (mode == XOmode && TARGET_MMA
-      && (MEM_P (dst) || MEM_P (src)))
+      && ((MEM_P (dst) && TARGET_STORE_VECTOR_PAIR)
+	  || MEM_P (src)))
     {
       reg_mode = OOmode;
       nregs /= 2;
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 3eca448a262..7eb107148ca 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -354,7 +354,7 @@ (define_attr "cpu"
   (const (symbol_ref "(enum attr_cpu) rs6000_tune")))
 
 ;; The ISA we implement.
-(define_attr "isa" "any,p5,p6,p7,p7v,p8v,p9,p9v,p9kf,p9tf,p10"
+(define_attr "isa" "any,p5,p6,p7,p7v,p8v,p9,p9v,p9kf,p9tf,p10,stxvp"
   (const_string "any"))
 
 ;; Is this alternative enabled for the current CPU/ISA/etc.?
@@ -402,6 +402,12 @@ (define_attr "enabled" ""
      (and (eq_attr "isa" "p10")
 	  (match_test "TARGET_POWER10"))
      (const_int 1)
+
+     (and (eq_attr "isa" "stxvp")
+	  (match_test "TARGET_POWER10")
+	  (match_test "TARGET_STORE_VECTOR_PAIR"))
+     (const_int 1)
+
     ] (const_int 0)))
 
 ;; If this instruction is microcoded on the CELL processor
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index 4931d781c4e..79ceec6e6a5 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -624,6 +624,10 @@ mieee128-constant
 Target Var(TARGET_IEEE128_CONSTANT) Init(1) Save
 Generate (do not generate) code that uses the LXVKQ instruction.
 
+; Generate (do not generate) code that uses the store vector pair instruction.
+mstore-vector-pair
+Target Undocumented Var(TARGET_STORE_VECTOR_PAIR) Init(0) Save
+
 -param=rs6000-density-pct-threshold=
 Target Undocumented Joined UInteger Var(rs6000_density_pct_threshold) Init(85) IntegerRange(0, 100) Param
 When costing for loop vectorization, we probably need to penalize the loop body
diff --git a/gcc/testsuite/gcc.target/powerpc/p10-store-vector-pair-1.c b/gcc/testsuite/gcc.target/powerpc/p10-store-vector-pair-1.c
new file mode 100644
index 00000000000..c1a36bf5fff
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/p10-store-vector-pair-1.c
@@ -0,0 +1,82 @@
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -mstore-vector-pair -mmma" } */
+
+/* Test if we generate store vector pair instructions if the user uses the
+   -mstore-vector-pair option.  */
+static __vector_quad sq;
+static __vector_pair sp;
+
+void
+load_store_pair (__vector_pair *p, __vector_pair *q)
+{
+  *p = *q;			/* lxvp, stxvp.  */
+}
+
+void
+load_store_pair_1 (__vector_pair *p, __vector_pair *q)
+{
+  p[1] = q[1];			/* lxvp, stxvp.  */
+}
+
+void
+load_store_pair_0x10000 (__vector_pair *p, __vector_pair *q)
+{
+  p[0x10000] = q[0x10000];	/* plxvp, pstxvp.  */
+}
+
+void
+load_store_pair_n (__vector_pair *p, __vector_pair *q, unsigned long n)
+{
+  p[n] = q[n];			/* lxvpx, stxvpx.  */
+}
+
+void
+load_pair_static (__vector_pair *p)
+{
+  *p = sp;			/* plxvp, stxvp.  */
+}
+
+void
+store_pair_static (__vector_pair *p)
+{
+  sp = *p;			/* lxvp, pstxvp.  */
+}
+
+void
+load_store_quad (__vector_quad *p, __vector_quad *q)
+{
+  *p = *q;			/* 2x lxvp, 2x stxvp.  */
+}
+
+void
+load_store_quad_1 (__vector_quad *p, __vector_quad *q)
+{
+  p[1] = q[1];			/* 2x lxvp, 2x stxvp.  */
+}
+
+void
+load_store_quad_0x10000 (__vector_quad *p, __vector_quad *q)
+{
+  p[0x10000] = q[0x10000];	/* 2x plxvp, 2x pstxvp.  */
+}
+
+void
+load_store_quad_n (__vector_quad *p, __vector_quad *q, unsigned long n)
+{
+  p[n] = q[n];			/* 2x lxvp, 2x stxvp.  */
+}
+
+void
+load_quad_static (__vector_quad *p)
+{
+  *p = sq;			/* 2x plxvp, 2x stxvp.  */
+}
+
+void
+store_quad_static (__vector_quad *p)
+{
+  sq = *p;			/* 2x lxvp, 2x pstxvp.  */
+}
+
+/* { dg-final { scan-assembler {\mp?stxvpx?\M}  } } */
+
diff --git a/gcc/testsuite/gcc.target/powerpc/p10-store-vector-pair-2.c b/gcc/testsuite/gcc.target/powerpc/p10-store-vector-pair-2.c
new file mode 100644
index 00000000000..b8c3bdbfd89
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/p10-store-vector-pair-2.c
@@ -0,0 +1,81 @@
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -mno-store-vector-pair -mmma" } */
+
+/* Test if we do not generate store vector pair instructions if the user uses
+   the -mno-store-vector-pair option.  */
+static __vector_quad sq;
+static __vector_pair sp;
+
+void
+load_store_pair (__vector_pair *p, __vector_pair *q)
+{
+  *p = *q;			/* lxvp, 2x stxv.  */
+}
+
+void
+load_store_pair_1 (__vector_pair *p, __vector_pair *q)
+{
+  p[1] = q[1];			/* lxvp, 2x stxv.  */
+}
+
+void
+load_store_pair_0x10000 (__vector_pair *p, __vector_pair *q)
+{
+  p[0x10000] = q[0x10000];	/* plxvp, 2x pstxv.  */
+}
+
+void
+load_store_pair_n (__vector_pair *p, __vector_pair *q, unsigned long n)
+{
+  p[n] = q[n];			/* lxvpx, 2x stxv.  */
+}
+
+void
+load_pair_static (__vector_pair *p)
+{
+  *p = sp;			/* plxvp, 2x stxv.  */
+}
+
+void
+store_pair_static (__vector_pair *p)
+{
+  sp = *p;			/* lxvp, 2x pstxv.  */
+}
+
+void
+load_store_quad (__vector_quad *p, __vector_quad *q)
+{
+  *p = *q;			/* 2x lxvp, 4x stxv.  */
+}
+
+void
+load_store_quad_1 (__vector_quad *p, __vector_quad *q)
+{
+  p[1] = q[1];			/* 2x lxvp, 4x stxv.  */
+}
+
+void
+load_store_quad_0x10000 (__vector_quad *p, __vector_quad *q)
+{
+  p[0x10000] = q[0x10000];	/* 2x plxvp, 4x pstxv.  */
+}
+
+void
+load_store_quad_n (__vector_quad *p, __vector_quad *q, unsigned long n)
+{
+  p[n] = q[n];			/* 2x lxvp, 4x stxv.  */
+}
+
+void
+load_quad_static (__vector_quad *p)
+{
+  *p = sq;			/* 2x plxvp, 4x stxv.  */
+}
+
+void
+store_quad_static (__vector_quad *p)
+{
+  sq = *p;			/* 2x lxvp, 4x pstxv.  */
+}
+
+/* { dg-final { scan-assembler-not {\mp?stxvpx?\M} } } */