[V4] Use reg mode to move sub blocks for parameters and returns

Message ID 20230104124439.191858-1-guojiufu@linux.ibm.com
State New
Headers
Series [V4] Use reg mode to move sub blocks for parameters and returns |

Commit Message

Jiufu Guo Jan. 4, 2023, 12:44 p.m. UTC
  Hi,

When assigning a parameter to a variable, or assigning a variable to
return value with struct type, "block move" may be used to expand
the assignment if the parameter/return is passing through registers and
the parameter/return has BLK mode.
For this kind of case, when moving the blocks, it would be better to use
the nature mode of the registers.
This would raise more opportunities for other optimization passes(cse,
dse, xprop).

As the example code (like code in PR65421):

typedef struct SA {double a[3];} A;
A ret_arg_pt (A *a) {return *a;} // on ppc64le, expect only 3 lfd(s)
A ret_arg (A a) {return a;} // just empty fun body
void st_arg (A a, A *p) {*p = a;} //only 3 stfd(s)

This patches check the "from" and "to" of an assignment in
"expand_assignment", if it is about param/ret which may passing via
register, then use the register nature mode to move sub-blocks for
the assignning.

This patch may be still useful even if we change the behavior of
parameter setup or adopt SRA-like code in expender.

Comparing with previous version:
https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608081.html
This patch update the code slightly and merged/added test cases.
And I checked the cases with large struct or non-homogeneous struct
to confirm it does not degrade the code.

Bootstrap and regtest pass on ppc64{,le} and x86_64.
Is this ok for trunk?

BR,
Jeff (Jiufu)

	PR target/65421

gcc/ChangeLog:

	* cfgexpand.cc (expand_used_vars): Update to mark DECL_USEDBY_RETURN_P
	for returns.
	* expr.cc (move_sub_blocks): New function.
	(expand_assignment): Update to call move_sub_blocks for returns or
	parameters.
	* function.cc (assign_parm_setup_block): Update to mark
	DECL_REGS_TO_STACK_P for parameter.
	* tree-core.h (struct tree_decl_common): Add comment.
	* tree.h (DECL_USEDBY_RETURN_P): New define.
	(DECL_REGS_TO_STACK_P): New define.

gcc/testsuite/ChangeLog:

	* gcc.target/powerpc/pr65421-1.c: New test.
	* gcc.target/powerpc/pr65421.c: New test.

---
 gcc/cfgexpand.cc                             | 14 ++++
 gcc/expr.cc                                  | 77 ++++++++++++++++++++
 gcc/function.cc                              |  3 +
 gcc/tree-core.h                              |  4 +-
 gcc/tree.h                                   |  9 +++
 gcc/testsuite/gcc.target/powerpc/pr65421-1.c |  6 ++
 gcc/testsuite/gcc.target/powerpc/pr65421.c   | 33 +++++++++
 7 files changed, 145 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr65421-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr65421.c
  

Comments

Jiufu Guo March 2, 2023, 9:21 a.m. UTC | #1
Hi,

Ping this patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609394.html

Thanks for any comments and suggestions!


BR,
Jeff (Jiufu)


Jiufu Guo <guojiufu@linux.ibm.com> writes:

> Hi,
>
> When assigning a parameter to a variable, or assigning a variable to
> return value with struct type, "block move" may be used to expand
> the assignment if the parameter/return is passing through registers and
> the parameter/return has BLK mode.
> For this kind of case, when moving the blocks, it would be better to use
> the nature mode of the registers.
> This would raise more opportunities for other optimization passes(cse,
> dse, xprop).
>
> As the example code (like code in PR65421):
>
> typedef struct SA {double a[3];} A;
> A ret_arg_pt (A *a) {return *a;} // on ppc64le, expect only 3 lfd(s)
> A ret_arg (A a) {return a;} // just empty fun body
> void st_arg (A a, A *p) {*p = a;} //only 3 stfd(s)
>
> This patches check the "from" and "to" of an assignment in
> "expand_assignment", if it is about param/ret which may passing via
> register, then use the register nature mode to move sub-blocks for
> the assignning.
>
> This patch may be still useful even if we change the behavior of
> parameter setup or adopt SRA-like code in expender.
>
> Comparing with previous version:
> https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608081.html
> This patch update the code slightly and merged/added test cases.
> And I checked the cases with large struct or non-homogeneous struct
> to confirm it does not degrade the code.
>
> Bootstrap and regtest pass on ppc64{,le} and x86_64.
> Is this ok for trunk?
>
> BR,
> Jeff (Jiufu)
>
> 	PR target/65421
>
> gcc/ChangeLog:
>
> 	* cfgexpand.cc (expand_used_vars): Update to mark DECL_USEDBY_RETURN_P
> 	for returns.
> 	* expr.cc (move_sub_blocks): New function.
> 	(expand_assignment): Update to call move_sub_blocks for returns or
> 	parameters.
> 	* function.cc (assign_parm_setup_block): Update to mark
> 	DECL_REGS_TO_STACK_P for parameter.
> 	* tree-core.h (struct tree_decl_common): Add comment.
> 	* tree.h (DECL_USEDBY_RETURN_P): New define.
> 	(DECL_REGS_TO_STACK_P): New define.
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.target/powerpc/pr65421-1.c: New test.
> 	* gcc.target/powerpc/pr65421.c: New test.
>
> ---
>  gcc/cfgexpand.cc                             | 14 ++++
>  gcc/expr.cc                                  | 77 ++++++++++++++++++++
>  gcc/function.cc                              |  3 +
>  gcc/tree-core.h                              |  4 +-
>  gcc/tree.h                                   |  9 +++
>  gcc/testsuite/gcc.target/powerpc/pr65421-1.c |  6 ++
>  gcc/testsuite/gcc.target/powerpc/pr65421.c   | 33 +++++++++
>  7 files changed, 145 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr65421-1.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr65421.c
>
> diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
> index dd29ffffc03..09b8ec64cea 100644
> --- a/gcc/cfgexpand.cc
> +++ b/gcc/cfgexpand.cc
> @@ -2158,6 +2158,20 @@ expand_used_vars (bitmap forced_stack_vars)
>      frame_phase = off ? align - off : 0;
>    }
>  
> +  /* Collect VARs on returns.  */
> +  if (DECL_RESULT (current_function_decl))
> +    {
> +      edge_iterator ei;
> +      edge e;
> +      FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds)
> +	if (greturn *ret = safe_dyn_cast<greturn *> (last_stmt (e->src)))
> +	  {
> +	    tree val = gimple_return_retval (ret);
> +	    if (val && VAR_P (val))
> +	      DECL_USEDBY_RETURN_P (val) = 1;
> +	  }
> +    }
> +
>    /* Set TREE_USED on all variables in the local_decls.  */
>    FOR_EACH_LOCAL_DECL (cfun, i, var)
>      TREE_USED (var) = 1;
> diff --git a/gcc/expr.cc b/gcc/expr.cc
> index d9407432ea5..afcec6f3c10 100644
> --- a/gcc/expr.cc
> +++ b/gcc/expr.cc
> @@ -5559,6 +5559,51 @@ mem_ref_refers_to_non_mem_p (tree ref)
>    return non_mem_decl_p (base);
>  }
>  
> +/* Sub routine of expand_assignment, invoked when assigning from a
> +   parameter or assigning to a return val on struct type which may
> +   be passed through registers.  The mode of register is used to
> +   move the content for the assignment.
> +
> +   This routine generates code for expression FROM which is BLKmode,
> +   and move the generated content to TO_RTX by su-blocks in SUB_MODE.  */
> +
> +static void
> +move_sub_blocks (rtx to_rtx, tree from, machine_mode sub_mode, bool nontemporal)
> +{
> +  gcc_assert (MEM_P (to_rtx));
> +
> +  HOST_WIDE_INT size = MEM_SIZE (to_rtx).to_constant ();
> +  HOST_WIDE_INT sub_size = GET_MODE_SIZE (sub_mode).to_constant ();
> +  HOST_WIDE_INT len = size / sub_size;
> +
> +  /* It would be not profitable to move through sub-modes, if the size does
> +     not meet register mode.  */
> +  if ((size % sub_size) != 0)
> +    {
> +      push_temp_slots ();
> +      rtx result = store_expr (from, to_rtx, 0, nontemporal, false);
> +      preserve_temp_slots (result);
> +      pop_temp_slots ();
> +      return;
> +    }
> +
> +  push_temp_slots ();
> +
> +  rtx from_rtx = expand_expr (from, NULL_RTX, GET_MODE (to_rtx), EXPAND_NORMAL);
> +  for (int i = 0; i < len; i++)
> +    {
> +      rtx temp = gen_reg_rtx (sub_mode);
> +      rtx src = adjust_address (from_rtx, sub_mode, sub_size * i);
> +      rtx dest = adjust_address (to_rtx, sub_mode, sub_size * i);
> +      emit_move_insn (temp, src);
> +      emit_move_insn (dest, temp);
> +    }
> +
> +  preserve_temp_slots (to_rtx);
> +  pop_temp_slots ();
> +  return;
> +}
> +
>  /* Expand an assignment that stores the value of FROM into TO.  If NONTEMPORAL
>     is true, try generating a nontemporal store.  */
>  
> @@ -6045,6 +6090,38 @@ expand_assignment (tree to, tree from, bool nontemporal)
>        return;
>      }
>  
> +  /* If it is assigning from a struct param which may be passed via registers,
> +     It would be better to use the register's mode to move sub-blocks for the
> +     assignment.  */
> +  if (TREE_CODE (from) == PARM_DECL && mode == BLKmode
> +      && DECL_REGS_TO_STACK_P (from))
> +    {
> +      rtx parm = DECL_INCOMING_RTL (from);
> +      gcc_assert (REG_P (parm) || GET_CODE (parm) == PARALLEL);
> +
> +      machine_mode sub_mode;
> +      if (REG_P (parm))
> +	sub_mode = word_mode;
> +      else
> +	sub_mode = GET_MODE (XEXP (XVECEXP (parm, 0, 0), 0));
> +
> +      move_sub_blocks (to_rtx, from, sub_mode, nontemporal);
> +      return;
> +    }
> +
> +  /* If it is assigning to a struct var which will be returned, and the
> +     function is returning via registers, it would be better to use the
> +     register's mode to move sub-blocks for the assignment.  */
> +  if (VAR_P (to) && DECL_USEDBY_RETURN_P (to) && mode == BLKmode
> +      && TREE_CODE (from) != CONSTRUCTOR
> +      && GET_CODE (DECL_RTL (DECL_RESULT (current_function_decl))) == PARALLEL)
> +    {
> +      rtx ret = DECL_RTL (DECL_RESULT (current_function_decl));
> +      machine_mode sub_mode = GET_MODE (XEXP (XVECEXP (ret, 0, 0), 0));
> +      move_sub_blocks (to_rtx, from, sub_mode, nontemporal);
> +      return;
> +    }
> +
>    /* Compute FROM and store the value in the rtx we got.  */
>  
>    push_temp_slots ();
> diff --git a/gcc/function.cc b/gcc/function.cc
> index dc333c27e92..0ff89d89365 100644
> --- a/gcc/function.cc
> +++ b/gcc/function.cc
> @@ -2991,6 +2991,9 @@ assign_parm_setup_block (struct assign_parm_data_all *all,
>  
>        mem = validize_mem (copy_rtx (stack_parm));
>  
> +      if (MEM_P (mem))
> +	DECL_REGS_TO_STACK_P (parm) = 1;
> +
>        /* Handle values in multiple non-contiguous locations.  */
>        if (GET_CODE (entry_parm) == PARALLEL && !MEM_P (mem))
>  	emit_group_store (mem, entry_parm, data->arg.type, size);
> diff --git a/gcc/tree-core.h b/gcc/tree-core.h
> index e146b133dbd..c76d08bd109 100644
> --- a/gcc/tree-core.h
> +++ b/gcc/tree-core.h
> @@ -1808,7 +1808,9 @@ struct GTY(()) tree_decl_common {
>       In VAR_DECL, PARM_DECL and RESULT_DECL, this is
>       DECL_HAS_VALUE_EXPR_P.  */
>    unsigned decl_flag_2 : 1;
> -  /* In FIELD_DECL, this is DECL_PADDING_P.  */
> +  /* In FIELD_DECL, this is DECL_PADDING_P
> +     In VAR_DECL, this is DECL_USEDBY_RETURN_P
> +     In PARM_DECL, this is DECL_REGS_TO_STACK_P.  */
>    unsigned decl_flag_3 : 1;
>    /* Logically, these two would go in a theoretical base shared by var and
>       parm decl. */
> diff --git a/gcc/tree.h b/gcc/tree.h
> index 7e26e726bc5..3f732e3c00c 100644
> --- a/gcc/tree.h
> +++ b/gcc/tree.h
> @@ -3009,6 +3009,15 @@ extern void decl_value_expr_insert (tree, tree);
>  #define DECL_PADDING_P(NODE) \
>    (FIELD_DECL_CHECK (NODE)->decl_common.decl_flag_3)
>  
> +/* Used in a VAR_DECL to indicate that it is used by a return stmt.  */
> +#define DECL_USEDBY_RETURN_P(NODE) \
> +  (VAR_DECL_CHECK (NODE)->decl_common.decl_flag_3)
> +
> +/* Used in a PARM_DECL to indicate that it is struct parameter passed
> +   by registers totally and stored to stack during setup.  */
> +#define DECL_REGS_TO_STACK_P(NODE) \
> +  (PARM_DECL_CHECK (NODE)->decl_common.decl_flag_3)
> +
>  /* Used in a FIELD_DECL to indicate whether this field is not a flexible
>     array member. This is only valid for the last array type field of a
>     structure.  */
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr65421-1.c b/gcc/testsuite/gcc.target/powerpc/pr65421-1.c
> new file mode 100644
> index 00000000000..4e1f87f7939
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr65421-1.c
> @@ -0,0 +1,6 @@
> +/* PR target/65421 */
> +/* { dg-options "-O2" } */
> +
> +typedef struct LARGE {double a[4]; int arr[32];} LARGE;
> +LARGE foo (LARGE a){return a;}
> +/* { dg-final { scan-assembler-times {\mmemcpy\M} 1 } } */
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr65421.c b/gcc/testsuite/gcc.target/powerpc/pr65421.c
> new file mode 100644
> index 00000000000..fd5ad542c64
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr65421.c
> @@ -0,0 +1,33 @@
> +/* PR target/65421 */
> +/* { dg-options "-O2" } */
> +/* { dg-require-effective-target powerpc_elfv2 } */
> +/* { dg-require-effective-target has_arch_ppc64 } */
> +
> +typedef struct FLOATS
> +{
> +  double a[3];
> +} FLOATS;
> +
> +/* 3 lfd */
> +FLOATS ret_arg_pt (FLOATS *a){return *a;}
> +/* { dg-final { scan-assembler-times {\mlfd\M} 3 } } */
> +
> +/* 3 stfd */
> +void st_arg (FLOATS a, FLOATS *p) {*p = a;}
> +/* { dg-final { scan-assembler-times {\mstfd\M} 3 } } */
> +
> +/* blr */
> +FLOATS ret_arg (FLOATS a) {return a;}
> +
> +typedef struct MIX
> +{
> +  double a[2];
> +  long l;
> +} MIX;
> +
> +/* std 3 param regs to return slot */
> +MIX ret_arg1 (MIX a) {return a;}
> +/* { dg-final { scan-assembler-times {\mstd\M} 3 } } */
> +
> +/* count insns */
> +/* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 13 } } */
  

Patch

diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
index dd29ffffc03..09b8ec64cea 100644
--- a/gcc/cfgexpand.cc
+++ b/gcc/cfgexpand.cc
@@ -2158,6 +2158,20 @@  expand_used_vars (bitmap forced_stack_vars)
     frame_phase = off ? align - off : 0;
   }
 
+  /* Collect VARs on returns.  */
+  if (DECL_RESULT (current_function_decl))
+    {
+      edge_iterator ei;
+      edge e;
+      FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds)
+	if (greturn *ret = safe_dyn_cast<greturn *> (last_stmt (e->src)))
+	  {
+	    tree val = gimple_return_retval (ret);
+	    if (val && VAR_P (val))
+	      DECL_USEDBY_RETURN_P (val) = 1;
+	  }
+    }
+
   /* Set TREE_USED on all variables in the local_decls.  */
   FOR_EACH_LOCAL_DECL (cfun, i, var)
     TREE_USED (var) = 1;
diff --git a/gcc/expr.cc b/gcc/expr.cc
index d9407432ea5..afcec6f3c10 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -5559,6 +5559,51 @@  mem_ref_refers_to_non_mem_p (tree ref)
   return non_mem_decl_p (base);
 }
 
+/* Sub routine of expand_assignment, invoked when assigning from a
+   parameter or assigning to a return val on struct type which may
+   be passed through registers.  The mode of register is used to
+   move the content for the assignment.
+
+   This routine generates code for expression FROM which is BLKmode,
+   and move the generated content to TO_RTX by su-blocks in SUB_MODE.  */
+
+static void
+move_sub_blocks (rtx to_rtx, tree from, machine_mode sub_mode, bool nontemporal)
+{
+  gcc_assert (MEM_P (to_rtx));
+
+  HOST_WIDE_INT size = MEM_SIZE (to_rtx).to_constant ();
+  HOST_WIDE_INT sub_size = GET_MODE_SIZE (sub_mode).to_constant ();
+  HOST_WIDE_INT len = size / sub_size;
+
+  /* It would be not profitable to move through sub-modes, if the size does
+     not meet register mode.  */
+  if ((size % sub_size) != 0)
+    {
+      push_temp_slots ();
+      rtx result = store_expr (from, to_rtx, 0, nontemporal, false);
+      preserve_temp_slots (result);
+      pop_temp_slots ();
+      return;
+    }
+
+  push_temp_slots ();
+
+  rtx from_rtx = expand_expr (from, NULL_RTX, GET_MODE (to_rtx), EXPAND_NORMAL);
+  for (int i = 0; i < len; i++)
+    {
+      rtx temp = gen_reg_rtx (sub_mode);
+      rtx src = adjust_address (from_rtx, sub_mode, sub_size * i);
+      rtx dest = adjust_address (to_rtx, sub_mode, sub_size * i);
+      emit_move_insn (temp, src);
+      emit_move_insn (dest, temp);
+    }
+
+  preserve_temp_slots (to_rtx);
+  pop_temp_slots ();
+  return;
+}
+
 /* Expand an assignment that stores the value of FROM into TO.  If NONTEMPORAL
    is true, try generating a nontemporal store.  */
 
@@ -6045,6 +6090,38 @@  expand_assignment (tree to, tree from, bool nontemporal)
       return;
     }
 
+  /* If it is assigning from a struct param which may be passed via registers,
+     It would be better to use the register's mode to move sub-blocks for the
+     assignment.  */
+  if (TREE_CODE (from) == PARM_DECL && mode == BLKmode
+      && DECL_REGS_TO_STACK_P (from))
+    {
+      rtx parm = DECL_INCOMING_RTL (from);
+      gcc_assert (REG_P (parm) || GET_CODE (parm) == PARALLEL);
+
+      machine_mode sub_mode;
+      if (REG_P (parm))
+	sub_mode = word_mode;
+      else
+	sub_mode = GET_MODE (XEXP (XVECEXP (parm, 0, 0), 0));
+
+      move_sub_blocks (to_rtx, from, sub_mode, nontemporal);
+      return;
+    }
+
+  /* If it is assigning to a struct var which will be returned, and the
+     function is returning via registers, it would be better to use the
+     register's mode to move sub-blocks for the assignment.  */
+  if (VAR_P (to) && DECL_USEDBY_RETURN_P (to) && mode == BLKmode
+      && TREE_CODE (from) != CONSTRUCTOR
+      && GET_CODE (DECL_RTL (DECL_RESULT (current_function_decl))) == PARALLEL)
+    {
+      rtx ret = DECL_RTL (DECL_RESULT (current_function_decl));
+      machine_mode sub_mode = GET_MODE (XEXP (XVECEXP (ret, 0, 0), 0));
+      move_sub_blocks (to_rtx, from, sub_mode, nontemporal);
+      return;
+    }
+
   /* Compute FROM and store the value in the rtx we got.  */
 
   push_temp_slots ();
diff --git a/gcc/function.cc b/gcc/function.cc
index dc333c27e92..0ff89d89365 100644
--- a/gcc/function.cc
+++ b/gcc/function.cc
@@ -2991,6 +2991,9 @@  assign_parm_setup_block (struct assign_parm_data_all *all,
 
       mem = validize_mem (copy_rtx (stack_parm));
 
+      if (MEM_P (mem))
+	DECL_REGS_TO_STACK_P (parm) = 1;
+
       /* Handle values in multiple non-contiguous locations.  */
       if (GET_CODE (entry_parm) == PARALLEL && !MEM_P (mem))
 	emit_group_store (mem, entry_parm, data->arg.type, size);
diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index e146b133dbd..c76d08bd109 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -1808,7 +1808,9 @@  struct GTY(()) tree_decl_common {
      In VAR_DECL, PARM_DECL and RESULT_DECL, this is
      DECL_HAS_VALUE_EXPR_P.  */
   unsigned decl_flag_2 : 1;
-  /* In FIELD_DECL, this is DECL_PADDING_P.  */
+  /* In FIELD_DECL, this is DECL_PADDING_P
+     In VAR_DECL, this is DECL_USEDBY_RETURN_P
+     In PARM_DECL, this is DECL_REGS_TO_STACK_P.  */
   unsigned decl_flag_3 : 1;
   /* Logically, these two would go in a theoretical base shared by var and
      parm decl. */
diff --git a/gcc/tree.h b/gcc/tree.h
index 7e26e726bc5..3f732e3c00c 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -3009,6 +3009,15 @@  extern void decl_value_expr_insert (tree, tree);
 #define DECL_PADDING_P(NODE) \
   (FIELD_DECL_CHECK (NODE)->decl_common.decl_flag_3)
 
+/* Used in a VAR_DECL to indicate that it is used by a return stmt.  */
+#define DECL_USEDBY_RETURN_P(NODE) \
+  (VAR_DECL_CHECK (NODE)->decl_common.decl_flag_3)
+
+/* Used in a PARM_DECL to indicate that it is struct parameter passed
+   by registers totally and stored to stack during setup.  */
+#define DECL_REGS_TO_STACK_P(NODE) \
+  (PARM_DECL_CHECK (NODE)->decl_common.decl_flag_3)
+
 /* Used in a FIELD_DECL to indicate whether this field is not a flexible
    array member. This is only valid for the last array type field of a
    structure.  */
diff --git a/gcc/testsuite/gcc.target/powerpc/pr65421-1.c b/gcc/testsuite/gcc.target/powerpc/pr65421-1.c
new file mode 100644
index 00000000000..4e1f87f7939
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr65421-1.c
@@ -0,0 +1,6 @@ 
+/* PR target/65421 */
+/* { dg-options "-O2" } */
+
+typedef struct LARGE {double a[4]; int arr[32];} LARGE;
+LARGE foo (LARGE a){return a;}
+/* { dg-final { scan-assembler-times {\mmemcpy\M} 1 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pr65421.c b/gcc/testsuite/gcc.target/powerpc/pr65421.c
new file mode 100644
index 00000000000..fd5ad542c64
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr65421.c
@@ -0,0 +1,33 @@ 
+/* PR target/65421 */
+/* { dg-options "-O2" } */
+/* { dg-require-effective-target powerpc_elfv2 } */
+/* { dg-require-effective-target has_arch_ppc64 } */
+
+typedef struct FLOATS
+{
+  double a[3];
+} FLOATS;
+
+/* 3 lfd */
+FLOATS ret_arg_pt (FLOATS *a){return *a;}
+/* { dg-final { scan-assembler-times {\mlfd\M} 3 } } */
+
+/* 3 stfd */
+void st_arg (FLOATS a, FLOATS *p) {*p = a;}
+/* { dg-final { scan-assembler-times {\mstfd\M} 3 } } */
+
+/* blr */
+FLOATS ret_arg (FLOATS a) {return a;}
+
+typedef struct MIX
+{
+  double a[2];
+  long l;
+} MIX;
+
+/* std 3 param regs to return slot */
+MIX ret_arg1 (MIX a) {return a;}
+/* { dg-final { scan-assembler-times {\mstd\M} 3 } } */
+
+/* count insns */
+/* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 13 } } */