Using sub-scalars mode to move struct block

Message ID 20221111082532.24898-1-guojiufu@linux.ibm.com
State New
Headers
Series Using sub-scalars mode to move struct block |

Commit Message

Jiufu Guo Nov. 11, 2022, 8:25 a.m. UTC
  Hi,

When assigning a struct parameter to another variable, or loading a
memory block to a struct var (especially for return value),
Now, "block move" would be used during expand the assignment. And
the "block move" may use a type/mode different from the mode which
is accessing the var. e.g. on ppc64le, V2DI would be used to move
the block of 16bytes.

And then, this "block move" would prevent optimization passes from
leaping/crossing over the assignment. PR65421 reflects this issue.

As the example code in PR65421.

typedef struct { double a[4]; } A;
A foo (const A *a) { return *a; }

On ppc64le, the below instructions are used for the "block move":
  7: r122:V2DI=[r121:DI]
  8: r124:V2DI=[r121:DI+r123:DI]
  9: [r112:DI]=r122:V2DI
  10: [r112:DI+0x10]=r124:V2DI

For this issue, a few comments/suggestions are mentioned via RFC:
https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604646.html
I drafted a patch which is updating the behavior of block_move for
struct type. This patch is simple to work with, a few ideas in the
comments are not put into this patch. I would submit this
patch first.

The idea is trying to use sub-modes(scalar) for the "block move".
And the sub-modes would align with the access patterns of the
struct members and usages on parameter/return value.
The major benefits of this change would be raising more 
opportunities for other optimization passes(cse/dse/xprop).

The suitable mode would be target specified and relates to ABI,
this patch introduces a target hook. And in this patch, the hook
is implemented on rs6000.

In this patch, the hook would be just using heuristic modes for all
struct block moving. And the hook would not check if the "block move"
is about parameters or return value or other uses.

For the rs6000 implementation of this hook, it is able to use
DF/DI/TD/.. modes for the struct block movement. The sub-modes
would be the same as the mode when the struct type is on parameter or
return value.

Bootstrapped and regtested on ppc64/ppc64le. 
Is this ok for trunk?


BR,
Jeff(Jiufu)


gcc/ChangeLog:

	* config/rs6000/rs6000.cc (TARGET_BLOCK_MOVE_FOR_STRUCT): Define.
	(submode_for_struct_block_move): New function.  Called from
	rs600_block_move_for_struct.
	(rs600_block_move_for_struct): New function.
	* doc/tm.texi: Regenerate.
	* doc/tm.texi.in (TARGET_BLOCK_MOVE_FOR_STRUCT): New.
	* expr.cc (store_expr): Call block_move_for_struct.
	* target.def (block_move_for_struct): New hook.
	* targhooks.cc (default_block_move_for_struct): New function.
	* targhooks.h (default_block_move_for_struct): New Prototype.

---
 gcc/config/rs6000/rs6000.cc | 44 +++++++++++++++++++++++++++++++++++++
 gcc/doc/tm.texi             |  6 +++++
 gcc/doc/tm.texi.in          |  2 ++
 gcc/expr.cc                 | 14 +++++++++---
 gcc/target.def              | 10 +++++++++
 gcc/targhooks.cc            |  7 ++++++
 gcc/targhooks.h             |  1 +
 7 files changed, 81 insertions(+), 3 deletions(-)
  

Comments

Richard Biener Nov. 11, 2022, 9:40 a.m. UTC | #1
On Fri, 11 Nov 2022, Jiufu Guo wrote:

> Hi,
> 
> When assigning a struct parameter to another variable, or loading a
> memory block to a struct var (especially for return value),
> Now, "block move" would be used during expand the assignment. And
> the "block move" may use a type/mode different from the mode which
> is accessing the var. e.g. on ppc64le, V2DI would be used to move
> the block of 16bytes.
> 
> And then, this "block move" would prevent optimization passes from
> leaping/crossing over the assignment. PR65421 reflects this issue.
> 
> As the example code in PR65421.
> 
> typedef struct { double a[4]; } A;
> A foo (const A *a) { return *a; }
> 
> On ppc64le, the below instructions are used for the "block move":
>   7: r122:V2DI=[r121:DI]
>   8: r124:V2DI=[r121:DI+r123:DI]
>   9: [r112:DI]=r122:V2DI
>   10: [r112:DI+0x10]=r124:V2DI
> 
> For this issue, a few comments/suggestions are mentioned via RFC:
> https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604646.html
> I drafted a patch which is updating the behavior of block_move for
> struct type. This patch is simple to work with, a few ideas in the
> comments are not put into this patch. I would submit this
> patch first.
> 
> The idea is trying to use sub-modes(scalar) for the "block move".
> And the sub-modes would align with the access patterns of the
> struct members and usages on parameter/return value.
> The major benefits of this change would be raising more 
> opportunities for other optimization passes(cse/dse/xprop).
> 
> The suitable mode would be target specified and relates to ABI,
> this patch introduces a target hook. And in this patch, the hook
> is implemented on rs6000.
> 
> In this patch, the hook would be just using heuristic modes for all
> struct block moving. And the hook would not check if the "block move"
> is about parameters or return value or other uses.
> 
> For the rs6000 implementation of this hook, it is able to use
> DF/DI/TD/.. modes for the struct block movement. The sub-modes
> would be the same as the mode when the struct type is on parameter or
> return value.
> 
> Bootstrapped and regtested on ppc64/ppc64le. 
> Is this ok for trunk?
> 
> 
> BR,
> Jeff(Jiufu)
> 
> 
> gcc/ChangeLog:
> 
> 	* config/rs6000/rs6000.cc (TARGET_BLOCK_MOVE_FOR_STRUCT): Define.
> 	(submode_for_struct_block_move): New function.  Called from
> 	rs600_block_move_for_struct.
> 	(rs600_block_move_for_struct): New function.
> 	* doc/tm.texi: Regenerate.
> 	* doc/tm.texi.in (TARGET_BLOCK_MOVE_FOR_STRUCT): New.
> 	* expr.cc (store_expr): Call block_move_for_struct.
> 	* target.def (block_move_for_struct): New hook.
> 	* targhooks.cc (default_block_move_for_struct): New function.
> 	* targhooks.h (default_block_move_for_struct): New Prototype.
> 
> ---
>  gcc/config/rs6000/rs6000.cc | 44 +++++++++++++++++++++++++++++++++++++
>  gcc/doc/tm.texi             |  6 +++++
>  gcc/doc/tm.texi.in          |  2 ++
>  gcc/expr.cc                 | 14 +++++++++---
>  gcc/target.def              | 10 +++++++++
>  gcc/targhooks.cc            |  7 ++++++
>  gcc/targhooks.h             |  1 +
>  7 files changed, 81 insertions(+), 3 deletions(-)
> 
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index a85d7630b41..e14cecba0ef 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -1758,6 +1758,9 @@ static const struct attribute_spec rs6000_attribute_table[] =
>  #undef TARGET_NEED_IPA_FN_TARGET_INFO
>  #define TARGET_NEED_IPA_FN_TARGET_INFO rs6000_need_ipa_fn_target_info
>  
> +#undef TARGET_BLOCK_MOVE_FOR_STRUCT
> +#define TARGET_BLOCK_MOVE_FOR_STRUCT rs600_block_move_for_struct
> +
>  #undef TARGET_UPDATE_IPA_FN_TARGET_INFO
>  #define TARGET_UPDATE_IPA_FN_TARGET_INFO rs6000_update_ipa_fn_target_info
>  
> @@ -23672,6 +23675,47 @@ rs6000_function_value (const_tree valtype,
>    return gen_rtx_REG (mode, regno);
>  }
>  
> +/* Subroutine of rs600_block_move_for_struct, to get the internal mode which
> +   would be used to move the struct.  */
> +static machine_mode
> +submode_for_struct_block_move (tree type)
> +{
> +  gcc_assert (TREE_CODE (type) == RECORD_TYPE);
> +
> +  /* The sub mode may not be the field's type of the struct.
> +     It would be fine to use the mode as if the type is used as a function
> +     parameter or return value.  For example: DF for "{double a[4];}", and
> +     DI for "{doubel a[3]; long l;}".
> +     Here, using the mode as if it is function return type.  */
> +  rtx val = rs6000_function_value (type, NULL, 0);
> +  return (GET_CODE (val) == PARALLEL) ? GET_MODE (XEXP (XVECEXP (val, 0, 0), 0))
> +				      : word_mode;
> +}
> +
> +/* Implement the TARGET_BLOCK_MOVE_FOR_STRUCT hook.  */
> +static void
> +rs600_block_move_for_struct (rtx x, rtx y, tree exp, HOST_WIDE_INT method)
> +{
> +  machine_mode mode = submode_for_struct_block_move (TREE_TYPE (exp));
> +  int mode_size = GET_MODE_SIZE (mode);
> +  int size = UINTVAL (expr_size (exp));
> +  if (size < mode_size || (size % mode_size) != 0 || size > 64)
> +    {
> +      default_block_move_for_struct (x, y, exp, method);
> +      return;
> +    }
> +
> +  int len = size / mode_size;
> +  for (int i = 0; i < len; i++)
> +    {
> +      rtx temp = gen_reg_rtx (mode);
> +      rtx src = adjust_address (y, mode, mode_size * i);
> +      rtx dest = adjust_address (x, mode, mode_size * i);
> +      emit_move_insn (temp, src);
> +      emit_move_insn (dest, temp);
> +    }
> +}
> +
>  /* Define how to find the value returned by a library function
>     assuming the value has mode MODE.  */
>  rtx
> diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
> index 8572313b308..c8a1c1f30cf 100644
> --- a/gcc/doc/tm.texi
> +++ b/gcc/doc/tm.texi
> @@ -1380,6 +1380,12 @@ retain the field's mode.
>  Normally, this is not needed.
>  @end deftypefn
>  
> +@deftypefn {Target Hook} void TARGET_BLOCK_MOVE_FOR_STRUCT (rtx @var{x}, rtx @var{y}, tree @var{exp}, HOST_WIDE_INT @var{method})
> +Move from @var{y} to @var{x}, where @var{y} is as @var{exp} in structure
> +type. @var{method} is the method if this function invokes block_move.
> +The default definition invokes block_move.
> +@end deftypefn
> +
>  @defmac ROUND_TYPE_ALIGN (@var{type}, @var{computed}, @var{specified})
>  Define this macro as an expression for the alignment of a type (given
>  by @var{type} as a tree node) if the alignment computed in the usual
> diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
> index 986e8f0da09..f0f525a2008 100644
> --- a/gcc/doc/tm.texi.in
> +++ b/gcc/doc/tm.texi.in
> @@ -1226,6 +1226,8 @@ to aligning a bit-field within the structure.
>  
>  @hook TARGET_MEMBER_TYPE_FORCES_BLK
>  
> +@hook TARGET_BLOCK_MOVE_FOR_STRUCT
> +
>  @defmac ROUND_TYPE_ALIGN (@var{type}, @var{computed}, @var{specified})
>  Define this macro as an expression for the alignment of a type (given
>  by @var{type} as a tree node) if the alignment computed in the usual
> diff --git a/gcc/expr.cc b/gcc/expr.cc
> index c6917fbf7bd..734dc07a76b 100644
> --- a/gcc/expr.cc
> +++ b/gcc/expr.cc
> @@ -6504,9 +6504,17 @@ store_expr (tree exp, rtx target, int call_param_p,
>  	emit_group_store (target, temp, TREE_TYPE (exp),
>  			  int_size_in_bytes (TREE_TYPE (exp)));
>        else if (GET_MODE (temp) == BLKmode)
> -	emit_block_move (target, temp, expr_size (exp),
> -			 (call_param_p
> -			  ? BLOCK_OP_CALL_PARM : BLOCK_OP_NORMAL));
> +	{
> +	  if (TREE_CODE (TREE_TYPE (exp)) == RECORD_TYPE)
> +	    targetm.block_move_for_struct (target, temp, exp,
> +					   call_param_p ? BLOCK_OP_CALL_PARM
> +							: BLOCK_OP_NORMAL);

I think it's only sensible to do something about the call_param_p case,
all other cases are GIGO.  And I don't see a good reason to add a new
target hook since function parameter passing info is readily
available.

The question is whether this is best handled up in the call chain where
the parameter setup is done or below here (or in emit_block_move).

Richard.

> +	  else
> +	    emit_block_move (target, temp, expr_size (exp),
> +			     (call_param_p ? BLOCK_OP_CALL_PARM
> +					   : BLOCK_OP_NORMAL));
> +	}
> +
>        /* If we emit a nontemporal store, there is nothing else to do.  */
>        else if (nontemporal && emit_storent_insn (target, temp))
>  	;
> diff --git a/gcc/target.def b/gcc/target.def
> index 25f94c19fa7..e141f72a8a3 100644
> --- a/gcc/target.def
> +++ b/gcc/target.def
> @@ -5584,6 +5584,16 @@ Normally, this is not needed.",
>   bool, (const_tree field, machine_mode mode),
>   default_member_type_forces_blk)
>  
> +
> +/* Move block for structure type.  */
> +DEFHOOK
> +(block_move_for_struct,
> + "Move from @var{y} to @var{x}, where @var{y} is as @var{exp} in structure\n\
> +type. @var{method} is the method if this function invokes block_move.\n\
> +The default definition invokes block_move.",
> + void, (rtx x, rtx y, tree exp, HOST_WIDE_INT method),
> + default_block_move_for_struct)
> +
>  /* See tree-ssa-math-opts.cc:divmod_candidate_p for conditions
>     that gate the divod transform.  */
>  DEFHOOK
> diff --git a/gcc/targhooks.cc b/gcc/targhooks.cc
> index 12a58456b39..2b96c348419 100644
> --- a/gcc/targhooks.cc
> +++ b/gcc/targhooks.cc
> @@ -2330,6 +2330,13 @@ default_member_type_forces_blk (const_tree, machine_mode)
>    return false;
>  }
>  
> +/* Default version of block_move_for_struct.  */
> +void
> +default_block_move_for_struct (rtx x, rtx y, tree exp, HOST_WIDE_INT method)
> +{
> +  emit_block_move (x, y, expr_size (exp), (enum block_op_methods)method);
> +}
> +
>  /* Default version of canonicalize_comparison.  */
>  
>  void
> diff --git a/gcc/targhooks.h b/gcc/targhooks.h
> index a6a423c1abb..c284a35ee28 100644
> --- a/gcc/targhooks.h
> +++ b/gcc/targhooks.h
> @@ -264,6 +264,7 @@ extern void default_asm_output_ident_directive (const char*);
>  
>  extern scalar_int_mode default_cstore_mode (enum insn_code);
>  extern bool default_member_type_forces_blk (const_tree, machine_mode);
> +extern void default_block_move_for_struct (rtx, rtx, tree, HOST_WIDE_INT);
>  extern void default_atomic_assign_expand_fenv (tree *, tree *, tree *);
>  extern tree build_va_arg_indirect_ref (tree);
>  extern tree std_gimplify_va_arg_expr (tree, tree, gimple_seq *, gimple_seq *);
>
  
Jiufu Guo Nov. 14, 2022, 3:51 a.m. UTC | #2
Hi!
Thanks so much for your review!

Richard Biener <rguenther@suse.de> writes:

> On Fri, 11 Nov 2022, Jiufu Guo wrote:
>
>> Hi,
>> 
>> When assigning a struct parameter to another variable, or loading a
>> memory block to a struct var (especially for return value),
>> Now, "block move" would be used during expand the assignment. And
>> the "block move" may use a type/mode different from the mode which
>> is accessing the var. e.g. on ppc64le, V2DI would be used to move
>> the block of 16bytes.
>> 
>> And then, this "block move" would prevent optimization passes from
>> leaping/crossing over the assignment. PR65421 reflects this issue.
>> 
>> As the example code in PR65421.
>> 
>> typedef struct { double a[4]; } A;
>> A foo (const A *a) { return *a; }
>> 
>> On ppc64le, the below instructions are used for the "block move":
>>   7: r122:V2DI=[r121:DI]
>>   8: r124:V2DI=[r121:DI+r123:DI]
>>   9: [r112:DI]=r122:V2DI
>>   10: [r112:DI+0x10]=r124:V2DI
>> 
>> For this issue, a few comments/suggestions are mentioned via RFC:
>> https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604646.html
>> I drafted a patch which is updating the behavior of block_move for
>> struct type. This patch is simple to work with, a few ideas in the
>> comments are not put into this patch. I would submit this
>> patch first.
>> 
>> The idea is trying to use sub-modes(scalar) for the "block move".
>> And the sub-modes would align with the access patterns of the
>> struct members and usages on parameter/return value.
>> The major benefits of this change would be raising more 
>> opportunities for other optimization passes(cse/dse/xprop).
>> 
>> The suitable mode would be target specified and relates to ABI,
>> this patch introduces a target hook. And in this patch, the hook
>> is implemented on rs6000.
>> 
>> In this patch, the hook would be just using heuristic modes for all
>> struct block moving. And the hook would not check if the "block move"
>> is about parameters or return value or other uses.
>> 
>> For the rs6000 implementation of this hook, it is able to use
>> DF/DI/TD/.. modes for the struct block movement. The sub-modes
>> would be the same as the mode when the struct type is on parameter or
>> return value.
>> 
>> Bootstrapped and regtested on ppc64/ppc64le. 
>> Is this ok for trunk?
>> 
>> 
>> BR,
>> Jeff(Jiufu)
>> 
>> 
>> gcc/ChangeLog:
>> 
>> 	* config/rs6000/rs6000.cc (TARGET_BLOCK_MOVE_FOR_STRUCT): Define.
>> 	(submode_for_struct_block_move): New function.  Called from
>> 	rs600_block_move_for_struct.
>> 	(rs600_block_move_for_struct): New function.
>> 	* doc/tm.texi: Regenerate.
>> 	* doc/tm.texi.in (TARGET_BLOCK_MOVE_FOR_STRUCT): New.
>> 	* expr.cc (store_expr): Call block_move_for_struct.
>> 	* target.def (block_move_for_struct): New hook.
>> 	* targhooks.cc (default_block_move_for_struct): New function.
>> 	* targhooks.h (default_block_move_for_struct): New Prototype.
>> 
>> ---
>>  gcc/config/rs6000/rs6000.cc | 44 +++++++++++++++++++++++++++++++++++++
>>  gcc/doc/tm.texi             |  6 +++++
>>  gcc/doc/tm.texi.in          |  2 ++
>>  gcc/expr.cc                 | 14 +++++++++---
>>  gcc/target.def              | 10 +++++++++
>>  gcc/targhooks.cc            |  7 ++++++
>>  gcc/targhooks.h             |  1 +
>>  7 files changed, 81 insertions(+), 3 deletions(-)
>> 
>> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
>> index a85d7630b41..e14cecba0ef 100644
>> --- a/gcc/config/rs6000/rs6000.cc
>> +++ b/gcc/config/rs6000/rs6000.cc
>> @@ -1758,6 +1758,9 @@ static const struct attribute_spec rs6000_attribute_table[] =
>>  #undef TARGET_NEED_IPA_FN_TARGET_INFO
>>  #define TARGET_NEED_IPA_FN_TARGET_INFO rs6000_need_ipa_fn_target_info
>>  
>> +#undef TARGET_BLOCK_MOVE_FOR_STRUCT
>> +#define TARGET_BLOCK_MOVE_FOR_STRUCT rs600_block_move_for_struct
>> +
>>  #undef TARGET_UPDATE_IPA_FN_TARGET_INFO
>>  #define TARGET_UPDATE_IPA_FN_TARGET_INFO rs6000_update_ipa_fn_target_info
>>  
>> @@ -23672,6 +23675,47 @@ rs6000_function_value (const_tree valtype,
>>    return gen_rtx_REG (mode, regno);
>>  }
>>  
>> +/* Subroutine of rs600_block_move_for_struct, to get the internal mode which
>> +   would be used to move the struct.  */
>> +static machine_mode
>> +submode_for_struct_block_move (tree type)
>> +{
>> +  gcc_assert (TREE_CODE (type) == RECORD_TYPE);
>> +
>> +  /* The sub mode may not be the field's type of the struct.
>> +     It would be fine to use the mode as if the type is used as a function
>> +     parameter or return value.  For example: DF for "{double a[4];}", and
>> +     DI for "{doubel a[3]; long l;}".
>> +     Here, using the mode as if it is function return type.  */
>> +  rtx val = rs6000_function_value (type, NULL, 0);
>> +  return (GET_CODE (val) == PARALLEL) ? GET_MODE (XEXP (XVECEXP (val, 0, 0), 0))
>> +				      : word_mode;
>> +}
>> +
>> +/* Implement the TARGET_BLOCK_MOVE_FOR_STRUCT hook.  */
>> +static void
>> +rs600_block_move_for_struct (rtx x, rtx y, tree exp, HOST_WIDE_INT method)
>> +{
>> +  machine_mode mode = submode_for_struct_block_move (TREE_TYPE (exp));
>> +  int mode_size = GET_MODE_SIZE (mode);
>> +  int size = UINTVAL (expr_size (exp));
>> +  if (size < mode_size || (size % mode_size) != 0 || size > 64)
>> +    {
>> +      default_block_move_for_struct (x, y, exp, method);
>> +      return;
>> +    }
>> +
>> +  int len = size / mode_size;
>> +  for (int i = 0; i < len; i++)
>> +    {
>> +      rtx temp = gen_reg_rtx (mode);
>> +      rtx src = adjust_address (y, mode, mode_size * i);
>> +      rtx dest = adjust_address (x, mode, mode_size * i);
>> +      emit_move_insn (temp, src);
>> +      emit_move_insn (dest, temp);
>> +    }
>> +}
>> +
>>  /* Define how to find the value returned by a library function
>>     assuming the value has mode MODE.  */
>>  rtx
>> diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
>> index 8572313b308..c8a1c1f30cf 100644
>> --- a/gcc/doc/tm.texi
>> +++ b/gcc/doc/tm.texi
>> @@ -1380,6 +1380,12 @@ retain the field's mode.
>>  Normally, this is not needed.
>>  @end deftypefn
>>  
>> +@deftypefn {Target Hook} void TARGET_BLOCK_MOVE_FOR_STRUCT (rtx @var{x}, rtx @var{y}, tree @var{exp}, HOST_WIDE_INT @var{method})
>> +Move from @var{y} to @var{x}, where @var{y} is as @var{exp} in structure
>> +type. @var{method} is the method if this function invokes block_move.
>> +The default definition invokes block_move.
>> +@end deftypefn
>> +
>>  @defmac ROUND_TYPE_ALIGN (@var{type}, @var{computed}, @var{specified})
>>  Define this macro as an expression for the alignment of a type (given
>>  by @var{type} as a tree node) if the alignment computed in the usual
>> diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
>> index 986e8f0da09..f0f525a2008 100644
>> --- a/gcc/doc/tm.texi.in
>> +++ b/gcc/doc/tm.texi.in
>> @@ -1226,6 +1226,8 @@ to aligning a bit-field within the structure.
>>  
>>  @hook TARGET_MEMBER_TYPE_FORCES_BLK
>>  
>> +@hook TARGET_BLOCK_MOVE_FOR_STRUCT
>> +
>>  @defmac ROUND_TYPE_ALIGN (@var{type}, @var{computed}, @var{specified})
>>  Define this macro as an expression for the alignment of a type (given
>>  by @var{type} as a tree node) if the alignment computed in the usual
>> diff --git a/gcc/expr.cc b/gcc/expr.cc
>> index c6917fbf7bd..734dc07a76b 100644
>> --- a/gcc/expr.cc
>> +++ b/gcc/expr.cc
>> @@ -6504,9 +6504,17 @@ store_expr (tree exp, rtx target, int call_param_p,
>>  	emit_group_store (target, temp, TREE_TYPE (exp),
>>  			  int_size_in_bytes (TREE_TYPE (exp)));
>>        else if (GET_MODE (temp) == BLKmode)
>> -	emit_block_move (target, temp, expr_size (exp),
>> -			 (call_param_p
>> -			  ? BLOCK_OP_CALL_PARM : BLOCK_OP_NORMAL));
>> +	{
>> +	  if (TREE_CODE (TREE_TYPE (exp)) == RECORD_TYPE)
>> +	    targetm.block_move_for_struct (target, temp, exp,
>> +					   call_param_p ? BLOCK_OP_CALL_PARM
>> +							: BLOCK_OP_NORMAL);
>
> I think it's only sensible to do something about the call_param_p case,
> all other cases are GIGO.  And I don't see a good reason to add a new
> target hook since function parameter passing info is readily
> available.
This patch enhances the normal assignments on struct variables (
expand_assignment-->store_expr, where the call_param_p is 0). Because
the "struct block move/copy" is generated for this case too.

For example, the struct block_move is generated.
# .MEM_3 = VDEF <.MEM_1(D)>
D.3956 = *a_2(D);
For this stmt, "D.3956" is a struct variable, and "a_2(D)" points to a
block of the struct type.

If we only care about parameter/arg/retval(s), we would not need a
hook, as you said, since parm/retval contains the information about
the target/ABI.  And for more as you suggested in the previous emails,
we could do SRA-like analyze to check if an assignment relates to
parm/retval.

The current patch is simple relatively, it uses 'scalar-modes' to move
blocks for all struct-type assignments.  I feel this would be enough,
and I'm wondering if we need to check parm/retval.

>
> The question is whether this is best handled up in the call chain where
> the parameter setup is done or below here (or in emit_block_move).
As above discussed, this patch touches 'block move' for struct type,
including copy from 'parameter stack block' to a variable block.
Oh, assigning from 'parameter regs' to 'stack block' is not changed
here, it is done through "assign_parms-...->emit_group_xx".

If any misunderstanding, or any comments/concerns, please point them
out.  Thanks a lot!

BR,
Jeff (Jiufu)

>
> Richard.
>
>> +	  else
>> +	    emit_block_move (target, temp, expr_size (exp),
>> +			     (call_param_p ? BLOCK_OP_CALL_PARM
>> +					   : BLOCK_OP_NORMAL));
>> +	}
>> +
>>        /* If we emit a nontemporal store, there is nothing else to do.  */
>>        else if (nontemporal && emit_storent_insn (target, temp))
>>  	;
>> diff --git a/gcc/target.def b/gcc/target.def
>> index 25f94c19fa7..e141f72a8a3 100644
>> --- a/gcc/target.def
>> +++ b/gcc/target.def
>> @@ -5584,6 +5584,16 @@ Normally, this is not needed.",
>>   bool, (const_tree field, machine_mode mode),
>>   default_member_type_forces_blk)
>>  
>> +
>> +/* Move block for structure type.  */
>> +DEFHOOK
>> +(block_move_for_struct,
>> + "Move from @var{y} to @var{x}, where @var{y} is as @var{exp} in structure\n\
>> +type. @var{method} is the method if this function invokes block_move.\n\
>> +The default definition invokes block_move.",
>> + void, (rtx x, rtx y, tree exp, HOST_WIDE_INT method),
>> + default_block_move_for_struct)
>> +
>>  /* See tree-ssa-math-opts.cc:divmod_candidate_p for conditions
>>     that gate the divod transform.  */
>>  DEFHOOK
>> diff --git a/gcc/targhooks.cc b/gcc/targhooks.cc
>> index 12a58456b39..2b96c348419 100644
>> --- a/gcc/targhooks.cc
>> +++ b/gcc/targhooks.cc
>> @@ -2330,6 +2330,13 @@ default_member_type_forces_blk (const_tree, machine_mode)
>>    return false;
>>  }
>>  
>> +/* Default version of block_move_for_struct.  */
>> +void
>> +default_block_move_for_struct (rtx x, rtx y, tree exp, HOST_WIDE_INT method)
>> +{
>> +  emit_block_move (x, y, expr_size (exp), (enum block_op_methods)method);
>> +}
>> +
>>  /* Default version of canonicalize_comparison.  */
>>  
>>  void
>> diff --git a/gcc/targhooks.h b/gcc/targhooks.h
>> index a6a423c1abb..c284a35ee28 100644
>> --- a/gcc/targhooks.h
>> +++ b/gcc/targhooks.h
>> @@ -264,6 +264,7 @@ extern void default_asm_output_ident_directive (const char*);
>>  
>>  extern scalar_int_mode default_cstore_mode (enum insn_code);
>>  extern bool default_member_type_forces_blk (const_tree, machine_mode);
>> +extern void default_block_move_for_struct (rtx, rtx, tree, HOST_WIDE_INT);
>>  extern void default_atomic_assign_expand_fenv (tree *, tree *, tree *);
>>  extern tree build_va_arg_indirect_ref (tree);
>>  extern tree std_gimplify_va_arg_expr (tree, tree, gimple_seq *, gimple_seq *);
>>
  
Richard Biener Nov. 14, 2022, 7:23 a.m. UTC | #3
On Mon, 14 Nov 2022, Jiufu Guo wrote:

> 
> Hi!
> Thanks so much for your review!
> 
> Richard Biener <rguenther@suse.de> writes:
> 
> > On Fri, 11 Nov 2022, Jiufu Guo wrote:
> >
> >> Hi,
> >> 
> >> When assigning a struct parameter to another variable, or loading a
> >> memory block to a struct var (especially for return value),
> >> Now, "block move" would be used during expand the assignment. And
> >> the "block move" may use a type/mode different from the mode which
> >> is accessing the var. e.g. on ppc64le, V2DI would be used to move
> >> the block of 16bytes.
> >> 
> >> And then, this "block move" would prevent optimization passes from
> >> leaping/crossing over the assignment. PR65421 reflects this issue.
> >> 
> >> As the example code in PR65421.
> >> 
> >> typedef struct { double a[4]; } A;
> >> A foo (const A *a) { return *a; }
> >> 
> >> On ppc64le, the below instructions are used for the "block move":
> >>   7: r122:V2DI=[r121:DI]
> >>   8: r124:V2DI=[r121:DI+r123:DI]
> >>   9: [r112:DI]=r122:V2DI
> >>   10: [r112:DI+0x10]=r124:V2DI
> >> 
> >> For this issue, a few comments/suggestions are mentioned via RFC:
> >> https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604646.html
> >> I drafted a patch which is updating the behavior of block_move for
> >> struct type. This patch is simple to work with, a few ideas in the
> >> comments are not put into this patch. I would submit this
> >> patch first.
> >> 
> >> The idea is trying to use sub-modes(scalar) for the "block move".
> >> And the sub-modes would align with the access patterns of the
> >> struct members and usages on parameter/return value.
> >> The major benefits of this change would be raising more 
> >> opportunities for other optimization passes(cse/dse/xprop).
> >> 
> >> The suitable mode would be target specified and relates to ABI,
> >> this patch introduces a target hook. And in this patch, the hook
> >> is implemented on rs6000.
> >> 
> >> In this patch, the hook would be just using heuristic modes for all
> >> struct block moving. And the hook would not check if the "block move"
> >> is about parameters or return value or other uses.
> >> 
> >> For the rs6000 implementation of this hook, it is able to use
> >> DF/DI/TD/.. modes for the struct block movement. The sub-modes
> >> would be the same as the mode when the struct type is on parameter or
> >> return value.
> >> 
> >> Bootstrapped and regtested on ppc64/ppc64le. 
> >> Is this ok for trunk?
> >> 
> >> 
> >> BR,
> >> Jeff(Jiufu)
> >> 
> >> 
> >> gcc/ChangeLog:
> >> 
> >> 	* config/rs6000/rs6000.cc (TARGET_BLOCK_MOVE_FOR_STRUCT): Define.
> >> 	(submode_for_struct_block_move): New function.  Called from
> >> 	rs600_block_move_for_struct.
> >> 	(rs600_block_move_for_struct): New function.
> >> 	* doc/tm.texi: Regenerate.
> >> 	* doc/tm.texi.in (TARGET_BLOCK_MOVE_FOR_STRUCT): New.
> >> 	* expr.cc (store_expr): Call block_move_for_struct.
> >> 	* target.def (block_move_for_struct): New hook.
> >> 	* targhooks.cc (default_block_move_for_struct): New function.
> >> 	* targhooks.h (default_block_move_for_struct): New Prototype.
> >> 
> >> ---
> >>  gcc/config/rs6000/rs6000.cc | 44 +++++++++++++++++++++++++++++++++++++
> >>  gcc/doc/tm.texi             |  6 +++++
> >>  gcc/doc/tm.texi.in          |  2 ++
> >>  gcc/expr.cc                 | 14 +++++++++---
> >>  gcc/target.def              | 10 +++++++++
> >>  gcc/targhooks.cc            |  7 ++++++
> >>  gcc/targhooks.h             |  1 +
> >>  7 files changed, 81 insertions(+), 3 deletions(-)
> >> 
> >> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> >> index a85d7630b41..e14cecba0ef 100644
> >> --- a/gcc/config/rs6000/rs6000.cc
> >> +++ b/gcc/config/rs6000/rs6000.cc
> >> @@ -1758,6 +1758,9 @@ static const struct attribute_spec rs6000_attribute_table[] =
> >>  #undef TARGET_NEED_IPA_FN_TARGET_INFO
> >>  #define TARGET_NEED_IPA_FN_TARGET_INFO rs6000_need_ipa_fn_target_info
> >>  
> >> +#undef TARGET_BLOCK_MOVE_FOR_STRUCT
> >> +#define TARGET_BLOCK_MOVE_FOR_STRUCT rs600_block_move_for_struct
> >> +
> >>  #undef TARGET_UPDATE_IPA_FN_TARGET_INFO
> >>  #define TARGET_UPDATE_IPA_FN_TARGET_INFO rs6000_update_ipa_fn_target_info
> >>  
> >> @@ -23672,6 +23675,47 @@ rs6000_function_value (const_tree valtype,
> >>    return gen_rtx_REG (mode, regno);
> >>  }
> >>  
> >> +/* Subroutine of rs600_block_move_for_struct, to get the internal mode which
> >> +   would be used to move the struct.  */
> >> +static machine_mode
> >> +submode_for_struct_block_move (tree type)
> >> +{
> >> +  gcc_assert (TREE_CODE (type) == RECORD_TYPE);
> >> +
> >> +  /* The sub mode may not be the field's type of the struct.
> >> +     It would be fine to use the mode as if the type is used as a function
> >> +     parameter or return value.  For example: DF for "{double a[4];}", and
> >> +     DI for "{doubel a[3]; long l;}".
> >> +     Here, using the mode as if it is function return type.  */
> >> +  rtx val = rs6000_function_value (type, NULL, 0);
> >> +  return (GET_CODE (val) == PARALLEL) ? GET_MODE (XEXP (XVECEXP (val, 0, 0), 0))
> >> +				      : word_mode;
> >> +}
> >> +
> >> +/* Implement the TARGET_BLOCK_MOVE_FOR_STRUCT hook.  */
> >> +static void
> >> +rs600_block_move_for_struct (rtx x, rtx y, tree exp, HOST_WIDE_INT method)
> >> +{
> >> +  machine_mode mode = submode_for_struct_block_move (TREE_TYPE (exp));
> >> +  int mode_size = GET_MODE_SIZE (mode);
> >> +  int size = UINTVAL (expr_size (exp));
> >> +  if (size < mode_size || (size % mode_size) != 0 || size > 64)
> >> +    {
> >> +      default_block_move_for_struct (x, y, exp, method);
> >> +      return;
> >> +    }
> >> +
> >> +  int len = size / mode_size;
> >> +  for (int i = 0; i < len; i++)
> >> +    {
> >> +      rtx temp = gen_reg_rtx (mode);
> >> +      rtx src = adjust_address (y, mode, mode_size * i);
> >> +      rtx dest = adjust_address (x, mode, mode_size * i);
> >> +      emit_move_insn (temp, src);
> >> +      emit_move_insn (dest, temp);
> >> +    }
> >> +}
> >> +
> >>  /* Define how to find the value returned by a library function
> >>     assuming the value has mode MODE.  */
> >>  rtx
> >> diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
> >> index 8572313b308..c8a1c1f30cf 100644
> >> --- a/gcc/doc/tm.texi
> >> +++ b/gcc/doc/tm.texi
> >> @@ -1380,6 +1380,12 @@ retain the field's mode.
> >>  Normally, this is not needed.
> >>  @end deftypefn
> >>  
> >> +@deftypefn {Target Hook} void TARGET_BLOCK_MOVE_FOR_STRUCT (rtx @var{x}, rtx @var{y}, tree @var{exp}, HOST_WIDE_INT @var{method})
> >> +Move from @var{y} to @var{x}, where @var{y} is as @var{exp} in structure
> >> +type. @var{method} is the method if this function invokes block_move.
> >> +The default definition invokes block_move.
> >> +@end deftypefn
> >> +
> >>  @defmac ROUND_TYPE_ALIGN (@var{type}, @var{computed}, @var{specified})
> >>  Define this macro as an expression for the alignment of a type (given
> >>  by @var{type} as a tree node) if the alignment computed in the usual
> >> diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
> >> index 986e8f0da09..f0f525a2008 100644
> >> --- a/gcc/doc/tm.texi.in
> >> +++ b/gcc/doc/tm.texi.in
> >> @@ -1226,6 +1226,8 @@ to aligning a bit-field within the structure.
> >>  
> >>  @hook TARGET_MEMBER_TYPE_FORCES_BLK
> >>  
> >> +@hook TARGET_BLOCK_MOVE_FOR_STRUCT
> >> +
> >>  @defmac ROUND_TYPE_ALIGN (@var{type}, @var{computed}, @var{specified})
> >>  Define this macro as an expression for the alignment of a type (given
> >>  by @var{type} as a tree node) if the alignment computed in the usual
> >> diff --git a/gcc/expr.cc b/gcc/expr.cc
> >> index c6917fbf7bd..734dc07a76b 100644
> >> --- a/gcc/expr.cc
> >> +++ b/gcc/expr.cc
> >> @@ -6504,9 +6504,17 @@ store_expr (tree exp, rtx target, int call_param_p,
> >>  	emit_group_store (target, temp, TREE_TYPE (exp),
> >>  			  int_size_in_bytes (TREE_TYPE (exp)));
> >>        else if (GET_MODE (temp) == BLKmode)
> >> -	emit_block_move (target, temp, expr_size (exp),
> >> -			 (call_param_p
> >> -			  ? BLOCK_OP_CALL_PARM : BLOCK_OP_NORMAL));
> >> +	{
> >> +	  if (TREE_CODE (TREE_TYPE (exp)) == RECORD_TYPE)
> >> +	    targetm.block_move_for_struct (target, temp, exp,
> >> +					   call_param_p ? BLOCK_OP_CALL_PARM
> >> +							: BLOCK_OP_NORMAL);
> >
> > I think it's only sensible to do something about the call_param_p case,
> > all other cases are GIGO.  And I don't see a good reason to add a new
> > target hook since function parameter passing info is readily
> > available.
> This patch enhances the normal assignments on struct variables (
> expand_assignment-->store_expr, where the call_param_p is 0). Because
> the "struct block move/copy" is generated for this case too.
> 
> For example, the struct block_move is generated.
> # .MEM_3 = VDEF <.MEM_1(D)>
> D.3956 = *a_2(D);
> For this stmt, "D.3956" is a struct variable, and "a_2(D)" points to a
> block of the struct type.
> 
> If we only care about parameter/arg/retval(s), we would not need a
> hook, as you said, since parm/retval contains the information about
> the target/ABI.  And for more as you suggested in the previous emails,
> we could do SRA-like analyze to check if an assignment relates to
> parm/retval.

But the issue with applying a stmt-local heuristic is that it's going
to be as many times wrong as it is right so there inevitably will be
regressions.

Also copies not directly involving parameters or returns can be optimized
on the GIMPLE level just fine (see what SRA does), the exception is
cpymem optab implementations but those you short-circuit as well here.

> The current patch is simple relatively, it uses 'scalar-modes' to move
> blocks for all struct-type assignments.  I feel this would be enough,
> and I'm wondering if we need to check parm/retval.

It's only for the param/retval that RTL expansion has more information
than GIMPLE and in that specific case the code generation is very 
difficult to untie and thus a change like this looks appropriate.

Thanks,
Richard.

> 
> >
> > The question is whether this is best handled up in the call chain where
> > the parameter setup is done or below here (or in emit_block_move).
> As above discussed, this patch touches 'block move' for struct type,
> including copy from 'parameter stack block' to a variable block.
> Oh, assigning from 'parameter regs' to 'stack block' is not changed
> here, it is done through "assign_parms-...->emit_group_xx".
> 
> If any misunderstanding, or any comments/concerns, please point them
> out.  Thanks a lot!
> 
> BR,
> Jeff (Jiufu)
> 
> >
> > Richard.
> >
> >> +	  else
> >> +	    emit_block_move (target, temp, expr_size (exp),
> >> +			     (call_param_p ? BLOCK_OP_CALL_PARM
> >> +					   : BLOCK_OP_NORMAL));
> >> +	}
> >> +
> >>        /* If we emit a nontemporal store, there is nothing else to do.  */
> >>        else if (nontemporal && emit_storent_insn (target, temp))
> >>  	;
> >> diff --git a/gcc/target.def b/gcc/target.def
> >> index 25f94c19fa7..e141f72a8a3 100644
> >> --- a/gcc/target.def
> >> +++ b/gcc/target.def
> >> @@ -5584,6 +5584,16 @@ Normally, this is not needed.",
> >>   bool, (const_tree field, machine_mode mode),
> >>   default_member_type_forces_blk)
> >>  
> >> +
> >> +/* Move block for structure type.  */
> >> +DEFHOOK
> >> +(block_move_for_struct,
> >> + "Move from @var{y} to @var{x}, where @var{y} is as @var{exp} in structure\n\
> >> +type. @var{method} is the method if this function invokes block_move.\n\
> >> +The default definition invokes block_move.",
> >> + void, (rtx x, rtx y, tree exp, HOST_WIDE_INT method),
> >> + default_block_move_for_struct)
> >> +
> >>  /* See tree-ssa-math-opts.cc:divmod_candidate_p for conditions
> >>     that gate the divod transform.  */
> >>  DEFHOOK
> >> diff --git a/gcc/targhooks.cc b/gcc/targhooks.cc
> >> index 12a58456b39..2b96c348419 100644
> >> --- a/gcc/targhooks.cc
> >> +++ b/gcc/targhooks.cc
> >> @@ -2330,6 +2330,13 @@ default_member_type_forces_blk (const_tree, machine_mode)
> >>    return false;
> >>  }
> >>  
> >> +/* Default version of block_move_for_struct.  */
> >> +void
> >> +default_block_move_for_struct (rtx x, rtx y, tree exp, HOST_WIDE_INT method)
> >> +{
> >> +  emit_block_move (x, y, expr_size (exp), (enum block_op_methods)method);
> >> +}
> >> +
> >>  /* Default version of canonicalize_comparison.  */
> >>  
> >>  void
> >> diff --git a/gcc/targhooks.h b/gcc/targhooks.h
> >> index a6a423c1abb..c284a35ee28 100644
> >> --- a/gcc/targhooks.h
> >> +++ b/gcc/targhooks.h
> >> @@ -264,6 +264,7 @@ extern void default_asm_output_ident_directive (const char*);
> >>  
> >>  extern scalar_int_mode default_cstore_mode (enum insn_code);
> >>  extern bool default_member_type_forces_blk (const_tree, machine_mode);
> >> +extern void default_block_move_for_struct (rtx, rtx, tree, HOST_WIDE_INT);
> >>  extern void default_atomic_assign_expand_fenv (tree *, tree *, tree *);
> >>  extern tree build_va_arg_indirect_ref (tree);
> >>  extern tree std_gimplify_va_arg_expr (tree, tree, gimple_seq *, gimple_seq *);
> >> 
>
  
Jiufu Guo Nov. 14, 2022, 9:53 a.m. UTC | #4
Hi!

Thanks for your helpful comments/sugguestions!

Richard Biener <rguenther@suse.de> writes:

> On Mon, 14 Nov 2022, Jiufu Guo wrote:
>
>> 
>> Hi!
>> Thanks so much for your review!
>> 
>> Richard Biener <rguenther@suse.de> writes:
>> 
>> > On Fri, 11 Nov 2022, Jiufu Guo wrote:
>> >
>> >> Hi,
>> >> 
>> >> When assigning a struct parameter to another variable, or loading a
>> >> memory block to a struct var (especially for return value),
>> >> Now, "block move" would be used during expand the assignment. And
>> >> the "block move" may use a type/mode different from the mode which
>> >> is accessing the var. e.g. on ppc64le, V2DI would be used to move
>> >> the block of 16bytes.
>> >> 
>> >> And then, this "block move" would prevent optimization passes from
>> >> leaping/crossing over the assignment. PR65421 reflects this issue.
>> >> 
>> >> As the example code in PR65421.
>> >> 
>> >> typedef struct { double a[4]; } A;
>> >> A foo (const A *a) { return *a; }
>> >> 
>> >> On ppc64le, the below instructions are used for the "block move":
>> >>   7: r122:V2DI=[r121:DI]
>> >>   8: r124:V2DI=[r121:DI+r123:DI]
>> >>   9: [r112:DI]=r122:V2DI
>> >>   10: [r112:DI+0x10]=r124:V2DI
>> >> 
>> >> For this issue, a few comments/suggestions are mentioned via RFC:
>> >> https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604646.html
>> >> I drafted a patch which is updating the behavior of block_move for
>> >> struct type. This patch is simple to work with, a few ideas in the
>> >> comments are not put into this patch. I would submit this
>> >> patch first.
>> >> 
>> >> The idea is trying to use sub-modes(scalar) for the "block move".
>> >> And the sub-modes would align with the access patterns of the
>> >> struct members and usages on parameter/return value.
>> >> The major benefits of this change would be raising more 
>> >> opportunities for other optimization passes(cse/dse/xprop).
>> >> 
>> >> The suitable mode would be target specified and relates to ABI,
>> >> this patch introduces a target hook. And in this patch, the hook
>> >> is implemented on rs6000.
>> >> 
>> >> In this patch, the hook would be just using heuristic modes for all
>> >> struct block moving. And the hook would not check if the "block move"
>> >> is about parameters or return value or other uses.
>> >> 
>> >> For the rs6000 implementation of this hook, it is able to use
>> >> DF/DI/TD/.. modes for the struct block movement. The sub-modes
>> >> would be the same as the mode when the struct type is on parameter or
>> >> return value.
>> >> 
>> >> Bootstrapped and regtested on ppc64/ppc64le. 
>> >> Is this ok for trunk?
>> >> 
>> >> 
>> >> BR,
>> >> Jeff(Jiufu)
>> >> 
>> >> 
>> >> gcc/ChangeLog:
>> >> 
>> >> 	* config/rs6000/rs6000.cc (TARGET_BLOCK_MOVE_FOR_STRUCT): Define.
>> >> 	(submode_for_struct_block_move): New function.  Called from
>> >> 	rs600_block_move_for_struct.
>> >> 	(rs600_block_move_for_struct): New function.
>> >> 	* doc/tm.texi: Regenerate.
>> >> 	* doc/tm.texi.in (TARGET_BLOCK_MOVE_FOR_STRUCT): New.
>> >> 	* expr.cc (store_expr): Call block_move_for_struct.
>> >> 	* target.def (block_move_for_struct): New hook.
>> >> 	* targhooks.cc (default_block_move_for_struct): New function.
>> >> 	* targhooks.h (default_block_move_for_struct): New Prototype.
>> >> 
>> >> ---
>> >>  gcc/config/rs6000/rs6000.cc | 44 +++++++++++++++++++++++++++++++++++++
>> >>  gcc/doc/tm.texi             |  6 +++++
>> >>  gcc/doc/tm.texi.in          |  2 ++
>> >>  gcc/expr.cc                 | 14 +++++++++---
>> >>  gcc/target.def              | 10 +++++++++
>> >>  gcc/targhooks.cc            |  7 ++++++
>> >>  gcc/targhooks.h             |  1 +
>> >>  7 files changed, 81 insertions(+), 3 deletions(-)
>> >> 
>> >> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
>> >> index a85d7630b41..e14cecba0ef 100644
>> >> --- a/gcc/config/rs6000/rs6000.cc
>> >> +++ b/gcc/config/rs6000/rs6000.cc
>> >> @@ -1758,6 +1758,9 @@ static const struct attribute_spec rs6000_attribute_table[] =
>> >>  #undef TARGET_NEED_IPA_FN_TARGET_INFO
>> >>  #define TARGET_NEED_IPA_FN_TARGET_INFO rs6000_need_ipa_fn_target_info
>> >>  
>> >> +#undef TARGET_BLOCK_MOVE_FOR_STRUCT
>> >> +#define TARGET_BLOCK_MOVE_FOR_STRUCT rs600_block_move_for_struct
>> >> +
>> >>  #undef TARGET_UPDATE_IPA_FN_TARGET_INFO
>> >>  #define TARGET_UPDATE_IPA_FN_TARGET_INFO rs6000_update_ipa_fn_target_info
>> >>  
>> >> @@ -23672,6 +23675,47 @@ rs6000_function_value (const_tree valtype,
>> >>    return gen_rtx_REG (mode, regno);
>> >>  }
>> >>  
>> >> +/* Subroutine of rs600_block_move_for_struct, to get the internal mode which
>> >> +   would be used to move the struct.  */
>> >> +static machine_mode
>> >> +submode_for_struct_block_move (tree type)
>> >> +{
>> >> +  gcc_assert (TREE_CODE (type) == RECORD_TYPE);
>> >> +
>> >> +  /* The sub mode may not be the field's type of the struct.
>> >> +     It would be fine to use the mode as if the type is used as a function
>> >> +     parameter or return value.  For example: DF for "{double a[4];}", and
>> >> +     DI for "{doubel a[3]; long l;}".
>> >> +     Here, using the mode as if it is function return type.  */
>> >> +  rtx val = rs6000_function_value (type, NULL, 0);
>> >> +  return (GET_CODE (val) == PARALLEL) ? GET_MODE (XEXP (XVECEXP (val, 0, 0), 0))
>> >> +				      : word_mode;
>> >> +}
>> >> +
>> >> +/* Implement the TARGET_BLOCK_MOVE_FOR_STRUCT hook.  */
>> >> +static void
>> >> +rs600_block_move_for_struct (rtx x, rtx y, tree exp, HOST_WIDE_INT method)
>> >> +{
>> >> +  machine_mode mode = submode_for_struct_block_move (TREE_TYPE (exp));
>> >> +  int mode_size = GET_MODE_SIZE (mode);
>> >> +  int size = UINTVAL (expr_size (exp));
>> >> +  if (size < mode_size || (size % mode_size) != 0 || size > 64)
>> >> +    {
>> >> +      default_block_move_for_struct (x, y, exp, method);
>> >> +      return;
>> >> +    }
>> >> +
>> >> +  int len = size / mode_size;
>> >> +  for (int i = 0; i < len; i++)
>> >> +    {
>> >> +      rtx temp = gen_reg_rtx (mode);
>> >> +      rtx src = adjust_address (y, mode, mode_size * i);
>> >> +      rtx dest = adjust_address (x, mode, mode_size * i);
>> >> +      emit_move_insn (temp, src);
>> >> +      emit_move_insn (dest, temp);
>> >> +    }
>> >> +}
>> >> +
>> >>  /* Define how to find the value returned by a library function
>> >>     assuming the value has mode MODE.  */
>> >>  rtx
>> >> diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
>> >> index 8572313b308..c8a1c1f30cf 100644
>> >> --- a/gcc/doc/tm.texi
>> >> +++ b/gcc/doc/tm.texi
>> >> @@ -1380,6 +1380,12 @@ retain the field's mode.
>> >>  Normally, this is not needed.
>> >>  @end deftypefn
>> >>  
>> >> +@deftypefn {Target Hook} void TARGET_BLOCK_MOVE_FOR_STRUCT (rtx @var{x}, rtx @var{y}, tree @var{exp}, HOST_WIDE_INT @var{method})
>> >> +Move from @var{y} to @var{x}, where @var{y} is as @var{exp} in structure
>> >> +type. @var{method} is the method if this function invokes block_move.
>> >> +The default definition invokes block_move.
>> >> +@end deftypefn
>> >> +
>> >>  @defmac ROUND_TYPE_ALIGN (@var{type}, @var{computed}, @var{specified})
>> >>  Define this macro as an expression for the alignment of a type (given
>> >>  by @var{type} as a tree node) if the alignment computed in the usual
>> >> diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
>> >> index 986e8f0da09..f0f525a2008 100644
>> >> --- a/gcc/doc/tm.texi.in
>> >> +++ b/gcc/doc/tm.texi.in
>> >> @@ -1226,6 +1226,8 @@ to aligning a bit-field within the structure.
>> >>  
>> >>  @hook TARGET_MEMBER_TYPE_FORCES_BLK
>> >>  
>> >> +@hook TARGET_BLOCK_MOVE_FOR_STRUCT
>> >> +
>> >>  @defmac ROUND_TYPE_ALIGN (@var{type}, @var{computed}, @var{specified})
>> >>  Define this macro as an expression for the alignment of a type (given
>> >>  by @var{type} as a tree node) if the alignment computed in the usual
>> >> diff --git a/gcc/expr.cc b/gcc/expr.cc
>> >> index c6917fbf7bd..734dc07a76b 100644
>> >> --- a/gcc/expr.cc
>> >> +++ b/gcc/expr.cc
>> >> @@ -6504,9 +6504,17 @@ store_expr (tree exp, rtx target, int call_param_p,
>> >>  	emit_group_store (target, temp, TREE_TYPE (exp),
>> >>  			  int_size_in_bytes (TREE_TYPE (exp)));
>> >>        else if (GET_MODE (temp) == BLKmode)
>> >> -	emit_block_move (target, temp, expr_size (exp),
>> >> -			 (call_param_p
>> >> -			  ? BLOCK_OP_CALL_PARM : BLOCK_OP_NORMAL));
>> >> +	{
>> >> +	  if (TREE_CODE (TREE_TYPE (exp)) == RECORD_TYPE)
>> >> +	    targetm.block_move_for_struct (target, temp, exp,
>> >> +					   call_param_p ? BLOCK_OP_CALL_PARM
>> >> +							: BLOCK_OP_NORMAL);
>> >
>> > I think it's only sensible to do something about the call_param_p case,
>> > all other cases are GIGO.  And I don't see a good reason to add a new
>> > target hook since function parameter passing info is readily
>> > available.
>> This patch enhances the normal assignments on struct variables (
>> expand_assignment-->store_expr, where the call_param_p is 0). Because
>> the "struct block move/copy" is generated for this case too.
>> 
>> For example, the struct block_move is generated.
>> # .MEM_3 = VDEF <.MEM_1(D)>
>> D.3956 = *a_2(D);
>> For this stmt, "D.3956" is a struct variable, and "a_2(D)" points to a
>> block of the struct type.
>> 
>> If we only care about parameter/arg/retval(s), we would not need a
>> hook, as you said, since parm/retval contains the information about
>> the target/ABI.  And for more as you suggested in the previous emails,
>> we could do SRA-like analyze to check if an assignment relates to
>> parm/retval.
>
> But the issue with applying a stmt-local heuristic is that it's going
> to be as many times wrong as it is right so there inevitably will be
> regressions.
Understant. :-)  In the patch, the heurisitic is aplied to all
assignment on struct (no matter parm/return related).  The generated
binary would be changed definitly: mem move(via vector loads/stores)
for struct type becomes through sub-modes scalar loads/stores.
The binary change may cause some difference(e.g. performance change).
Even I had spec2017 tests on some machines(P9 and P10), it is sitll
possible has regressions.

>
> Also copies not directly involving parameters or returns can be optimized
> on the GIMPLE level just fine (see what SRA does), the exception is
> cpymem optab implementations but those you short-circuit as well here.
Yes, agree. So, enhancing the stmt/assignment about param/retval would
fix the issue (with low regression risks). 

>
>> The current patch is simple relatively, it uses 'scalar-modes' to move
>> blocks for all struct-type assignments.  I feel this would be enough,
>> and I'm wondering if we need to check parm/retval.
>
> It's only for the param/retval that RTL expansion has more information
> than GIMPLE and in that specific case the code generation is very 
> difficult to untie and thus a change like this looks appropriate.
Understant, I will update patch to handle more about param/retval.

Thanks again!

BR,
Jeff (Jiufu)

>
> Thanks,
> Richard.
>
>> 
>> >
>> > The question is whether this is best handled up in the call chain where
>> > the parameter setup is done or below here (or in emit_block_move).
>> As above discussed, this patch touches 'block move' for struct type,
>> including copy from 'parameter stack block' to a variable block.
>> Oh, assigning from 'parameter regs' to 'stack block' is not changed
>> here, it is done through "assign_parms-...->emit_group_xx".
>> 
>> If any misunderstanding, or any comments/concerns, please point them
>> out.  Thanks a lot!
>> 
>> BR,
>> Jeff (Jiufu)
>> 
>> >
>> > Richard.
>> >
>> >> +	  else
>> >> +	    emit_block_move (target, temp, expr_size (exp),
>> >> +			     (call_param_p ? BLOCK_OP_CALL_PARM
>> >> +					   : BLOCK_OP_NORMAL));
>> >> +	}
>> >> +
>> >>        /* If we emit a nontemporal store, there is nothing else to do.  */
>> >>        else if (nontemporal && emit_storent_insn (target, temp))
>> >>  	;
>> >> diff --git a/gcc/target.def b/gcc/target.def
>> >> index 25f94c19fa7..e141f72a8a3 100644
>> >> --- a/gcc/target.def
>> >> +++ b/gcc/target.def
>> >> @@ -5584,6 +5584,16 @@ Normally, this is not needed.",
>> >>   bool, (const_tree field, machine_mode mode),
>> >>   default_member_type_forces_blk)
>> >>  
>> >> +
>> >> +/* Move block for structure type.  */
>> >> +DEFHOOK
>> >> +(block_move_for_struct,
>> >> + "Move from @var{y} to @var{x}, where @var{y} is as @var{exp} in structure\n\
>> >> +type. @var{method} is the method if this function invokes block_move.\n\
>> >> +The default definition invokes block_move.",
>> >> + void, (rtx x, rtx y, tree exp, HOST_WIDE_INT method),
>> >> + default_block_move_for_struct)
>> >> +
>> >>  /* See tree-ssa-math-opts.cc:divmod_candidate_p for conditions
>> >>     that gate the divod transform.  */
>> >>  DEFHOOK
>> >> diff --git a/gcc/targhooks.cc b/gcc/targhooks.cc
>> >> index 12a58456b39..2b96c348419 100644
>> >> --- a/gcc/targhooks.cc
>> >> +++ b/gcc/targhooks.cc
>> >> @@ -2330,6 +2330,13 @@ default_member_type_forces_blk (const_tree, machine_mode)
>> >>    return false;
>> >>  }
>> >>  
>> >> +/* Default version of block_move_for_struct.  */
>> >> +void
>> >> +default_block_move_for_struct (rtx x, rtx y, tree exp, HOST_WIDE_INT method)
>> >> +{
>> >> +  emit_block_move (x, y, expr_size (exp), (enum block_op_methods)method);
>> >> +}
>> >> +
>> >>  /* Default version of canonicalize_comparison.  */
>> >>  
>> >>  void
>> >> diff --git a/gcc/targhooks.h b/gcc/targhooks.h
>> >> index a6a423c1abb..c284a35ee28 100644
>> >> --- a/gcc/targhooks.h
>> >> +++ b/gcc/targhooks.h
>> >> @@ -264,6 +264,7 @@ extern void default_asm_output_ident_directive (const char*);
>> >>  
>> >>  extern scalar_int_mode default_cstore_mode (enum insn_code);
>> >>  extern bool default_member_type_forces_blk (const_tree, machine_mode);
>> >> +extern void default_block_move_for_struct (rtx, rtx, tree, HOST_WIDE_INT);
>> >>  extern void default_atomic_assign_expand_fenv (tree *, tree *, tree *);
>> >>  extern tree build_va_arg_indirect_ref (tree);
>> >>  extern tree std_gimplify_va_arg_expr (tree, tree, gimple_seq *, gimple_seq *);
>> >> 
>>
  

Patch

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index a85d7630b41..e14cecba0ef 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -1758,6 +1758,9 @@  static const struct attribute_spec rs6000_attribute_table[] =
 #undef TARGET_NEED_IPA_FN_TARGET_INFO
 #define TARGET_NEED_IPA_FN_TARGET_INFO rs6000_need_ipa_fn_target_info
 
+#undef TARGET_BLOCK_MOVE_FOR_STRUCT
+#define TARGET_BLOCK_MOVE_FOR_STRUCT rs600_block_move_for_struct
+
 #undef TARGET_UPDATE_IPA_FN_TARGET_INFO
 #define TARGET_UPDATE_IPA_FN_TARGET_INFO rs6000_update_ipa_fn_target_info
 
@@ -23672,6 +23675,47 @@  rs6000_function_value (const_tree valtype,
   return gen_rtx_REG (mode, regno);
 }
 
+/* Subroutine of rs600_block_move_for_struct, to get the internal mode which
+   would be used to move the struct.  */
+static machine_mode
+submode_for_struct_block_move (tree type)
+{
+  gcc_assert (TREE_CODE (type) == RECORD_TYPE);
+
+  /* The sub mode may not be the field's type of the struct.
+     It would be fine to use the mode as if the type is used as a function
+     parameter or return value.  For example: DF for "{double a[4];}", and
+     DI for "{doubel a[3]; long l;}".
+     Here, using the mode as if it is function return type.  */
+  rtx val = rs6000_function_value (type, NULL, 0);
+  return (GET_CODE (val) == PARALLEL) ? GET_MODE (XEXP (XVECEXP (val, 0, 0), 0))
+				      : word_mode;
+}
+
+/* Implement the TARGET_BLOCK_MOVE_FOR_STRUCT hook.  */
+static void
+rs600_block_move_for_struct (rtx x, rtx y, tree exp, HOST_WIDE_INT method)
+{
+  machine_mode mode = submode_for_struct_block_move (TREE_TYPE (exp));
+  int mode_size = GET_MODE_SIZE (mode);
+  int size = UINTVAL (expr_size (exp));
+  if (size < mode_size || (size % mode_size) != 0 || size > 64)
+    {
+      default_block_move_for_struct (x, y, exp, method);
+      return;
+    }
+
+  int len = size / mode_size;
+  for (int i = 0; i < len; i++)
+    {
+      rtx temp = gen_reg_rtx (mode);
+      rtx src = adjust_address (y, mode, mode_size * i);
+      rtx dest = adjust_address (x, mode, mode_size * i);
+      emit_move_insn (temp, src);
+      emit_move_insn (dest, temp);
+    }
+}
+
 /* Define how to find the value returned by a library function
    assuming the value has mode MODE.  */
 rtx
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 8572313b308..c8a1c1f30cf 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -1380,6 +1380,12 @@  retain the field's mode.
 Normally, this is not needed.
 @end deftypefn
 
+@deftypefn {Target Hook} void TARGET_BLOCK_MOVE_FOR_STRUCT (rtx @var{x}, rtx @var{y}, tree @var{exp}, HOST_WIDE_INT @var{method})
+Move from @var{y} to @var{x}, where @var{y} is as @var{exp} in structure
+type. @var{method} is the method if this function invokes block_move.
+The default definition invokes block_move.
+@end deftypefn
+
 @defmac ROUND_TYPE_ALIGN (@var{type}, @var{computed}, @var{specified})
 Define this macro as an expression for the alignment of a type (given
 by @var{type} as a tree node) if the alignment computed in the usual
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 986e8f0da09..f0f525a2008 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -1226,6 +1226,8 @@  to aligning a bit-field within the structure.
 
 @hook TARGET_MEMBER_TYPE_FORCES_BLK
 
+@hook TARGET_BLOCK_MOVE_FOR_STRUCT
+
 @defmac ROUND_TYPE_ALIGN (@var{type}, @var{computed}, @var{specified})
 Define this macro as an expression for the alignment of a type (given
 by @var{type} as a tree node) if the alignment computed in the usual
diff --git a/gcc/expr.cc b/gcc/expr.cc
index c6917fbf7bd..734dc07a76b 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -6504,9 +6504,17 @@  store_expr (tree exp, rtx target, int call_param_p,
 	emit_group_store (target, temp, TREE_TYPE (exp),
 			  int_size_in_bytes (TREE_TYPE (exp)));
       else if (GET_MODE (temp) == BLKmode)
-	emit_block_move (target, temp, expr_size (exp),
-			 (call_param_p
-			  ? BLOCK_OP_CALL_PARM : BLOCK_OP_NORMAL));
+	{
+	  if (TREE_CODE (TREE_TYPE (exp)) == RECORD_TYPE)
+	    targetm.block_move_for_struct (target, temp, exp,
+					   call_param_p ? BLOCK_OP_CALL_PARM
+							: BLOCK_OP_NORMAL);
+	  else
+	    emit_block_move (target, temp, expr_size (exp),
+			     (call_param_p ? BLOCK_OP_CALL_PARM
+					   : BLOCK_OP_NORMAL));
+	}
+
       /* If we emit a nontemporal store, there is nothing else to do.  */
       else if (nontemporal && emit_storent_insn (target, temp))
 	;
diff --git a/gcc/target.def b/gcc/target.def
index 25f94c19fa7..e141f72a8a3 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -5584,6 +5584,16 @@  Normally, this is not needed.",
  bool, (const_tree field, machine_mode mode),
  default_member_type_forces_blk)
 
+
+/* Move block for structure type.  */
+DEFHOOK
+(block_move_for_struct,
+ "Move from @var{y} to @var{x}, where @var{y} is as @var{exp} in structure\n\
+type. @var{method} is the method if this function invokes block_move.\n\
+The default definition invokes block_move.",
+ void, (rtx x, rtx y, tree exp, HOST_WIDE_INT method),
+ default_block_move_for_struct)
+
 /* See tree-ssa-math-opts.cc:divmod_candidate_p for conditions
    that gate the divod transform.  */
 DEFHOOK
diff --git a/gcc/targhooks.cc b/gcc/targhooks.cc
index 12a58456b39..2b96c348419 100644
--- a/gcc/targhooks.cc
+++ b/gcc/targhooks.cc
@@ -2330,6 +2330,13 @@  default_member_type_forces_blk (const_tree, machine_mode)
   return false;
 }
 
+/* Default version of block_move_for_struct.  */
+void
+default_block_move_for_struct (rtx x, rtx y, tree exp, HOST_WIDE_INT method)
+{
+  emit_block_move (x, y, expr_size (exp), (enum block_op_methods)method);
+}
+
 /* Default version of canonicalize_comparison.  */
 
 void
diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index a6a423c1abb..c284a35ee28 100644
--- a/gcc/targhooks.h
+++ b/gcc/targhooks.h
@@ -264,6 +264,7 @@  extern void default_asm_output_ident_directive (const char*);
 
 extern scalar_int_mode default_cstore_mode (enum insn_code);
 extern bool default_member_type_forces_blk (const_tree, machine_mode);
+extern void default_block_move_for_struct (rtx, rtx, tree, HOST_WIDE_INT);
 extern void default_atomic_assign_expand_fenv (tree *, tree *, tree *);
 extern tree build_va_arg_indirect_ref (tree);
 extern tree std_gimplify_va_arg_expr (tree, tree, gimple_seq *, gimple_seq *);