loading float member of parameter stored via int registers

Message ID 20221221062736.78036-1-guojiufu@linux.ibm.com
State New
Headers
Series loading float member of parameter stored via int registers |

Commit Message

Jiufu Guo Dec. 21, 2022, 6:27 a.m. UTC
  Hi,

This patch is fixing an issue about parameter accessing if the
parameter is struct type and passed through integer registers, and
there is floating member is accessed. Like below code:

typedef struct DF {double a[4]; long l; } DF;
double foo_df (DF arg){return arg.a[3];}

On ppc64le, with trunk gcc, "std 6,-24(1) ; lfd 1,-24(1)" is
generated.  While instruction "mtvsrd 1, 6" would be enough for
this case.

This patch updates the behavior when loading floating members of a
parameter: if that floating member is stored via integer register,
then loading it as integer mode first, and converting it to floating
mode.

I also thought of a method: before storing the register to stack,
convert it to float mode first. While there are some cases that may
still prefer to keep an integer register store.                                                  

Bootstrap and regtest passes on ppc64{,le}.
I would ask for help to review for comments and if this patch is
acceptable for the trunk.


BR,
Jeff (Jiufu)

	PR target/108073

gcc/ChangeLog:

	* config/rs6000/rs6000.cc (TARGET_LOADING_INT_CONVERT_TO_FLOAT): New
	macro definition.
	(rs6000_loading_int_convert_to_float): New hook implement.
	* doc/tm.texi: Regenerated.
	* doc/tm.texi.in (loading_int_convert_to_float): New hook.
	* expr.cc (expand_expr_real_1): Updated to use the new hook.
	* target.def (loading_int_convert_to_float): New hook.

gcc/testsuite/ChangeLog:

	* g++.target/powerpc/pr102024.C: Update.
	* gcc.target/powerpc/pr108073.c: New test.

---
 gcc/config/rs6000/rs6000.cc                 | 70 +++++++++++++++++++++
 gcc/doc/tm.texi                             |  6 ++
 gcc/doc/tm.texi.in                          |  2 +
 gcc/expr.cc                                 | 15 +++++
 gcc/target.def                              | 11 ++++
 gcc/testsuite/g++.target/powerpc/pr102024.C |  2 +-
 gcc/testsuite/gcc.target/powerpc/pr108073.c | 24 +++++++
 7 files changed, 129 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr108073.c
  

Comments

Richard Biener Dec. 21, 2022, 7:30 a.m. UTC | #1
On Wed, 21 Dec 2022, Jiufu Guo wrote:

> Hi,
> 
> This patch is fixing an issue about parameter accessing if the
> parameter is struct type and passed through integer registers, and
> there is floating member is accessed. Like below code:
> 
> typedef struct DF {double a[4]; long l; } DF;
> double foo_df (DF arg){return arg.a[3];}
> 
> On ppc64le, with trunk gcc, "std 6,-24(1) ; lfd 1,-24(1)" is
> generated.  While instruction "mtvsrd 1, 6" would be enough for
> this case.

So why do we end up spilling for ppc?

struct X { int i; float f; };

float foo (struct X x)
{
  return x.f;
}

does pass the structure in $RDI on x86_64 and we manage (with 
optimization, with -O0 we spill) to generate

        shrq    $32, %rdi
        movd    %edi, %xmm0

and RTL expansion generates

(note 4 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
(insn 2 4 3 2 (set (reg/v:DI 83 [ x ])
        (reg:DI 5 di [ x ])) "t.c":4:1 -1
     (nil))
(note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
(insn 6 3 7 2 (parallel [
            (set (reg:DI 85)
                (ashiftrt:DI (reg/v:DI 83 [ x ])
                    (const_int 32 [0x20])))
            (clobber (reg:CC 17 flags))
        ]) "t.c":5:11 -1
     (nil))
(insn 7 6 8 2 (set (reg:SI 86)
        (subreg:SI (reg:DI 85) 0)) "t.c":5:11 -1
     (nil))

I would imagine that for the ppc case we only see the subreg here
which should be even easier to optimize.

So how's this not fixable by providing proper patterns / subreg
capabilities?  Looking a bit at the RTL we have the issue might
be that nothing seems to handle CSE of

(note 8 0 5 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
(insn 5 8 7 2 (set (mem/c:DI (plus:DI (reg/f:DI 110 sfp)
                (const_int 56 [0x38])) [2 arg+24 S8 A64])
        (reg:DI 6 6)) "t.c":2:23 679 {*movdi_internal64}
     (expr_list:REG_DEAD (reg:DI 6 6)
        (nil)))
(note 7 5 10 2 NOTE_INSN_FUNCTION_BEG)
(note 10 7 15 2 NOTE_INSN_DELETED)
(insn 15 10 16 2 (set (reg/i:DF 33 1)
        (mem/c:DF (plus:DI (reg/f:DI 110 sfp)
                (const_int 56 [0x38])) [1 arg.a[3]+0 S8 A64])) "t.c":2:40 
576 {*movdf_hardfloat64}
     (nil))

Possibly because the store and load happen in a different mode?  Can
you see why CSE doesn't handle this (producing a subreg)?  On
the GIMPLE side we'd happily do that (but we don't see the argument
setup).

Thanks,
Richard.

> This patch updates the behavior when loading floating members of a
> parameter: if that floating member is stored via integer register,
> then loading it as integer mode first, and converting it to floating
> mode.
> 
> I also thought of a method: before storing the register to stack,
> convert it to float mode first. While there are some cases that may
> still prefer to keep an integer register store.                                                  
> 
> Bootstrap and regtest passes on ppc64{,le}.
> I would ask for help to review for comments and if this patch is
> acceptable for the trunk.
> 
> 
> BR,
> Jeff (Jiufu)
> 
> 	PR target/108073
> 
> gcc/ChangeLog:
> 
> 	* config/rs6000/rs6000.cc (TARGET_LOADING_INT_CONVERT_TO_FLOAT): New
> 	macro definition.
> 	(rs6000_loading_int_convert_to_float): New hook implement.
> 	* doc/tm.texi: Regenerated.
> 	* doc/tm.texi.in (loading_int_convert_to_float): New hook.
> 	* expr.cc (expand_expr_real_1): Updated to use the new hook.
> 	* target.def (loading_int_convert_to_float): New hook.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* g++.target/powerpc/pr102024.C: Update.
> 	* gcc.target/powerpc/pr108073.c: New test.
> 
> ---
>  gcc/config/rs6000/rs6000.cc                 | 70 +++++++++++++++++++++
>  gcc/doc/tm.texi                             |  6 ++
>  gcc/doc/tm.texi.in                          |  2 +
>  gcc/expr.cc                                 | 15 +++++
>  gcc/target.def                              | 11 ++++
>  gcc/testsuite/g++.target/powerpc/pr102024.C |  2 +-
>  gcc/testsuite/gcc.target/powerpc/pr108073.c | 24 +++++++
>  7 files changed, 129 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr108073.c
> 
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index b3a609f3aa3..af676eea276 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -1559,6 +1559,9 @@ static const struct attribute_spec rs6000_attribute_table[] =
>  #undef TARGET_INVALID_ARG_FOR_UNPROTOTYPED_FN
>  #define TARGET_INVALID_ARG_FOR_UNPROTOTYPED_FN invalid_arg_for_unprototyped_fn
>  
> +#undef TARGET_LOADING_INT_CONVERT_TO_FLOAT
> +#define TARGET_LOADING_INT_CONVERT_TO_FLOAT rs6000_loading_int_convert_to_float
> +
>  #undef TARGET_MD_ASM_ADJUST
>  #define TARGET_MD_ASM_ADJUST rs6000_md_asm_adjust
>  
> @@ -24018,6 +24021,73 @@ invalid_arg_for_unprototyped_fn (const_tree typelist, const_tree funcdecl, const
>  	  : NULL;
>  }
>  
> +/* Implement the TARGET_LOADING_INT_CONVERT_TO_FLOAT. */
> +static rtx
> +rs6000_loading_int_convert_to_float (machine_mode mode, rtx source, rtx base)
> +{
> +  rtx src_base = XEXP (source, 0);
> +  poly_uint64 offset = MEM_OFFSET (source);
> +
> +  if (GET_CODE (src_base) == PLUS && CONSTANT_P (XEXP (src_base, 1)))
> +    {
> +      offset += INTVAL (XEXP (src_base, 1));
> +      src_base = XEXP (src_base, 0);
> +    }
> +
> +  if (!rtx_equal_p (XEXP (base, 0), src_base))
> +    return NULL_RTX;
> +
> +  rtx temp_reg = gen_reg_rtx (word_mode);
> +  rtx temp_mem = copy_rtx (source);
> +  PUT_MODE (temp_mem, word_mode);
> +
> +  /* DI->DF */
> +  if (word_mode == DImode && mode == DFmode)
> +    {
> +      if (multiple_p (offset, GET_MODE_SIZE (word_mode)))
> +	{
> +	  emit_move_insn (temp_reg, temp_mem);
> +	  rtx float_subreg = simplify_gen_subreg (mode, temp_reg, word_mode, 0);
> +	  rtx target_reg = gen_reg_rtx (mode);
> +	  emit_move_insn (target_reg, float_subreg);
> +	  return target_reg;
> +	}
> +      return NULL_RTX;
> +    }
> +
> +  /* Sub DI#->SF */
> +  if (word_mode == DImode && mode == SFmode)
> +    {
> +      poly_uint64 byte_off = 0;
> +      if (multiple_p (offset, GET_MODE_SIZE (word_mode)))
> +	byte_off = 0;
> +      else if (multiple_p (offset - GET_MODE_SIZE (mode),
> +			   GET_MODE_SIZE (word_mode)))
> +	byte_off = GET_MODE_SIZE (mode);
> +      else
> +	return NULL_RTX;
> +
> +      temp_mem = adjust_address (temp_mem, word_mode, -byte_off);
> +      emit_move_insn (temp_reg, temp_mem);
> +
> +      /* little endia only? */
> +      poly_uint64 high_off = subreg_highpart_offset (SImode, word_mode);
> +      if (known_eq (byte_off, high_off))
> +	{
> +	  temp_reg = expand_shift (RSHIFT_EXPR, word_mode, temp_reg,
> +				   GET_MODE_PRECISION (SImode), temp_reg, 0);
> +	}
> +      rtx subreg_si = gen_reg_rtx (SImode);
> +      emit_move_insn (subreg_si,  gen_lowpart (SImode, temp_reg));
> +      rtx float_subreg = simplify_gen_subreg (mode, subreg_si, SImode, 0);
> +      rtx target_reg = gen_reg_rtx (mode);
> +      emit_move_insn (target_reg, float_subreg);
> +      return target_reg;
> +    }
> +
> +  return NULL_RTX;
> +}
> +
>  /* For TARGET_SECURE_PLT 32-bit PIC code we can save PIC register
>     setup by using __stack_chk_fail_local hidden function instead of
>     calling __stack_chk_fail directly.  Otherwise it is better to call
> diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
> index 8fe49c2ba3d..10f94af553d 100644
> --- a/gcc/doc/tm.texi
> +++ b/gcc/doc/tm.texi
> @@ -11933,6 +11933,12 @@ or when the back end is in a partially-initialized state.
>  outside of any function scope.
>  @end deftypefn
>  
> +@deftypefn {Target Hook} rtx TARGET_LOADING_INT_CONVERT_TO_FLOAT (machine_mode @var{mode}, rtx @var{source}, rtx @var{base})
> +If the target is protifiable to load an integer in word_mode from
> +@var{source} which is based on @var{base}, then convert to a floating
> +point value in @var{mode}.
> +@end deftypefn
> +
>  @defmac TARGET_OBJECT_SUFFIX
>  Define this macro to be a C string representing the suffix for object
>  files on your target machine.  If you do not define this macro, GCC will
> diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
> index 62c49ac46de..1ca6a671d86 100644
> --- a/gcc/doc/tm.texi.in
> +++ b/gcc/doc/tm.texi.in
> @@ -7756,6 +7756,8 @@ to by @var{ce_info}.
>  
>  @hook TARGET_SET_CURRENT_FUNCTION
>  
> +@hook TARGET_LOADING_INT_CONVERT_TO_FLOAT
> +
>  @defmac TARGET_OBJECT_SUFFIX
>  Define this macro to be a C string representing the suffix for object
>  files on your target machine.  If you do not define this macro, GCC will
> diff --git a/gcc/expr.cc b/gcc/expr.cc
> index d9407432ea5..466079220e7 100644
> --- a/gcc/expr.cc
> +++ b/gcc/expr.cc
> @@ -11812,6 +11812,21 @@ expand_expr_real_1 (tree exp, rtx target, machine_mode tmode,
>  	    && modifier != EXPAND_WRITE)
>  	  op0 = flip_storage_order (mode1, op0);
>  
> +	/* Accessing float field of struct parameter which passed via integer
> +	   registers.  */
> +	if (targetm.loading_int_convert_to_float && mode == mode1
> +	    && GET_MODE_CLASS (mode) == MODE_FLOAT
> +	    && TREE_CODE (tem) == PARM_DECL && DECL_INCOMING_RTL (tem)
> +	    && REG_P (DECL_INCOMING_RTL (tem))
> +	    && GET_MODE (DECL_INCOMING_RTL (tem)) == BLKmode && MEM_P (op0)
> +	    && MEM_OFFSET_KNOWN_P (op0))
> +	  {
> +	    rtx res = targetm.loading_int_convert_to_float (mode, op0,
> +							    DECL_RTL (tem));
> +	    if (res)
> +	      op0 = res;
> +	  }
> +
>  	if (mode == mode1 || mode1 == BLKmode || mode1 == tmode
>  	    || modifier == EXPAND_CONST_ADDRESS
>  	    || modifier == EXPAND_INITIALIZER)
> diff --git a/gcc/target.def b/gcc/target.def
> index 082a7c62f34..837ce902489 100644
> --- a/gcc/target.def
> +++ b/gcc/target.def
> @@ -4491,6 +4491,17 @@ original and the returned modes should be @code{MODE_INT}.",
>   (machine_mode mode),
>   default_preferred_doloop_mode)
>  
> +
> +/* Loading an integer value from memory first, then convert (bitcast) to\n\n
> +   floating point value, if target is able to support such behavior.  */
> +DEFHOOK
> +(loading_int_convert_to_float,
> +"If the target is protifiable to load an integer in word_mode from\n\
> +@var{source} which is based on @var{base}, then convert to a floating\n\
> +point value in @var{mode}.",
> + rtx, (machine_mode mode, rtx source, rtx base),
> + NULL)
> +
>  /* Returns true for a legitimate combined insn.  */
>  DEFHOOK
>  (legitimate_combined_insn,
> diff --git a/gcc/testsuite/g++.target/powerpc/pr102024.C b/gcc/testsuite/g++.target/powerpc/pr102024.C
> index 769585052b5..c8995cae707 100644
> --- a/gcc/testsuite/g++.target/powerpc/pr102024.C
> +++ b/gcc/testsuite/g++.target/powerpc/pr102024.C
> @@ -5,7 +5,7 @@
>  // Test that a zero-width bit field in an otherwise homogeneous aggregate
>  // generates a psabi warning and passes arguments in GPRs.
>  
> -// { dg-final { scan-assembler-times {\mstd\M} 4 } }
> +// { dg-final { scan-assembler-times {\mmtvsrd\M} 4 } }
>  
>  struct a_thing
>  {
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr108073.c b/gcc/testsuite/gcc.target/powerpc/pr108073.c
> new file mode 100644
> index 00000000000..aa02de56405
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr108073.c
> @@ -0,0 +1,24 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -save-temps" } */
> +
> +typedef struct DF {double a[4]; long l; } DF;
> +typedef struct SF {float a[4];short l; } SF;
> +
> +/* Each of below function contains one mtvsrd.  */
> +/* { dg-final { scan-assembler-times {\mmtvsrd\M} 3 {target { has_arch_ppc64 && has_arch_pwr8 } } } } */
> +double __attribute__ ((noipa)) foo_df (DF arg){return arg.a[3];}
> +float  __attribute__ ((noipa)) foo_sf (SF arg){return arg.a[2];}
> +float  __attribute__ ((noipa)) foo_sf1 (SF arg){return arg.a[1];}
> +
> +double gd = 4.0;
> +float gf1 = 3.0f, gf2 = 2.0f;
> +DF gdf = {{1.0,2.0,3.0,4.0}, 1L};
> +SF gsf = {{1.0f,2.0f,3.0f,4.0f}, 1L};
> +
> +int main()
> +{
> +  if (!(foo_df (gdf) == gd && foo_sf (gsf) == gf1 && foo_sf1 (gsf) == gf2))
> +    __builtin_abort ();
> +  return 0;
> +}
> +
>
  
Jiufu Guo Dec. 22, 2022, 7:25 a.m. UTC | #2
Hi,

On 2022-12-21 15:30, Richard Biener wrote:
> On Wed, 21 Dec 2022, Jiufu Guo wrote:
> 
>> Hi,
>> 
>> This patch is fixing an issue about parameter accessing if the
>> parameter is struct type and passed through integer registers, and
>> there is floating member is accessed. Like below code:
>> 
>> typedef struct DF {double a[4]; long l; } DF;
>> double foo_df (DF arg){return arg.a[3];}
>> 
>> On ppc64le, with trunk gcc, "std 6,-24(1) ; lfd 1,-24(1)" is
>> generated.  While instruction "mtvsrd 1, 6" would be enough for
>> this case.
> 
> So why do we end up spilling for PPC?

Good question! According to GCC source code (in function.cc/expr.cc),
it is common behavior: using "word_mode" to store the parameter to 
stack,
And using the field's mode (e.g. float mode) to load from the stack.
But with some tries, I fail to construct cases on many platforms.
So, I convert the fix to a target hook and implemented the rs6000 part
first.

> 
> struct X { int i; float f; };
> 
> float foo (struct X x)
> {
>   return x.f;
> }
> 
> does pass the structure in $RDI on x86_64 and we manage (with
> optimization, with -O0 we spill) to generate
> 
>         shrq    $32, %rdi
>         movd    %edi, %xmm0
> 
> and RTL expansion generates
> 
> (note 4 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
> (insn 2 4 3 2 (set (reg/v:DI 83 [ x ])
>         (reg:DI 5 di [ x ])) "t.c":4:1 -1
>      (nil))
> (note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
> (insn 6 3 7 2 (parallel [
>             (set (reg:DI 85)
>                 (ashiftrt:DI (reg/v:DI 83 [ x ])
>                     (const_int 32 [0x20])))
>             (clobber (reg:CC 17 flags))
>         ]) "t.c":5:11 -1
>      (nil))
> (insn 7 6 8 2 (set (reg:SI 86)
>         (subreg:SI (reg:DI 85) 0)) "t.c":5:11 -1
>      (nil))
> 
> I would imagine that for the ppc case we only see the subreg here
> which should be even easier to optimize.
> 
> So how's this not fixable by providing proper patterns / subreg
> capabilities?  Looking a bit at the RTL we have the issue might
> be that nothing seems to handle CSE of
> 

This case is also related to 'parameter on struct', PR89310 is
just for this case. On trunk, it is fixed.
One difference: the parameter is in DImode, and passed via an
integer register for "{int i; float f;}".
But for "{double a[4]; long l;}", the parameter is in BLKmode,
and stored to stack during the argument setup.

> (note 8 0 5 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
> (insn 5 8 7 2 (set (mem/c:DI (plus:DI (reg/f:DI 110 sfp)
>                 (const_int 56 [0x38])) [2 arg+24 S8 A64])
>         (reg:DI 6 6)) "t.c":2:23 679 {*movdi_internal64}
>      (expr_list:REG_DEAD (reg:DI 6 6)
>         (nil)))
> (note 7 5 10 2 NOTE_INSN_FUNCTION_BEG)
> (note 10 7 15 2 NOTE_INSN_DELETED)
> (insn 15 10 16 2 (set (reg/i:DF 33 1)
>         (mem/c:DF (plus:DI (reg/f:DI 110 sfp)
>                 (const_int 56 [0x38])) [1 arg.a[3]+0 S8 A64])) 
> "t.c":2:40
> 576 {*movdf_hardfloat64}
>      (nil))
> 
> Possibly because the store and load happen in a different mode?  Can
> you see why CSE doesn't handle this (producing a subreg)?  On

Yes, exactly! For "{double a[4]; long l;}", because the store and load
are using a different mode, and then CSE does not optimize it.  This
patch makes the store and load using the same mode (DImode), and then
leverage CSE to handle it.

> the GIMPLE side we'd happily do that (but we don't see the argument
> setup).

Thanks for your comments!


BR,
Jeff (Jiufu)

> 
> Thanks,
> Richard.
> 
>> This patch updates the behavior when loading floating members of a
>> parameter: if that floating member is stored via integer register,
>> then loading it as integer mode first, and converting it to floating
>> mode.
>> 
>> I also thought of a method: before storing the register to stack,
>> convert it to float mode first. While there are some cases that may
>> still prefer to keep an integer register store.
>> 
>> Bootstrap and regtest passes on ppc64{,le}.
>> I would ask for help to review for comments and if this patch is
>> acceptable for the trunk.
>> 
>> 
>> BR,
>> Jeff (Jiufu)
>> 
>> 	PR target/108073
>> 
>> gcc/ChangeLog:
>> 
>> 	* config/rs6000/rs6000.cc (TARGET_LOADING_INT_CONVERT_TO_FLOAT): New
>> 	macro definition.
>> 	(rs6000_loading_int_convert_to_float): New hook implement.
>> 	* doc/tm.texi: Regenerated.
>> 	* doc/tm.texi.in (loading_int_convert_to_float): New hook.
>> 	* expr.cc (expand_expr_real_1): Updated to use the new hook.
>> 	* target.def (loading_int_convert_to_float): New hook.
>> 
>> gcc/testsuite/ChangeLog:
>> 
>> 	* g++.target/powerpc/pr102024.C: Update.
>> 	* gcc.target/powerpc/pr108073.c: New test.
>> 
>> ---
>>  gcc/config/rs6000/rs6000.cc                 | 70 
>> +++++++++++++++++++++
>>  gcc/doc/tm.texi                             |  6 ++
>>  gcc/doc/tm.texi.in                          |  2 +
>>  gcc/expr.cc                                 | 15 +++++
>>  gcc/target.def                              | 11 ++++
>>  gcc/testsuite/g++.target/powerpc/pr102024.C |  2 +-
>>  gcc/testsuite/gcc.target/powerpc/pr108073.c | 24 +++++++
>>  7 files changed, 129 insertions(+), 1 deletion(-)
>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr108073.c
>> 
>> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
>> index b3a609f3aa3..af676eea276 100644
>> --- a/gcc/config/rs6000/rs6000.cc
>> +++ b/gcc/config/rs6000/rs6000.cc
>> @@ -1559,6 +1559,9 @@ static const struct attribute_spec 
>> rs6000_attribute_table[] =
>>  #undef TARGET_INVALID_ARG_FOR_UNPROTOTYPED_FN
>>  #define TARGET_INVALID_ARG_FOR_UNPROTOTYPED_FN 
>> invalid_arg_for_unprototyped_fn
>> 
>> +#undef TARGET_LOADING_INT_CONVERT_TO_FLOAT
>> +#define TARGET_LOADING_INT_CONVERT_TO_FLOAT 
>> rs6000_loading_int_convert_to_float
>> +
>>  #undef TARGET_MD_ASM_ADJUST
>>  #define TARGET_MD_ASM_ADJUST rs6000_md_asm_adjust
>> 
>> @@ -24018,6 +24021,73 @@ invalid_arg_for_unprototyped_fn (const_tree 
>> typelist, const_tree funcdecl, const
>>  	  : NULL;
>>  }
>> 
>> +/* Implement the TARGET_LOADING_INT_CONVERT_TO_FLOAT. */
>> +static rtx
>> +rs6000_loading_int_convert_to_float (machine_mode mode, rtx source, 
>> rtx base)
>> +{
>> +  rtx src_base = XEXP (source, 0);
>> +  poly_uint64 offset = MEM_OFFSET (source);
>> +
>> +  if (GET_CODE (src_base) == PLUS && CONSTANT_P (XEXP (src_base, 1)))
>> +    {
>> +      offset += INTVAL (XEXP (src_base, 1));
>> +      src_base = XEXP (src_base, 0);
>> +    }
>> +
>> +  if (!rtx_equal_p (XEXP (base, 0), src_base))
>> +    return NULL_RTX;
>> +
>> +  rtx temp_reg = gen_reg_rtx (word_mode);
>> +  rtx temp_mem = copy_rtx (source);
>> +  PUT_MODE (temp_mem, word_mode);
>> +
>> +  /* DI->DF */
>> +  if (word_mode == DImode && mode == DFmode)
>> +    {
>> +      if (multiple_p (offset, GET_MODE_SIZE (word_mode)))
>> +	{
>> +	  emit_move_insn (temp_reg, temp_mem);
>> +	  rtx float_subreg = simplify_gen_subreg (mode, temp_reg, word_mode, 
>> 0);
>> +	  rtx target_reg = gen_reg_rtx (mode);
>> +	  emit_move_insn (target_reg, float_subreg);
>> +	  return target_reg;
>> +	}
>> +      return NULL_RTX;
>> +    }
>> +
>> +  /* Sub DI#->SF */
>> +  if (word_mode == DImode && mode == SFmode)
>> +    {
>> +      poly_uint64 byte_off = 0;
>> +      if (multiple_p (offset, GET_MODE_SIZE (word_mode)))
>> +	byte_off = 0;
>> +      else if (multiple_p (offset - GET_MODE_SIZE (mode),
>> +			   GET_MODE_SIZE (word_mode)))
>> +	byte_off = GET_MODE_SIZE (mode);
>> +      else
>> +	return NULL_RTX;
>> +
>> +      temp_mem = adjust_address (temp_mem, word_mode, -byte_off);
>> +      emit_move_insn (temp_reg, temp_mem);
>> +
>> +      /* little endia only? */
>> +      poly_uint64 high_off = subreg_highpart_offset (SImode, 
>> word_mode);
>> +      if (known_eq (byte_off, high_off))
>> +	{
>> +	  temp_reg = expand_shift (RSHIFT_EXPR, word_mode, temp_reg,
>> +				   GET_MODE_PRECISION (SImode), temp_reg, 0);
>> +	}
>> +      rtx subreg_si = gen_reg_rtx (SImode);
>> +      emit_move_insn (subreg_si,  gen_lowpart (SImode, temp_reg));
>> +      rtx float_subreg = simplify_gen_subreg (mode, subreg_si, 
>> SImode, 0);
>> +      rtx target_reg = gen_reg_rtx (mode);
>> +      emit_move_insn (target_reg, float_subreg);
>> +      return target_reg;
>> +    }
>> +
>> +  return NULL_RTX;
>> +}
>> +
>>  /* For TARGET_SECURE_PLT 32-bit PIC code we can save PIC register
>>     setup by using __stack_chk_fail_local hidden function instead of
>>     calling __stack_chk_fail directly.  Otherwise it is better to call
>> diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
>> index 8fe49c2ba3d..10f94af553d 100644
>> --- a/gcc/doc/tm.texi
>> +++ b/gcc/doc/tm.texi
>> @@ -11933,6 +11933,12 @@ or when the back end is in a 
>> partially-initialized state.
>>  outside of any function scope.
>>  @end deftypefn
>> 
>> +@deftypefn {Target Hook} rtx TARGET_LOADING_INT_CONVERT_TO_FLOAT 
>> (machine_mode @var{mode}, rtx @var{source}, rtx @var{base})
>> +If the target is protifiable to load an integer in word_mode from
>> +@var{source} which is based on @var{base}, then convert to a floating
>> +point value in @var{mode}.
>> +@end deftypefn
>> +
>>  @defmac TARGET_OBJECT_SUFFIX
>>  Define this macro to be a C string representing the suffix for object
>>  files on your target machine.  If you do not define this macro, GCC 
>> will
>> diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
>> index 62c49ac46de..1ca6a671d86 100644
>> --- a/gcc/doc/tm.texi.in
>> +++ b/gcc/doc/tm.texi.in
>> @@ -7756,6 +7756,8 @@ to by @var{ce_info}.
>> 
>>  @hook TARGET_SET_CURRENT_FUNCTION
>> 
>> +@hook TARGET_LOADING_INT_CONVERT_TO_FLOAT
>> +
>>  @defmac TARGET_OBJECT_SUFFIX
>>  Define this macro to be a C string representing the suffix for object
>>  files on your target machine.  If you do not define this macro, GCC 
>> will
>> diff --git a/gcc/expr.cc b/gcc/expr.cc
>> index d9407432ea5..466079220e7 100644
>> --- a/gcc/expr.cc
>> +++ b/gcc/expr.cc
>> @@ -11812,6 +11812,21 @@ expand_expr_real_1 (tree exp, rtx target, 
>> machine_mode tmode,
>>  	    && modifier != EXPAND_WRITE)
>>  	  op0 = flip_storage_order (mode1, op0);
>> 
>> +	/* Accessing float field of struct parameter which passed via 
>> integer
>> +	   registers.  */
>> +	if (targetm.loading_int_convert_to_float && mode == mode1
>> +	    && GET_MODE_CLASS (mode) == MODE_FLOAT
>> +	    && TREE_CODE (tem) == PARM_DECL && DECL_INCOMING_RTL (tem)
>> +	    && REG_P (DECL_INCOMING_RTL (tem))
>> +	    && GET_MODE (DECL_INCOMING_RTL (tem)) == BLKmode && MEM_P (op0)
>> +	    && MEM_OFFSET_KNOWN_P (op0))
>> +	  {
>> +	    rtx res = targetm.loading_int_convert_to_float (mode, op0,
>> +							    DECL_RTL (tem));
>> +	    if (res)
>> +	      op0 = res;
>> +	  }
>> +
>>  	if (mode == mode1 || mode1 == BLKmode || mode1 == tmode
>>  	    || modifier == EXPAND_CONST_ADDRESS
>>  	    || modifier == EXPAND_INITIALIZER)
>> diff --git a/gcc/target.def b/gcc/target.def
>> index 082a7c62f34..837ce902489 100644
>> --- a/gcc/target.def
>> +++ b/gcc/target.def
>> @@ -4491,6 +4491,17 @@ original and the returned modes should be 
>> @code{MODE_INT}.",
>>   (machine_mode mode),
>>   default_preferred_doloop_mode)
>> 
>> +
>> +/* Loading an integer value from memory first, then convert (bitcast) 
>> to\n\n
>> +   floating point value, if target is able to support such behavior.  
>> */
>> +DEFHOOK
>> +(loading_int_convert_to_float,
>> +"If the target is protifiable to load an integer in word_mode from\n\
>> +@var{source} which is based on @var{base}, then convert to a 
>> floating\n\
>> +point value in @var{mode}.",
>> + rtx, (machine_mode mode, rtx source, rtx base),
>> + NULL)
>> +
>>  /* Returns true for a legitimate combined insn.  */
>>  DEFHOOK
>>  (legitimate_combined_insn,
>> diff --git a/gcc/testsuite/g++.target/powerpc/pr102024.C 
>> b/gcc/testsuite/g++.target/powerpc/pr102024.C
>> index 769585052b5..c8995cae707 100644
>> --- a/gcc/testsuite/g++.target/powerpc/pr102024.C
>> +++ b/gcc/testsuite/g++.target/powerpc/pr102024.C
>> @@ -5,7 +5,7 @@
>>  // Test that a zero-width bit field in an otherwise homogeneous 
>> aggregate
>>  // generates a psabi warning and passes arguments in GPRs.
>> 
>> -// { dg-final { scan-assembler-times {\mstd\M} 4 } }
>> +// { dg-final { scan-assembler-times {\mmtvsrd\M} 4 } }
>> 
>>  struct a_thing
>>  {
>> diff --git a/gcc/testsuite/gcc.target/powerpc/pr108073.c 
>> b/gcc/testsuite/gcc.target/powerpc/pr108073.c
>> new file mode 100644
>> index 00000000000..aa02de56405
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/pr108073.c
>> @@ -0,0 +1,24 @@
>> +/* { dg-do run } */
>> +/* { dg-options "-O2 -save-temps" } */
>> +
>> +typedef struct DF {double a[4]; long l; } DF;
>> +typedef struct SF {float a[4];short l; } SF;
>> +
>> +/* Each of below function contains one mtvsrd.  */
>> +/* { dg-final { scan-assembler-times {\mmtvsrd\M} 3 {target { 
>> has_arch_ppc64 && has_arch_pwr8 } } } } */
>> +double __attribute__ ((noipa)) foo_df (DF arg){return arg.a[3];}
>> +float  __attribute__ ((noipa)) foo_sf (SF arg){return arg.a[2];}
>> +float  __attribute__ ((noipa)) foo_sf1 (SF arg){return arg.a[1];}
>> +
>> +double gd = 4.0;
>> +float gf1 = 3.0f, gf2 = 2.0f;
>> +DF gdf = {{1.0,2.0,3.0,4.0}, 1L};
>> +SF gsf = {{1.0f,2.0f,3.0f,4.0f}, 1L};
>> +
>> +int main()
>> +{
>> +  if (!(foo_df (gdf) == gd && foo_sf (gsf) == gf1 && foo_sf1 (gsf) == 
>> gf2))
>> +    __builtin_abort ();
>> +  return 0;
>> +}
>> +
>>
  
Richard Biener Dec. 22, 2022, 7:54 a.m. UTC | #3
On Thu, 22 Dec 2022, guojiufu wrote:

> Hi,
> 
> On 2022-12-21 15:30, Richard Biener wrote:
> > On Wed, 21 Dec 2022, Jiufu Guo wrote:
> > 
> >> Hi,
> >> 
> >> This patch is fixing an issue about parameter accessing if the
> >> parameter is struct type and passed through integer registers, and
> >> there is floating member is accessed. Like below code:
> >> 
> >> typedef struct DF {double a[4]; long l; } DF;
> >> double foo_df (DF arg){return arg.a[3];}
> >> 
> >> On ppc64le, with trunk gcc, "std 6,-24(1) ; lfd 1,-24(1)" is
> >> generated.  While instruction "mtvsrd 1, 6" would be enough for
> >> this case.
> > 
> > So why do we end up spilling for PPC?
> 
> Good question! According to GCC source code (in function.cc/expr.cc),
> it is common behavior: using "word_mode" to store the parameter to stack,
> And using the field's mode (e.g. float mode) to load from the stack.
> But with some tries, I fail to construct cases on many platforms.
> So, I convert the fix to a target hook and implemented the rs6000 part
> first.
> 
> > 
> > struct X { int i; float f; };
> > 
> > float foo (struct X x)
> > {
> >   return x.f;
> > }
> > 
> > does pass the structure in $RDI on x86_64 and we manage (with
> > optimization, with -O0 we spill) to generate
> > 
> >         shrq    $32, %rdi
> >         movd    %edi, %xmm0
> > 
> > and RTL expansion generates
> > 
> > (note 4 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
> > (insn 2 4 3 2 (set (reg/v:DI 83 [ x ])
> >         (reg:DI 5 di [ x ])) "t.c":4:1 -1
> >      (nil))
> > (note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
> > (insn 6 3 7 2 (parallel [
> >             (set (reg:DI 85)
> >                 (ashiftrt:DI (reg/v:DI 83 [ x ])
> >                     (const_int 32 [0x20])))
> >             (clobber (reg:CC 17 flags))
> >         ]) "t.c":5:11 -1
> >      (nil))
> > (insn 7 6 8 2 (set (reg:SI 86)
> >         (subreg:SI (reg:DI 85) 0)) "t.c":5:11 -1
> >      (nil))
> > 
> > I would imagine that for the ppc case we only see the subreg here
> > which should be even easier to optimize.
> > 
> > So how's this not fixable by providing proper patterns / subreg
> > capabilities?  Looking a bit at the RTL we have the issue might
> > be that nothing seems to handle CSE of
> > 
> 
> This case is also related to 'parameter on struct', PR89310 is
> just for this case. On trunk, it is fixed.
> One difference: the parameter is in DImode, and passed via an
> integer register for "{int i; float f;}".
> But for "{double a[4]; long l;}", the parameter is in BLKmode,
> and stored to stack during the argument setup.

OK, so this would be another case for "heuristics" to use
sth different than word_mode for storing, but of course
the arguments are in integer registers and using different
modes can for example prohibit store-multiple instruction use.

As I said in the related thread an RTL expansion time "SRA"
with the incoming argument assignment in mind could make
more optimal decisions for these kind of special-cases.

> > (note 8 0 5 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
> > (insn 5 8 7 2 (set (mem/c:DI (plus:DI (reg/f:DI 110 sfp)
> >                 (const_int 56 [0x38])) [2 arg+24 S8 A64])
> >         (reg:DI 6 6)) "t.c":2:23 679 {*movdi_internal64}
> >      (expr_list:REG_DEAD (reg:DI 6 6)
> >         (nil)))
> > (note 7 5 10 2 NOTE_INSN_FUNCTION_BEG)
> > (note 10 7 15 2 NOTE_INSN_DELETED)
> > (insn 15 10 16 2 (set (reg/i:DF 33 1)
> >         (mem/c:DF (plus:DI (reg/f:DI 110 sfp)
> >                 (const_int 56 [0x38])) [1 arg.a[3]+0 S8 A64])) "t.c":2:40
> > 576 {*movdf_hardfloat64}
> >      (nil))
> > 
> > Possibly because the store and load happen in a different mode?  Can
> > you see why CSE doesn't handle this (producing a subreg)?  On
> 
> Yes, exactly! For "{double a[4]; long l;}", because the store and load
> are using a different mode, and then CSE does not optimize it.  This
> patch makes the store and load using the same mode (DImode), and then
> leverage CSE to handle it.

So can we instead fix CSE to consider replacing insn 15 above with

 (insn 15 (set (reg/i:DF 33 1)
               (subreg:DF (reg/f:DI 6 6)))

?  One peculiarity is that this introduces a hardreg use in this
special case.

Richard.

> > the GIMPLE side we'd happily do that (but we don't see the argument
> > setup).
> 
> Thanks for your comments!
> 
> 
> BR,
> Jeff (Jiufu)
> 
> > 
> > Thanks,
> > Richard.
> > 
> >> This patch updates the behavior when loading floating members of a
> >> parameter: if that floating member is stored via integer register,
> >> then loading it as integer mode first, and converting it to floating
> >> mode.
> >> 
> >> I also thought of a method: before storing the register to stack,
> >> convert it to float mode first. While there are some cases that may
> >> still prefer to keep an integer register store.
> >> 
> >> Bootstrap and regtest passes on ppc64{,le}.
> >> I would ask for help to review for comments and if this patch is
> >> acceptable for the trunk.
> >> 
> >> 
> >> BR,
> >> Jeff (Jiufu)
> >> 
> >>  PR target/108073
> >> 
> >> gcc/ChangeLog:
> >> 
> >>  * config/rs6000/rs6000.cc (TARGET_LOADING_INT_CONVERT_TO_FLOAT): New
> >>  macro definition.
> >>  (rs6000_loading_int_convert_to_float): New hook implement.
> >>  * doc/tm.texi: Regenerated.
> >>  * doc/tm.texi.in (loading_int_convert_to_float): New hook.
> >>  * expr.cc (expand_expr_real_1): Updated to use the new hook.
> >>  * target.def (loading_int_convert_to_float): New hook.
> >> 
> >> gcc/testsuite/ChangeLog:
> >> 
> >>  * g++.target/powerpc/pr102024.C: Update.
> >>  * gcc.target/powerpc/pr108073.c: New test.
> >> 
> >> ---
> >> gcc/config/rs6000/rs6000.cc                 | 70 +++++++++++++++++++++
> >>  gcc/doc/tm.texi                             |  6 ++
> >>  gcc/doc/tm.texi.in                          |  2 +
> >>  gcc/expr.cc                                 | 15 +++++
> >>  gcc/target.def                              | 11 ++++
> >>  gcc/testsuite/g++.target/powerpc/pr102024.C |  2 +-
> >>  gcc/testsuite/gcc.target/powerpc/pr108073.c | 24 +++++++
> >>  7 files changed, 129 insertions(+), 1 deletion(-)
> >>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr108073.c
> >> 
> >> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> >> index b3a609f3aa3..af676eea276 100644
> >> --- a/gcc/config/rs6000/rs6000.cc
> >> +++ b/gcc/config/rs6000/rs6000.cc
> >> @@ -1559,6 +1559,9 @@ static const struct attribute_spec 
> >> rs6000_attribute_table[] =
> >>  #undef TARGET_INVALID_ARG_FOR_UNPROTOTYPED_FN
> >>  #define TARGET_INVALID_ARG_FOR_UNPROTOTYPED_FN 
> >> invalid_arg_for_unprototyped_fn
> >> 
> >> +#undef TARGET_LOADING_INT_CONVERT_TO_FLOAT
> >> +#define TARGET_LOADING_INT_CONVERT_TO_FLOAT
> >> rs6000_loading_int_convert_to_float
> >> +
> >>  #undef TARGET_MD_ASM_ADJUST
> >>  #define TARGET_MD_ASM_ADJUST rs6000_md_asm_adjust
> >> 
> >> @@ -24018,6 +24021,73 @@ invalid_arg_for_unprototyped_fn (const_tree 
> >> typelist, const_tree funcdecl, const
> >>     : NULL;
> >>  }
> >> 
> >> +/* Implement the TARGET_LOADING_INT_CONVERT_TO_FLOAT. */
> >> +static rtx
> >> +rs6000_loading_int_convert_to_float (machine_mode mode, rtx source, rtx
> >> base)
> >> +{
> >> +  rtx src_base = XEXP (source, 0);
> >> +  poly_uint64 offset = MEM_OFFSET (source);
> >> +
> >> +  if (GET_CODE (src_base) == PLUS && CONSTANT_P (XEXP (src_base, 1)))
> >> +    {
> >> +      offset += INTVAL (XEXP (src_base, 1));
> >> +      src_base = XEXP (src_base, 0);
> >> +    }
> >> +
> >> +  if (!rtx_equal_p (XEXP (base, 0), src_base))
> >> +    return NULL_RTX;
> >> +
> >> +  rtx temp_reg = gen_reg_rtx (word_mode);
> >> +  rtx temp_mem = copy_rtx (source);
> >> +  PUT_MODE (temp_mem, word_mode);
> >> +
> >> +  /* DI->DF */
> >> +  if (word_mode == DImode && mode == DFmode)
> >> +    {
> >> +      if (multiple_p (offset, GET_MODE_SIZE (word_mode)))
> >> +	{
> >> +	  emit_move_insn (temp_reg, temp_mem);
> >> +	  rtx float_subreg = simplify_gen_subreg (mode, temp_reg, word_mode,
> >> 0);
> >> +	  rtx target_reg = gen_reg_rtx (mode);
> >> +	  emit_move_insn (target_reg, float_subreg);
> >> +	  return target_reg;
> >> +	}
> >> +      return NULL_RTX;
> >> +    }
> >> +
> >> +  /* Sub DI#->SF */
> >> +  if (word_mode == DImode && mode == SFmode)
> >> +    {
> >> +      poly_uint64 byte_off = 0;
> >> +      if (multiple_p (offset, GET_MODE_SIZE (word_mode)))
> >> +	byte_off = 0;
> >> +      else if (multiple_p (offset - GET_MODE_SIZE (mode),
> >> +			   GET_MODE_SIZE (word_mode)))
> >> +	byte_off = GET_MODE_SIZE (mode);
> >> +      else
> >> +	return NULL_RTX;
> >> +
> >> +      temp_mem = adjust_address (temp_mem, word_mode, -byte_off);
> >> +      emit_move_insn (temp_reg, temp_mem);
> >> +
> >> +      /* little endia only? */
> >> +      poly_uint64 high_off = subreg_highpart_offset (SImode, word_mode);
> >> +      if (known_eq (byte_off, high_off))
> >> +	{
> >> +	  temp_reg = expand_shift (RSHIFT_EXPR, word_mode, temp_reg,
> >> +				   GET_MODE_PRECISION (SImode), temp_reg, 0);
> >> +	}
> >> +      rtx subreg_si = gen_reg_rtx (SImode);
> >> +      emit_move_insn (subreg_si,  gen_lowpart (SImode, temp_reg));
> >> +      rtx float_subreg = simplify_gen_subreg (mode, subreg_si, SImode, 0);
> >> +      rtx target_reg = gen_reg_rtx (mode);
> >> +      emit_move_insn (target_reg, float_subreg);
> >> +      return target_reg;
> >> +    }
> >> +
> >> +  return NULL_RTX;
> >> +}
> >> +
> >>  /* For TARGET_SECURE_PLT 32-bit PIC code we can save PIC register
> >>     setup by using __stack_chk_fail_local hidden function instead of
> >>     calling __stack_chk_fail directly.  Otherwise it is better to call
> >> diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
> >> index 8fe49c2ba3d..10f94af553d 100644
> >> --- a/gcc/doc/tm.texi
> >> +++ b/gcc/doc/tm.texi
> >> @@ -11933,6 +11933,12 @@ or when the back end is in a 
> >> partially-initialized state.
> >>  outside of any function scope.
> >>  @end deftypefn
> >> 
> >> +@deftypefn {Target Hook} rtx TARGET_LOADING_INT_CONVERT_TO_FLOAT
> >> (machine_mode @var{mode}, rtx @var{source}, rtx @var{base})
> >> +If the target is protifiable to load an integer in word_mode from
> >> +@var{source} which is based on @var{base}, then convert to a floating
> >> +point value in @var{mode}.
> >> +@end deftypefn
> >> +
> >>  @defmac TARGET_OBJECT_SUFFIX
> >>  Define this macro to be a C string representing the suffix for object
> >>  files on your target machine.  If you do not define this macro, GCC 
> >> will
> >> diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
> >> index 62c49ac46de..1ca6a671d86 100644
> >> --- a/gcc/doc/tm.texi.in
> >> +++ b/gcc/doc/tm.texi.in
> >> @@ -7756,6 +7756,8 @@ to by @var{ce_info}.
> >> 
> >>  @hook TARGET_SET_CURRENT_FUNCTION
> >> 
> >> +@hook TARGET_LOADING_INT_CONVERT_TO_FLOAT
> >> +
> >>  @defmac TARGET_OBJECT_SUFFIX
> >>  Define this macro to be a C string representing the suffix for object
> >>  files on your target machine.  If you do not define this macro, GCC 
> >> will
> >> diff --git a/gcc/expr.cc b/gcc/expr.cc
> >> index d9407432ea5..466079220e7 100644
> >> --- a/gcc/expr.cc
> >> +++ b/gcc/expr.cc
> >> @@ -11812,6 +11812,21 @@ expand_expr_real_1 (tree exp, rtx target, 
> >> machine_mode tmode,
> >>       && modifier != EXPAND_WRITE)
> >>     op0 = flip_storage_order (mode1, op0);
> >> 
> >> +	/* Accessing float field of struct parameter which passed via integer
> >> +	   registers.  */
> >> +	if (targetm.loading_int_convert_to_float && mode == mode1
> >> +	    && GET_MODE_CLASS (mode) == MODE_FLOAT
> >> +	    && TREE_CODE (tem) == PARM_DECL && DECL_INCOMING_RTL (tem)
> >> +	    && REG_P (DECL_INCOMING_RTL (tem))
> >> +	    && GET_MODE (DECL_INCOMING_RTL (tem)) == BLKmode && MEM_P (op0)
> >> +	    && MEM_OFFSET_KNOWN_P (op0))
> >> +	  {
> >> +	    rtx res = targetm.loading_int_convert_to_float (mode, op0,
> >> +							    DECL_RTL (tem));
> >> +	    if (res)
> >> +	      op0 = res;
> >> +	  }
> >> +
> >>   if (mode == mode1 || mode1 == BLKmode || mode1 == tmode
> >>       || modifier == EXPAND_CONST_ADDRESS
> >>       || modifier == EXPAND_INITIALIZER)
> >> diff --git a/gcc/target.def b/gcc/target.def
> >> index 082a7c62f34..837ce902489 100644
> >> --- a/gcc/target.def
> >> +++ b/gcc/target.def
> >> @@ -4491,6 +4491,17 @@ original and the returned modes should be 
> >> @code{MODE_INT}.",
> >>   (machine_mode mode),
> >>   default_preferred_doloop_mode)
> >> 
> >> +
> >> +/* Loading an integer value from memory first, then convert (bitcast)
> >> to\n\n
> >> +   floating point value, if target is able to support such behavior.  */
> >> +DEFHOOK
> >> +(loading_int_convert_to_float,
> >> +"If the target is protifiable to load an integer in word_mode from\n\
> >> +@var{source} which is based on @var{base}, then convert to a floating\n\
> >> +point value in @var{mode}.",
> >> + rtx, (machine_mode mode, rtx source, rtx base),
> >> + NULL)
> >> +
> >>  /* Returns true for a legitimate combined insn.  */
> >>  DEFHOOK
> >>  (legitimate_combined_insn,
> >> diff --git a/gcc/testsuite/g++.target/powerpc/pr102024.C
> >> b/gcc/testsuite/g++.target/powerpc/pr102024.C
> >> index 769585052b5..c8995cae707 100644
> >> --- a/gcc/testsuite/g++.target/powerpc/pr102024.C
> >> +++ b/gcc/testsuite/g++.target/powerpc/pr102024.C
> >> @@ -5,7 +5,7 @@
> >> // Test that a zero-width bit field in an otherwise homogeneous aggregate
> >>  // generates a psabi warning and passes arguments in GPRs.
> >> 
> >> -// { dg-final { scan-assembler-times {\mstd\M} 4 } }
> >> +// { dg-final { scan-assembler-times {\mmtvsrd\M} 4 } }
> >> 
> >>  struct a_thing
> >>  {
> >> diff --git a/gcc/testsuite/gcc.target/powerpc/pr108073.c
> >> b/gcc/testsuite/gcc.target/powerpc/pr108073.c
> >> new file mode 100644
> >> index 00000000000..aa02de56405
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.target/powerpc/pr108073.c
> >> @@ -0,0 +1,24 @@
> >> +/* { dg-do run } */
> >> +/* { dg-options "-O2 -save-temps" } */
> >> +
> >> +typedef struct DF {double a[4]; long l; } DF;
> >> +typedef struct SF {float a[4];short l; } SF;
> >> +
> >> +/* Each of below function contains one mtvsrd.  */
> >> +/* { dg-final { scan-assembler-times {\mmtvsrd\M} 3 {target {
> >> has_arch_ppc64 && has_arch_pwr8 } } } } */
> >> +double __attribute__ ((noipa)) foo_df (DF arg){return arg.a[3];}
> >> +float  __attribute__ ((noipa)) foo_sf (SF arg){return arg.a[2];}
> >> +float  __attribute__ ((noipa)) foo_sf1 (SF arg){return arg.a[1];}
> >> +
> >> +double gd = 4.0;
> >> +float gf1 = 3.0f, gf2 = 2.0f;
> >> +DF gdf = {{1.0,2.0,3.0,4.0}, 1L};
> >> +SF gsf = {{1.0f,2.0f,3.0f,4.0f}, 1L};
> >> +
> >> +int main()
> >> +{
> >> +  if (!(foo_df (gdf) == gd && foo_sf (gsf) == gf1 && foo_sf1 (gsf) ==
> >> gf2))
> >> +    __builtin_abort ();
> >> +  return 0;
> >> +}
> >> +
> >> 
> 
>
  
Jiufu Guo Dec. 22, 2022, 9:02 a.m. UTC | #4
Hi,

Richard Biener <rguenther@suse.de> writes:

> On Thu, 22 Dec 2022, guojiufu wrote:
>
>> Hi,
>> 
>> On 2022-12-21 15:30, Richard Biener wrote:
>> > On Wed, 21 Dec 2022, Jiufu Guo wrote:
>> > 
>> >> Hi,
>> >> 
>> >> This patch is fixing an issue about parameter accessing if the
>> >> parameter is struct type and passed through integer registers, and
>> >> there is floating member is accessed. Like below code:
>> >> 
>> >> typedef struct DF {double a[4]; long l; } DF;
>> >> double foo_df (DF arg){return arg.a[3];}
>> >> 
>> >> On ppc64le, with trunk gcc, "std 6,-24(1) ; lfd 1,-24(1)" is
>> >> generated.  While instruction "mtvsrd 1, 6" would be enough for
>> >> this case.
>> > 
>> > So why do we end up spilling for PPC?
>> 
>> Good question! According to GCC source code (in function.cc/expr.cc),
>> it is common behavior: using "word_mode" to store the parameter to stack,
>> And using the field's mode (e.g. float mode) to load from the stack.
>> But with some tries, I fail to construct cases on many platforms.
>> So, I convert the fix to a target hook and implemented the rs6000 part
>> first.
>> 
>> > 
>> > struct X { int i; float f; };
>> > 
>> > float foo (struct X x)
>> > {
>> >   return x.f;
>> > }
>> > 
>> > does pass the structure in $RDI on x86_64 and we manage (with
>> > optimization, with -O0 we spill) to generate
>> > 
>> >         shrq    $32, %rdi
>> >         movd    %edi, %xmm0
>> > 
>> > and RTL expansion generates
>> > 
>> > (note 4 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
>> > (insn 2 4 3 2 (set (reg/v:DI 83 [ x ])
>> >         (reg:DI 5 di [ x ])) "t.c":4:1 -1
>> >      (nil))
>> > (note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
>> > (insn 6 3 7 2 (parallel [
>> >             (set (reg:DI 85)
>> >                 (ashiftrt:DI (reg/v:DI 83 [ x ])
>> >                     (const_int 32 [0x20])))
>> >             (clobber (reg:CC 17 flags))
>> >         ]) "t.c":5:11 -1
>> >      (nil))
>> > (insn 7 6 8 2 (set (reg:SI 86)
>> >         (subreg:SI (reg:DI 85) 0)) "t.c":5:11 -1
>> >      (nil))
>> > 
>> > I would imagine that for the ppc case we only see the subreg here
>> > which should be even easier to optimize.
>> > 
>> > So how's this not fixable by providing proper patterns / subreg
>> > capabilities?  Looking a bit at the RTL we have the issue might
>> > be that nothing seems to handle CSE of
>> > 
>> 
>> This case is also related to 'parameter on struct', PR89310 is
>> just for this case. On trunk, it is fixed.
>> One difference: the parameter is in DImode, and passed via an
>> integer register for "{int i; float f;}".
>> But for "{double a[4]; long l;}", the parameter is in BLKmode,
>> and stored to stack during the argument setup.
>
> OK, so this would be another case for "heuristics" to use
> sth different than word_mode for storing, but of course
> the arguments are in integer registers and using different
> modes can for example prohibit store-multiple instruction use.
>
> As I said in the related thread an RTL expansion time "SRA"
> with the incoming argument assignment in mind could make
> more optimal decisions for these kind of special-cases.

Thanks a lot for your comments!

Yeap! Using SRA-like analysis during expansion for parameter
and returns (and may also some field accessing) would be a
generic improvement for this kind of issue (PR101926 collected
a lot of them).
While we may still need some work for various ABIs and different
targets, to analyze where the 'struct field' come from
(int/float/vector/.. registers, or stack) and how the struct
need to be handled (keep in pseudo or store in the stack).
This may indicate a mount of changes for param_setup code.

To reduce risk, I'm just draft straightforward patches for
special cases currently, Like:
https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608081.html
and this patch.

>
>> > (note 8 0 5 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
>> > (insn 5 8 7 2 (set (mem/c:DI (plus:DI (reg/f:DI 110 sfp)
>> >                 (const_int 56 [0x38])) [2 arg+24 S8 A64])
>> >         (reg:DI 6 6)) "t.c":2:23 679 {*movdi_internal64}
>> >      (expr_list:REG_DEAD (reg:DI 6 6)
>> >         (nil)))
>> > (note 7 5 10 2 NOTE_INSN_FUNCTION_BEG)
>> > (note 10 7 15 2 NOTE_INSN_DELETED)
>> > (insn 15 10 16 2 (set (reg/i:DF 33 1)
>> >         (mem/c:DF (plus:DI (reg/f:DI 110 sfp)
>> >                 (const_int 56 [0x38])) [1 arg.a[3]+0 S8 A64])) "t.c":2:40
>> > 576 {*movdf_hardfloat64}
>> >      (nil))
>> > 
>> > Possibly because the store and load happen in a different mode?  Can
>> > you see why CSE doesn't handle this (producing a subreg)?  On
>> 
>> Yes, exactly! For "{double a[4]; long l;}", because the store and load
>> are using a different mode, and then CSE does not optimize it.  This
>> patch makes the store and load using the same mode (DImode), and then
>> leverage CSE to handle it.
>
> So can we instead fix CSE to consider replacing insn 15 above with
>
>  (insn 15 (set (reg/i:DF 33 1)
>                (subreg:DF (reg/f:DI 6 6)))
>

Thanks for your suggestion! I will check to see if able to draft
a quick fix.

BR,
Jeff (Jiufu)

> ?  One peculiarity is that this introduces a hardreg use in this
> special case.
>
> Richard.
>
>> > the GIMPLE side we'd happily do that (but we don't see the argument
>> > setup).
>> 
>> Thanks for your comments!
>> 
>> 
>> BR,
>> Jeff (Jiufu)
>> 
>> > 
>> > Thanks,
>> > Richard.
>> > 
>> >> This patch updates the behavior when loading floating members of a
>> >> parameter: if that floating member is stored via integer register,
>> >> then loading it as integer mode first, and converting it to floating
>> >> mode.
>> >> 
>> >> I also thought of a method: before storing the register to stack,
>> >> convert it to float mode first. While there are some cases that may
>> >> still prefer to keep an integer register store.
>> >> 
>> >> Bootstrap and regtest passes on ppc64{,le}.
>> >> I would ask for help to review for comments and if this patch is
>> >> acceptable for the trunk.
>> >> 
>> >> 
>> >> BR,
>> >> Jeff (Jiufu)
>> >> 
>> >>  PR target/108073
>> >> 
>> >> gcc/ChangeLog:
>> >> 
>> >>  * config/rs6000/rs6000.cc (TARGET_LOADING_INT_CONVERT_TO_FLOAT): New
>> >>  macro definition.
>> >>  (rs6000_loading_int_convert_to_float): New hook implement.
>> >>  * doc/tm.texi: Regenerated.
>> >>  * doc/tm.texi.in (loading_int_convert_to_float): New hook.
>> >>  * expr.cc (expand_expr_real_1): Updated to use the new hook.
>> >>  * target.def (loading_int_convert_to_float): New hook.
>> >> 
>> >> gcc/testsuite/ChangeLog:
>> >> 
>> >>  * g++.target/powerpc/pr102024.C: Update.
>> >>  * gcc.target/powerpc/pr108073.c: New test.
>> >> 
>> >> ---
>> >> gcc/config/rs6000/rs6000.cc                 | 70 +++++++++++++++++++++
>> >>  gcc/doc/tm.texi                             |  6 ++
>> >>  gcc/doc/tm.texi.in                          |  2 +
>> >>  gcc/expr.cc                                 | 15 +++++
>> >>  gcc/target.def                              | 11 ++++
>> >>  gcc/testsuite/g++.target/powerpc/pr102024.C |  2 +-
>> >>  gcc/testsuite/gcc.target/powerpc/pr108073.c | 24 +++++++
>> >>  7 files changed, 129 insertions(+), 1 deletion(-)
>> >>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr108073.c
>> >> 
>> >> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
>> >> index b3a609f3aa3..af676eea276 100644
>> >> --- a/gcc/config/rs6000/rs6000.cc
>> >> +++ b/gcc/config/rs6000/rs6000.cc
>> >> @@ -1559,6 +1559,9 @@ static const struct attribute_spec 
>> >> rs6000_attribute_table[] =
>> >>  #undef TARGET_INVALID_ARG_FOR_UNPROTOTYPED_FN
>> >>  #define TARGET_INVALID_ARG_FOR_UNPROTOTYPED_FN 
>> >> invalid_arg_for_unprototyped_fn
>> >> 
>> >> +#undef TARGET_LOADING_INT_CONVERT_TO_FLOAT
>> >> +#define TARGET_LOADING_INT_CONVERT_TO_FLOAT
>> >> rs6000_loading_int_convert_to_float
>> >> +
>> >>  #undef TARGET_MD_ASM_ADJUST
>> >>  #define TARGET_MD_ASM_ADJUST rs6000_md_asm_adjust
>> >> 
>> >> @@ -24018,6 +24021,73 @@ invalid_arg_for_unprototyped_fn (const_tree 
>> >> typelist, const_tree funcdecl, const
>> >>     : NULL;
>> >>  }
>> >> 
>> >> +/* Implement the TARGET_LOADING_INT_CONVERT_TO_FLOAT. */
>> >> +static rtx
>> >> +rs6000_loading_int_convert_to_float (machine_mode mode, rtx source, rtx
>> >> base)
>> >> +{
>> >> +  rtx src_base = XEXP (source, 0);
>> >> +  poly_uint64 offset = MEM_OFFSET (source);
>> >> +
>> >> +  if (GET_CODE (src_base) == PLUS && CONSTANT_P (XEXP (src_base, 1)))
>> >> +    {
>> >> +      offset += INTVAL (XEXP (src_base, 1));
>> >> +      src_base = XEXP (src_base, 0);
>> >> +    }
>> >> +
>> >> +  if (!rtx_equal_p (XEXP (base, 0), src_base))
>> >> +    return NULL_RTX;
>> >> +
>> >> +  rtx temp_reg = gen_reg_rtx (word_mode);
>> >> +  rtx temp_mem = copy_rtx (source);
>> >> +  PUT_MODE (temp_mem, word_mode);
>> >> +
>> >> +  /* DI->DF */
>> >> +  if (word_mode == DImode && mode == DFmode)
>> >> +    {
>> >> +      if (multiple_p (offset, GET_MODE_SIZE (word_mode)))
>> >> +	{
>> >> +	  emit_move_insn (temp_reg, temp_mem);
>> >> +	  rtx float_subreg = simplify_gen_subreg (mode, temp_reg, word_mode,
>> >> 0);
>> >> +	  rtx target_reg = gen_reg_rtx (mode);
>> >> +	  emit_move_insn (target_reg, float_subreg);
>> >> +	  return target_reg;
>> >> +	}
>> >> +      return NULL_RTX;
>> >> +    }
>> >> +
>> >> +  /* Sub DI#->SF */
>> >> +  if (word_mode == DImode && mode == SFmode)
>> >> +    {
>> >> +      poly_uint64 byte_off = 0;
>> >> +      if (multiple_p (offset, GET_MODE_SIZE (word_mode)))
>> >> +	byte_off = 0;
>> >> +      else if (multiple_p (offset - GET_MODE_SIZE (mode),
>> >> +			   GET_MODE_SIZE (word_mode)))
>> >> +	byte_off = GET_MODE_SIZE (mode);
>> >> +      else
>> >> +	return NULL_RTX;
>> >> +
>> >> +      temp_mem = adjust_address (temp_mem, word_mode, -byte_off);
>> >> +      emit_move_insn (temp_reg, temp_mem);
>> >> +
>> >> +      /* little endia only? */
>> >> +      poly_uint64 high_off = subreg_highpart_offset (SImode, word_mode);
>> >> +      if (known_eq (byte_off, high_off))
>> >> +	{
>> >> +	  temp_reg = expand_shift (RSHIFT_EXPR, word_mode, temp_reg,
>> >> +				   GET_MODE_PRECISION (SImode), temp_reg, 0);
>> >> +	}
>> >> +      rtx subreg_si = gen_reg_rtx (SImode);
>> >> +      emit_move_insn (subreg_si,  gen_lowpart (SImode, temp_reg));
>> >> +      rtx float_subreg = simplify_gen_subreg (mode, subreg_si, SImode, 0);
>> >> +      rtx target_reg = gen_reg_rtx (mode);
>> >> +      emit_move_insn (target_reg, float_subreg);
>> >> +      return target_reg;
>> >> +    }
>> >> +
>> >> +  return NULL_RTX;
>> >> +}
>> >> +
>> >>  /* For TARGET_SECURE_PLT 32-bit PIC code we can save PIC register
>> >>     setup by using __stack_chk_fail_local hidden function instead of
>> >>     calling __stack_chk_fail directly.  Otherwise it is better to call
>> >> diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
>> >> index 8fe49c2ba3d..10f94af553d 100644
>> >> --- a/gcc/doc/tm.texi
>> >> +++ b/gcc/doc/tm.texi
>> >> @@ -11933,6 +11933,12 @@ or when the back end is in a 
>> >> partially-initialized state.
>> >>  outside of any function scope.
>> >>  @end deftypefn
>> >> 
>> >> +@deftypefn {Target Hook} rtx TARGET_LOADING_INT_CONVERT_TO_FLOAT
>> >> (machine_mode @var{mode}, rtx @var{source}, rtx @var{base})
>> >> +If the target is protifiable to load an integer in word_mode from
>> >> +@var{source} which is based on @var{base}, then convert to a floating
>> >> +point value in @var{mode}.
>> >> +@end deftypefn
>> >> +
>> >>  @defmac TARGET_OBJECT_SUFFIX
>> >>  Define this macro to be a C string representing the suffix for object
>> >>  files on your target machine.  If you do not define this macro, GCC 
>> >> will
>> >> diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
>> >> index 62c49ac46de..1ca6a671d86 100644
>> >> --- a/gcc/doc/tm.texi.in
>> >> +++ b/gcc/doc/tm.texi.in
>> >> @@ -7756,6 +7756,8 @@ to by @var{ce_info}.
>> >> 
>> >>  @hook TARGET_SET_CURRENT_FUNCTION
>> >> 
>> >> +@hook TARGET_LOADING_INT_CONVERT_TO_FLOAT
>> >> +
>> >>  @defmac TARGET_OBJECT_SUFFIX
>> >>  Define this macro to be a C string representing the suffix for object
>> >>  files on your target machine.  If you do not define this macro, GCC 
>> >> will
>> >> diff --git a/gcc/expr.cc b/gcc/expr.cc
>> >> index d9407432ea5..466079220e7 100644
>> >> --- a/gcc/expr.cc
>> >> +++ b/gcc/expr.cc
>> >> @@ -11812,6 +11812,21 @@ expand_expr_real_1 (tree exp, rtx target, 
>> >> machine_mode tmode,
>> >>       && modifier != EXPAND_WRITE)
>> >>     op0 = flip_storage_order (mode1, op0);
>> >> 
>> >> +	/* Accessing float field of struct parameter which passed via integer
>> >> +	   registers.  */
>> >> +	if (targetm.loading_int_convert_to_float && mode == mode1
>> >> +	    && GET_MODE_CLASS (mode) == MODE_FLOAT
>> >> +	    && TREE_CODE (tem) == PARM_DECL && DECL_INCOMING_RTL (tem)
>> >> +	    && REG_P (DECL_INCOMING_RTL (tem))
>> >> +	    && GET_MODE (DECL_INCOMING_RTL (tem)) == BLKmode && MEM_P (op0)
>> >> +	    && MEM_OFFSET_KNOWN_P (op0))
>> >> +	  {
>> >> +	    rtx res = targetm.loading_int_convert_to_float (mode, op0,
>> >> +							    DECL_RTL (tem));
>> >> +	    if (res)
>> >> +	      op0 = res;
>> >> +	  }
>> >> +
>> >>   if (mode == mode1 || mode1 == BLKmode || mode1 == tmode
>> >>       || modifier == EXPAND_CONST_ADDRESS
>> >>       || modifier == EXPAND_INITIALIZER)
>> >> diff --git a/gcc/target.def b/gcc/target.def
>> >> index 082a7c62f34..837ce902489 100644
>> >> --- a/gcc/target.def
>> >> +++ b/gcc/target.def
>> >> @@ -4491,6 +4491,17 @@ original and the returned modes should be 
>> >> @code{MODE_INT}.",
>> >>   (machine_mode mode),
>> >>   default_preferred_doloop_mode)
>> >> 
>> >> +
>> >> +/* Loading an integer value from memory first, then convert (bitcast)
>> >> to\n\n
>> >> +   floating point value, if target is able to support such behavior.  */
>> >> +DEFHOOK
>> >> +(loading_int_convert_to_float,
>> >> +"If the target is protifiable to load an integer in word_mode from\n\
>> >> +@var{source} which is based on @var{base}, then convert to a floating\n\
>> >> +point value in @var{mode}.",
>> >> + rtx, (machine_mode mode, rtx source, rtx base),
>> >> + NULL)
>> >> +
>> >>  /* Returns true for a legitimate combined insn.  */
>> >>  DEFHOOK
>> >>  (legitimate_combined_insn,
>> >> diff --git a/gcc/testsuite/g++.target/powerpc/pr102024.C
>> >> b/gcc/testsuite/g++.target/powerpc/pr102024.C
>> >> index 769585052b5..c8995cae707 100644
>> >> --- a/gcc/testsuite/g++.target/powerpc/pr102024.C
>> >> +++ b/gcc/testsuite/g++.target/powerpc/pr102024.C
>> >> @@ -5,7 +5,7 @@
>> >> // Test that a zero-width bit field in an otherwise homogeneous aggregate
>> >>  // generates a psabi warning and passes arguments in GPRs.
>> >> 
>> >> -// { dg-final { scan-assembler-times {\mstd\M} 4 } }
>> >> +// { dg-final { scan-assembler-times {\mmtvsrd\M} 4 } }
>> >> 
>> >>  struct a_thing
>> >>  {
>> >> diff --git a/gcc/testsuite/gcc.target/powerpc/pr108073.c
>> >> b/gcc/testsuite/gcc.target/powerpc/pr108073.c
>> >> new file mode 100644
>> >> index 00000000000..aa02de56405
>> >> --- /dev/null
>> >> +++ b/gcc/testsuite/gcc.target/powerpc/pr108073.c
>> >> @@ -0,0 +1,24 @@
>> >> +/* { dg-do run } */
>> >> +/* { dg-options "-O2 -save-temps" } */
>> >> +
>> >> +typedef struct DF {double a[4]; long l; } DF;
>> >> +typedef struct SF {float a[4];short l; } SF;
>> >> +
>> >> +/* Each of below function contains one mtvsrd.  */
>> >> +/* { dg-final { scan-assembler-times {\mmtvsrd\M} 3 {target {
>> >> has_arch_ppc64 && has_arch_pwr8 } } } } */
>> >> +double __attribute__ ((noipa)) foo_df (DF arg){return arg.a[3];}
>> >> +float  __attribute__ ((noipa)) foo_sf (SF arg){return arg.a[2];}
>> >> +float  __attribute__ ((noipa)) foo_sf1 (SF arg){return arg.a[1];}
>> >> +
>> >> +double gd = 4.0;
>> >> +float gf1 = 3.0f, gf2 = 2.0f;
>> >> +DF gdf = {{1.0,2.0,3.0,4.0}, 1L};
>> >> +SF gsf = {{1.0f,2.0f,3.0f,4.0f}, 1L};
>> >> +
>> >> +int main()
>> >> +{
>> >> +  if (!(foo_df (gdf) == gd && foo_sf (gsf) == gf1 && foo_sf1 (gsf) ==
>> >> gf2))
>> >> +    __builtin_abort ();
>> >> +  return 0;
>> >> +}
>> >> +
>> >> 
>> 
>>
  
Richard Biener Dec. 22, 2022, 11:28 a.m. UTC | #5
On Thu, 22 Dec 2022, Jiufu Guo wrote:

> 
> Hi,
> 
> Richard Biener <rguenther@suse.de> writes:
> 
> > On Thu, 22 Dec 2022, guojiufu wrote:
> >
> >> Hi,
> >> 
> >> On 2022-12-21 15:30, Richard Biener wrote:
> >> > On Wed, 21 Dec 2022, Jiufu Guo wrote:
> >> > 
> >> >> Hi,
> >> >> 
> >> >> This patch is fixing an issue about parameter accessing if the
> >> >> parameter is struct type and passed through integer registers, and
> >> >> there is floating member is accessed. Like below code:
> >> >> 
> >> >> typedef struct DF {double a[4]; long l; } DF;
> >> >> double foo_df (DF arg){return arg.a[3];}
> >> >> 
> >> >> On ppc64le, with trunk gcc, "std 6,-24(1) ; lfd 1,-24(1)" is
> >> >> generated.  While instruction "mtvsrd 1, 6" would be enough for
> >> >> this case.
> >> > 
> >> > So why do we end up spilling for PPC?
> >> 
> >> Good question! According to GCC source code (in function.cc/expr.cc),
> >> it is common behavior: using "word_mode" to store the parameter to stack,
> >> And using the field's mode (e.g. float mode) to load from the stack.
> >> But with some tries, I fail to construct cases on many platforms.
> >> So, I convert the fix to a target hook and implemented the rs6000 part
> >> first.
> >> 
> >> > 
> >> > struct X { int i; float f; };
> >> > 
> >> > float foo (struct X x)
> >> > {
> >> >   return x.f;
> >> > }
> >> > 
> >> > does pass the structure in $RDI on x86_64 and we manage (with
> >> > optimization, with -O0 we spill) to generate
> >> > 
> >> >         shrq    $32, %rdi
> >> >         movd    %edi, %xmm0
> >> > 
> >> > and RTL expansion generates
> >> > 
> >> > (note 4 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
> >> > (insn 2 4 3 2 (set (reg/v:DI 83 [ x ])
> >> >         (reg:DI 5 di [ x ])) "t.c":4:1 -1
> >> >      (nil))
> >> > (note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
> >> > (insn 6 3 7 2 (parallel [
> >> >             (set (reg:DI 85)
> >> >                 (ashiftrt:DI (reg/v:DI 83 [ x ])
> >> >                     (const_int 32 [0x20])))
> >> >             (clobber (reg:CC 17 flags))
> >> >         ]) "t.c":5:11 -1
> >> >      (nil))
> >> > (insn 7 6 8 2 (set (reg:SI 86)
> >> >         (subreg:SI (reg:DI 85) 0)) "t.c":5:11 -1
> >> >      (nil))
> >> > 
> >> > I would imagine that for the ppc case we only see the subreg here
> >> > which should be even easier to optimize.
> >> > 
> >> > So how's this not fixable by providing proper patterns / subreg
> >> > capabilities?  Looking a bit at the RTL we have the issue might
> >> > be that nothing seems to handle CSE of
> >> > 
> >> 
> >> This case is also related to 'parameter on struct', PR89310 is
> >> just for this case. On trunk, it is fixed.
> >> One difference: the parameter is in DImode, and passed via an
> >> integer register for "{int i; float f;}".
> >> But for "{double a[4]; long l;}", the parameter is in BLKmode,
> >> and stored to stack during the argument setup.
> >
> > OK, so this would be another case for "heuristics" to use
> > sth different than word_mode for storing, but of course
> > the arguments are in integer registers and using different
> > modes can for example prohibit store-multiple instruction use.
> >
> > As I said in the related thread an RTL expansion time "SRA"
> > with the incoming argument assignment in mind could make
> > more optimal decisions for these kind of special-cases.
> 
> Thanks a lot for your comments!
> 
> Yeap! Using SRA-like analysis during expansion for parameter
> and returns (and may also some field accessing) would be a
> generic improvement for this kind of issue (PR101926 collected
> a lot of them).
> While we may still need some work for various ABIs and different
> targets, to analyze where the 'struct field' come from
> (int/float/vector/.. registers, or stack) and how the struct
> need to be handled (keep in pseudo or store in the stack).
> This may indicate a mount of changes for param_setup code.
> 
> To reduce risk, I'm just draft straightforward patches for
> special cases currently, Like:
> https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608081.html
> and this patch.

Heh, yes - though I'm not fond of special-casing things.  RTL
expansion is already full of special cases :/

So basically what I'd do is SRA analysis as if the aggregates
were initialized by copies from the argument registers (and
stack slots) and then try to avoid expanding the aggregate
PARM_DECL itself but aim at fully scalarizing that with only
the copies from the argument space remaining.

> >
> >> > (note 8 0 5 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
> >> > (insn 5 8 7 2 (set (mem/c:DI (plus:DI (reg/f:DI 110 sfp)
> >> >                 (const_int 56 [0x38])) [2 arg+24 S8 A64])
> >> >         (reg:DI 6 6)) "t.c":2:23 679 {*movdi_internal64}
> >> >      (expr_list:REG_DEAD (reg:DI 6 6)
> >> >         (nil)))
> >> > (note 7 5 10 2 NOTE_INSN_FUNCTION_BEG)
> >> > (note 10 7 15 2 NOTE_INSN_DELETED)
> >> > (insn 15 10 16 2 (set (reg/i:DF 33 1)
> >> >         (mem/c:DF (plus:DI (reg/f:DI 110 sfp)
> >> >                 (const_int 56 [0x38])) [1 arg.a[3]+0 S8 A64])) "t.c":2:40
> >> > 576 {*movdf_hardfloat64}
> >> >      (nil))
> >> > 
> >> > Possibly because the store and load happen in a different mode?  Can
> >> > you see why CSE doesn't handle this (producing a subreg)?  On
> >> 
> >> Yes, exactly! For "{double a[4]; long l;}", because the store and load
> >> are using a different mode, and then CSE does not optimize it.  This
> >> patch makes the store and load using the same mode (DImode), and then
> >> leverage CSE to handle it.
> >
> > So can we instead fix CSE to consider replacing insn 15 above with
> >
> >  (insn 15 (set (reg/i:DF 33 1)
> >                (subreg:DF (reg/f:DI 6 6)))
> >
> 
> Thanks for your suggestion! I will check to see if able to draft
> a quick fix.
> 
> BR,
> Jeff (Jiufu)
> 
> > ?  One peculiarity is that this introduces a hardreg use in this
> > special case.
> >
> > Richard.
> >
> >> > the GIMPLE side we'd happily do that (but we don't see the argument
> >> > setup).
> >> 
> >> Thanks for your comments!
> >> 
> >> 
> >> BR,
> >> Jeff (Jiufu)
> >> 
> >> > 
> >> > Thanks,
> >> > Richard.
> >> > 
> >> >> This patch updates the behavior when loading floating members of a
> >> >> parameter: if that floating member is stored via integer register,
> >> >> then loading it as integer mode first, and converting it to floating
> >> >> mode.
> >> >> 
> >> >> I also thought of a method: before storing the register to stack,
> >> >> convert it to float mode first. While there are some cases that may
> >> >> still prefer to keep an integer register store.
> >> >> 
> >> >> Bootstrap and regtest passes on ppc64{,le}.
> >> >> I would ask for help to review for comments and if this patch is
> >> >> acceptable for the trunk.
> >> >> 
> >> >> 
> >> >> BR,
> >> >> Jeff (Jiufu)
> >> >> 
> >> >>  PR target/108073
> >> >> 
> >> >> gcc/ChangeLog:
> >> >> 
> >> >>  * config/rs6000/rs6000.cc (TARGET_LOADING_INT_CONVERT_TO_FLOAT): New
> >> >>  macro definition.
> >> >>  (rs6000_loading_int_convert_to_float): New hook implement.
> >> >>  * doc/tm.texi: Regenerated.
> >> >>  * doc/tm.texi.in (loading_int_convert_to_float): New hook.
> >> >>  * expr.cc (expand_expr_real_1): Updated to use the new hook.
> >> >>  * target.def (loading_int_convert_to_float): New hook.
> >> >> 
> >> >> gcc/testsuite/ChangeLog:
> >> >> 
> >> >>  * g++.target/powerpc/pr102024.C: Update.
> >> >>  * gcc.target/powerpc/pr108073.c: New test.
> >> >> 
> >> >> ---
> >> >> gcc/config/rs6000/rs6000.cc                 | 70 +++++++++++++++++++++
> >> >>  gcc/doc/tm.texi                             |  6 ++
> >> >>  gcc/doc/tm.texi.in                          |  2 +
> >> >>  gcc/expr.cc                                 | 15 +++++
> >> >>  gcc/target.def                              | 11 ++++
> >> >>  gcc/testsuite/g++.target/powerpc/pr102024.C |  2 +-
> >> >>  gcc/testsuite/gcc.target/powerpc/pr108073.c | 24 +++++++
> >> >>  7 files changed, 129 insertions(+), 1 deletion(-)
> >> >>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr108073.c
> >> >> 
> >> >> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> >> >> index b3a609f3aa3..af676eea276 100644
> >> >> --- a/gcc/config/rs6000/rs6000.cc
> >> >> +++ b/gcc/config/rs6000/rs6000.cc
> >> >> @@ -1559,6 +1559,9 @@ static const struct attribute_spec 
> >> >> rs6000_attribute_table[] =
> >> >>  #undef TARGET_INVALID_ARG_FOR_UNPROTOTYPED_FN
> >> >>  #define TARGET_INVALID_ARG_FOR_UNPROTOTYPED_FN 
> >> >> invalid_arg_for_unprototyped_fn
> >> >> 
> >> >> +#undef TARGET_LOADING_INT_CONVERT_TO_FLOAT
> >> >> +#define TARGET_LOADING_INT_CONVERT_TO_FLOAT
> >> >> rs6000_loading_int_convert_to_float
> >> >> +
> >> >>  #undef TARGET_MD_ASM_ADJUST
> >> >>  #define TARGET_MD_ASM_ADJUST rs6000_md_asm_adjust
> >> >> 
> >> >> @@ -24018,6 +24021,73 @@ invalid_arg_for_unprototyped_fn (const_tree 
> >> >> typelist, const_tree funcdecl, const
> >> >>     : NULL;
> >> >>  }
> >> >> 
> >> >> +/* Implement the TARGET_LOADING_INT_CONVERT_TO_FLOAT. */
> >> >> +static rtx
> >> >> +rs6000_loading_int_convert_to_float (machine_mode mode, rtx source, rtx
> >> >> base)
> >> >> +{
> >> >> +  rtx src_base = XEXP (source, 0);
> >> >> +  poly_uint64 offset = MEM_OFFSET (source);
> >> >> +
> >> >> +  if (GET_CODE (src_base) == PLUS && CONSTANT_P (XEXP (src_base, 1)))
> >> >> +    {
> >> >> +      offset += INTVAL (XEXP (src_base, 1));
> >> >> +      src_base = XEXP (src_base, 0);
> >> >> +    }
> >> >> +
> >> >> +  if (!rtx_equal_p (XEXP (base, 0), src_base))
> >> >> +    return NULL_RTX;
> >> >> +
> >> >> +  rtx temp_reg = gen_reg_rtx (word_mode);
> >> >> +  rtx temp_mem = copy_rtx (source);
> >> >> +  PUT_MODE (temp_mem, word_mode);
> >> >> +
> >> >> +  /* DI->DF */
> >> >> +  if (word_mode == DImode && mode == DFmode)
> >> >> +    {
> >> >> +      if (multiple_p (offset, GET_MODE_SIZE (word_mode)))
> >> >> +	{
> >> >> +	  emit_move_insn (temp_reg, temp_mem);
> >> >> +	  rtx float_subreg = simplify_gen_subreg (mode, temp_reg, word_mode,
> >> >> 0);
> >> >> +	  rtx target_reg = gen_reg_rtx (mode);
> >> >> +	  emit_move_insn (target_reg, float_subreg);
> >> >> +	  return target_reg;
> >> >> +	}
> >> >> +      return NULL_RTX;
> >> >> +    }
> >> >> +
> >> >> +  /* Sub DI#->SF */
> >> >> +  if (word_mode == DImode && mode == SFmode)
> >> >> +    {
> >> >> +      poly_uint64 byte_off = 0;
> >> >> +      if (multiple_p (offset, GET_MODE_SIZE (word_mode)))
> >> >> +	byte_off = 0;
> >> >> +      else if (multiple_p (offset - GET_MODE_SIZE (mode),
> >> >> +			   GET_MODE_SIZE (word_mode)))
> >> >> +	byte_off = GET_MODE_SIZE (mode);
> >> >> +      else
> >> >> +	return NULL_RTX;
> >> >> +
> >> >> +      temp_mem = adjust_address (temp_mem, word_mode, -byte_off);
> >> >> +      emit_move_insn (temp_reg, temp_mem);
> >> >> +
> >> >> +      /* little endia only? */
> >> >> +      poly_uint64 high_off = subreg_highpart_offset (SImode, word_mode);
> >> >> +      if (known_eq (byte_off, high_off))
> >> >> +	{
> >> >> +	  temp_reg = expand_shift (RSHIFT_EXPR, word_mode, temp_reg,
> >> >> +				   GET_MODE_PRECISION (SImode), temp_reg, 0);
> >> >> +	}
> >> >> +      rtx subreg_si = gen_reg_rtx (SImode);
> >> >> +      emit_move_insn (subreg_si,  gen_lowpart (SImode, temp_reg));
> >> >> +      rtx float_subreg = simplify_gen_subreg (mode, subreg_si, SImode, 0);
> >> >> +      rtx target_reg = gen_reg_rtx (mode);
> >> >> +      emit_move_insn (target_reg, float_subreg);
> >> >> +      return target_reg;
> >> >> +    }
> >> >> +
> >> >> +  return NULL_RTX;
> >> >> +}
> >> >> +
> >> >>  /* For TARGET_SECURE_PLT 32-bit PIC code we can save PIC register
> >> >>     setup by using __stack_chk_fail_local hidden function instead of
> >> >>     calling __stack_chk_fail directly.  Otherwise it is better to call
> >> >> diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
> >> >> index 8fe49c2ba3d..10f94af553d 100644
> >> >> --- a/gcc/doc/tm.texi
> >> >> +++ b/gcc/doc/tm.texi
> >> >> @@ -11933,6 +11933,12 @@ or when the back end is in a 
> >> >> partially-initialized state.
> >> >>  outside of any function scope.
> >> >>  @end deftypefn
> >> >> 
> >> >> +@deftypefn {Target Hook} rtx TARGET_LOADING_INT_CONVERT_TO_FLOAT
> >> >> (machine_mode @var{mode}, rtx @var{source}, rtx @var{base})
> >> >> +If the target is protifiable to load an integer in word_mode from
> >> >> +@var{source} which is based on @var{base}, then convert to a floating
> >> >> +point value in @var{mode}.
> >> >> +@end deftypefn
> >> >> +
> >> >>  @defmac TARGET_OBJECT_SUFFIX
> >> >>  Define this macro to be a C string representing the suffix for object
> >> >>  files on your target machine.  If you do not define this macro, GCC 
> >> >> will
> >> >> diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
> >> >> index 62c49ac46de..1ca6a671d86 100644
> >> >> --- a/gcc/doc/tm.texi.in
> >> >> +++ b/gcc/doc/tm.texi.in
> >> >> @@ -7756,6 +7756,8 @@ to by @var{ce_info}.
> >> >> 
> >> >>  @hook TARGET_SET_CURRENT_FUNCTION
> >> >> 
> >> >> +@hook TARGET_LOADING_INT_CONVERT_TO_FLOAT
> >> >> +
> >> >>  @defmac TARGET_OBJECT_SUFFIX
> >> >>  Define this macro to be a C string representing the suffix for object
> >> >>  files on your target machine.  If you do not define this macro, GCC 
> >> >> will
> >> >> diff --git a/gcc/expr.cc b/gcc/expr.cc
> >> >> index d9407432ea5..466079220e7 100644
> >> >> --- a/gcc/expr.cc
> >> >> +++ b/gcc/expr.cc
> >> >> @@ -11812,6 +11812,21 @@ expand_expr_real_1 (tree exp, rtx target, 
> >> >> machine_mode tmode,
> >> >>       && modifier != EXPAND_WRITE)
> >> >>     op0 = flip_storage_order (mode1, op0);
> >> >> 
> >> >> +	/* Accessing float field of struct parameter which passed via integer
> >> >> +	   registers.  */
> >> >> +	if (targetm.loading_int_convert_to_float && mode == mode1
> >> >> +	    && GET_MODE_CLASS (mode) == MODE_FLOAT
> >> >> +	    && TREE_CODE (tem) == PARM_DECL && DECL_INCOMING_RTL (tem)
> >> >> +	    && REG_P (DECL_INCOMING_RTL (tem))
> >> >> +	    && GET_MODE (DECL_INCOMING_RTL (tem)) == BLKmode && MEM_P (op0)
> >> >> +	    && MEM_OFFSET_KNOWN_P (op0))
> >> >> +	  {
> >> >> +	    rtx res = targetm.loading_int_convert_to_float (mode, op0,
> >> >> +							    DECL_RTL (tem));
> >> >> +	    if (res)
> >> >> +	      op0 = res;
> >> >> +	  }
> >> >> +
> >> >>   if (mode == mode1 || mode1 == BLKmode || mode1 == tmode
> >> >>       || modifier == EXPAND_CONST_ADDRESS
> >> >>       || modifier == EXPAND_INITIALIZER)
> >> >> diff --git a/gcc/target.def b/gcc/target.def
> >> >> index 082a7c62f34..837ce902489 100644
> >> >> --- a/gcc/target.def
> >> >> +++ b/gcc/target.def
> >> >> @@ -4491,6 +4491,17 @@ original and the returned modes should be 
> >> >> @code{MODE_INT}.",
> >> >>   (machine_mode mode),
> >> >>   default_preferred_doloop_mode)
> >> >> 
> >> >> +
> >> >> +/* Loading an integer value from memory first, then convert (bitcast)
> >> >> to\n\n
> >> >> +   floating point value, if target is able to support such behavior.  */
> >> >> +DEFHOOK
> >> >> +(loading_int_convert_to_float,
> >> >> +"If the target is protifiable to load an integer in word_mode from\n\
> >> >> +@var{source} which is based on @var{base}, then convert to a floating\n\
> >> >> +point value in @var{mode}.",
> >> >> + rtx, (machine_mode mode, rtx source, rtx base),
> >> >> + NULL)
> >> >> +
> >> >>  /* Returns true for a legitimate combined insn.  */
> >> >>  DEFHOOK
> >> >>  (legitimate_combined_insn,
> >> >> diff --git a/gcc/testsuite/g++.target/powerpc/pr102024.C
> >> >> b/gcc/testsuite/g++.target/powerpc/pr102024.C
> >> >> index 769585052b5..c8995cae707 100644
> >> >> --- a/gcc/testsuite/g++.target/powerpc/pr102024.C
> >> >> +++ b/gcc/testsuite/g++.target/powerpc/pr102024.C
> >> >> @@ -5,7 +5,7 @@
> >> >> // Test that a zero-width bit field in an otherwise homogeneous aggregate
> >> >>  // generates a psabi warning and passes arguments in GPRs.
> >> >> 
> >> >> -// { dg-final { scan-assembler-times {\mstd\M} 4 } }
> >> >> +// { dg-final { scan-assembler-times {\mmtvsrd\M} 4 } }
> >> >> 
> >> >>  struct a_thing
> >> >>  {
> >> >> diff --git a/gcc/testsuite/gcc.target/powerpc/pr108073.c
> >> >> b/gcc/testsuite/gcc.target/powerpc/pr108073.c
> >> >> new file mode 100644
> >> >> index 00000000000..aa02de56405
> >> >> --- /dev/null
> >> >> +++ b/gcc/testsuite/gcc.target/powerpc/pr108073.c
> >> >> @@ -0,0 +1,24 @@
> >> >> +/* { dg-do run } */
> >> >> +/* { dg-options "-O2 -save-temps" } */
> >> >> +
> >> >> +typedef struct DF {double a[4]; long l; } DF;
> >> >> +typedef struct SF {float a[4];short l; } SF;
> >> >> +
> >> >> +/* Each of below function contains one mtvsrd.  */
> >> >> +/* { dg-final { scan-assembler-times {\mmtvsrd\M} 3 {target {
> >> >> has_arch_ppc64 && has_arch_pwr8 } } } } */
> >> >> +double __attribute__ ((noipa)) foo_df (DF arg){return arg.a[3];}
> >> >> +float  __attribute__ ((noipa)) foo_sf (SF arg){return arg.a[2];}
> >> >> +float  __attribute__ ((noipa)) foo_sf1 (SF arg){return arg.a[1];}
> >> >> +
> >> >> +double gd = 4.0;
> >> >> +float gf1 = 3.0f, gf2 = 2.0f;
> >> >> +DF gdf = {{1.0,2.0,3.0,4.0}, 1L};
> >> >> +SF gsf = {{1.0f,2.0f,3.0f,4.0f}, 1L};
> >> >> +
> >> >> +int main()
> >> >> +{
> >> >> +  if (!(foo_df (gdf) == gd && foo_sf (gsf) == gf1 && foo_sf1 (gsf) ==
> >> >> gf2))
> >> >> +    __builtin_abort ();
> >> >> +  return 0;
> >> >> +}
> >> >> +
> >> >> 
> >> 
> >> 
>
  
Segher Boessenkool Dec. 22, 2022, 6:40 p.m. UTC | #6
On Thu, Dec 22, 2022 at 11:28:01AM +0000, Richard Biener wrote:
> On Thu, 22 Dec 2022, Jiufu Guo wrote:
> > To reduce risk, I'm just draft straightforward patches for
> > special cases currently, Like:
> > https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608081.html
> > and this patch.
> 
> Heh, yes - though I'm not fond of special-casing things.  RTL
> expansion is already full of special cases :/

And many of those are not useful at all (would be done by later passes),
or are actively harmful.  Not to mention that expand is currently one of
the most impregnable and undebuggable RTL passes.

But there are also many things done during expand that although they
should be done somewhat later, aren't actually done later at all
currently.  So that needs fixing.

Maybe things should go via an intermediate step, where all the decisions
can be made, and then later we just have to translate the "low Gimple"
or "RTL-Gimple" ("Rimple"?) to RTL.  A format that is looser in many
ways than either RTL or Gimple.  A bit like Generic in that way.


Segher
  
Jiufu Guo Dec. 23, 2022, 12:23 p.m. UTC | #7
Hi,

Segher Boessenkool <segher@kernel.crashing.org> writes:

> On Thu, Dec 22, 2022 at 11:28:01AM +0000, Richard Biener wrote:
>> On Thu, 22 Dec 2022, Jiufu Guo wrote:
>> > To reduce risk, I'm just draft straightforward patches for
>> > special cases currently, Like:
>> > https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608081.html
>> > and this patch.
>> 
>> Heh, yes - though I'm not fond of special-casing things.  RTL
>> expansion is already full of special cases :/
>
> And many of those are not useful at all (would be done by later passes),
> or are actively harmful.  Not to mention that expand is currently one of
> the most impregnable and undebuggable RTL passes.
>
> But there are also many things done during expand that although they
> should be done somewhat later, aren't actually done later at all
> currently.  So that needs fixing.
>
> Maybe things should go via an intermediate step, where all the decisions
> can be made, and then later we just have to translate the "low Gimple"
> or "RTL-Gimple" ("Rimple"?) to RTL.  A format that is looser in many
> ways than either RTL or Gimple.  A bit like Generic in that way.

Thanks for all your great comments!

BR,
Jeff (Jiufu)
>
>
> Segher
  
Jiufu Guo Dec. 23, 2022, 12:36 p.m. UTC | #8
HI,

Jiufu Guo via Gcc-patches <gcc-patches@gcc.gnu.org> writes:

> Hi,
>
> Richard Biener <rguenther@suse.de> writes:
>
>> On Thu, 22 Dec 2022, guojiufu wrote:
>>
>>> Hi,
>>> 
>>> On 2022-12-21 15:30, Richard Biener wrote:
>>> > On Wed, 21 Dec 2022, Jiufu Guo wrote:
>>> > 
>>> >> Hi,
>>> >> 
>>> >> This patch is fixing an issue about parameter accessing if the
>>> >> parameter is struct type and passed through integer registers, and
>>> >> there is floating member is accessed. Like below code:
>>> >> 
>>> >> typedef struct DF {double a[4]; long l; } DF;
>>> >> double foo_df (DF arg){return arg.a[3];}
>>> >> 
>>> >> On ppc64le, with trunk gcc, "std 6,-24(1) ; lfd 1,-24(1)" is
>>> >> generated.  While instruction "mtvsrd 1, 6" would be enough for
>>> >> this case.
>>> > 
>>> > So why do we end up spilling for PPC?
>>> 
>>> Good question! According to GCC source code (in function.cc/expr.cc),
>>> it is common behavior: using "word_mode" to store the parameter to stack,
>>> And using the field's mode (e.g. float mode) to load from the stack.
>>> But with some tries, I fail to construct cases on many platforms.
>>> So, I convert the fix to a target hook and implemented the rs6000 part
>>> first.
>>> 
>>> > 
>>> > struct X { int i; float f; };
>>> > 
>>> > float foo (struct X x)
>>> > {
>>> >   return x.f;
>>> > }
>>> > 
>>> > does pass the structure in $RDI on x86_64 and we manage (with
>>> > optimization, with -O0 we spill) to generate
>>> > 
>>> >         shrq    $32, %rdi
>>> >         movd    %edi, %xmm0
>>> > 
>>> > and RTL expansion generates
>>> > 
>>> > (note 4 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
>>> > (insn 2 4 3 2 (set (reg/v:DI 83 [ x ])
>>> >         (reg:DI 5 di [ x ])) "t.c":4:1 -1
>>> >      (nil))
>>> > (note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
>>> > (insn 6 3 7 2 (parallel [
>>> >             (set (reg:DI 85)
>>> >                 (ashiftrt:DI (reg/v:DI 83 [ x ])
>>> >                     (const_int 32 [0x20])))
>>> >             (clobber (reg:CC 17 flags))
>>> >         ]) "t.c":5:11 -1
>>> >      (nil))
>>> > (insn 7 6 8 2 (set (reg:SI 86)
>>> >         (subreg:SI (reg:DI 85) 0)) "t.c":5:11 -1
>>> >      (nil))
>>> > 
>>> > I would imagine that for the ppc case we only see the subreg here
>>> > which should be even easier to optimize.
>>> > 
>>> > So how's this not fixable by providing proper patterns / subreg
>>> > capabilities?  Looking a bit at the RTL we have the issue might
>>> > be that nothing seems to handle CSE of
>>> > 
>>> 
>>> This case is also related to 'parameter on struct', PR89310 is
>>> just for this case. On trunk, it is fixed.
>>> One difference: the parameter is in DImode, and passed via an
>>> integer register for "{int i; float f;}".
>>> But for "{double a[4]; long l;}", the parameter is in BLKmode,
>>> and stored to stack during the argument setup.
>>
>> OK, so this would be another case for "heuristics" to use
>> sth different than word_mode for storing, but of course
>> the arguments are in integer registers and using different
>> modes can for example prohibit store-multiple instruction use.
>>
>> As I said in the related thread an RTL expansion time "SRA"
>> with the incoming argument assignment in mind could make
>> more optimal decisions for these kind of special-cases.
>
> Thanks a lot for your comments!
>
> Yeap! Using SRA-like analysis during expansion for parameter
> and returns (and may also some field accessing) would be a
> generic improvement for this kind of issue (PR101926 collected
> a lot of them).
> While we may still need some work for various ABIs and different
> targets, to analyze where the 'struct field' come from
> (int/float/vector/.. registers, or stack) and how the struct
> need to be handled (keep in pseudo or store in the stack).
> This may indicate a mount of changes for param_setup code.
>
> To reduce risk, I'm just draft straightforward patches for
> special cases currently, Like:
> https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608081.html
> and this patch.
>
>>
>>> > (note 8 0 5 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
>>> > (insn 5 8 7 2 (set (mem/c:DI (plus:DI (reg/f:DI 110 sfp)
>>> >                 (const_int 56 [0x38])) [2 arg+24 S8 A64])
>>> >         (reg:DI 6 6)) "t.c":2:23 679 {*movdi_internal64}
>>> >      (expr_list:REG_DEAD (reg:DI 6 6)
>>> >         (nil)))
>>> > (note 7 5 10 2 NOTE_INSN_FUNCTION_BEG)
>>> > (note 10 7 15 2 NOTE_INSN_DELETED)
>>> > (insn 15 10 16 2 (set (reg/i:DF 33 1)
>>> >         (mem/c:DF (plus:DI (reg/f:DI 110 sfp)
>>> >                 (const_int 56 [0x38])) [1 arg.a[3]+0 S8 A64])) "t.c":2:40
>>> > 576 {*movdf_hardfloat64}
>>> >      (nil))
>>> > 
>>> > Possibly because the store and load happen in a different mode?  Can
>>> > you see why CSE doesn't handle this (producing a subreg)?  On
>>> 
>>> Yes, exactly! For "{double a[4]; long l;}", because the store and load
>>> are using a different mode, and then CSE does not optimize it.  This
>>> patch makes the store and load using the same mode (DImode), and then
>>> leverage CSE to handle it.
>>
>> So can we instead fix CSE to consider replacing insn 15 above with
>>
>>  (insn 15 (set (reg/i:DF 33 1)
>>                (subreg:DF (reg/f:DI 6 6)))
>>
>
> Thanks for your suggestion! I will check to see if able to draft
> a quick fix.

I just hacked a patch as below.
It seems some limitations there. e.g. 1. "subreg:DF on DI register"
may not work well on pseudo, and 2. to convert high-part:DI to SF,
a "shift/rotate" is needed, and then we need to "emit shift insn"
in cse. I may need to update this patch.

Thanks for the comments and suggestions again!
And happy holiday!


BR,
Jeff (Jiufu)

diff --git a/gcc/cse.cc b/gcc/cse.cc
index b13afd4ba72..77bcbf75d8f 100644
--- a/gcc/cse.cc
+++ b/gcc/cse.cc
@@ -5011,6 +5011,32 @@ cse_insn (rtx_insn *insn)
 	  src_related_is_const_anchor = src_related != NULL_RTX;
 	}
 
+      /* Try to optimize "[rm:DI]=rd:DI ; rf:DF=[rm:DI]" to "rf:DF=rd:DI#0" */
+      if (!src_related && MEM_P (src) && (mode == DFmode || mode == SFmode)
+	  && known_le (GET_MODE_SIZE (mode), GET_MODE_SIZE (word_mode)))
+	{
+	  machine_mode src_mode = GET_MODE (src);
+	  PUT_MODE (src, word_mode);
+	  struct table_elt *mem_elt = lookup (src, sets[i].src_hash, word_mode);
+
+	  if (mem_elt)
+	    {
+	      for (mem_elt = mem_elt->first_same_value; mem_elt;
+		   mem_elt = mem_elt->next_same_value)
+		if (REG_P (mem_elt->exp))
+		  {
+		    poly_uint64 low_off
+		      = subreg_lowpart_offset (SImode, word_mode);
+		    if (known_eq (low_off, 0U))
+		      src_related = gen_lowpart (mode, mem_elt->exp);
+		    //TODO: highpart DI->SF
+		    break;
+		  }
+	    }
+
+	  PUT_MODE (src, src_mode);
+	}
+
       /* Try to re-materialize a vec_dup with an existing constant.   */
       rtx src_elt;
       if ((!src_eqv_here || CONSTANT_P (src_eqv_here))
--------


>
> BR,
> Jeff (Jiufu)
>
>> ?  One peculiarity is that this introduces a hardreg use in this
>> special case.
>>
>> Richard.
>>
>>> > the GIMPLE side we'd happily do that (but we don't see the argument
>>> > setup).
>>> 
>>> Thanks for your comments!
>>> 
>>> 
>>> BR,
>>> Jeff (Jiufu)
>>> 
>>> > 
>>> > Thanks,
>>> > Richard.
>>> > 
>>> >> This patch updates the behavior when loading floating members of a
>>> >> parameter: if that floating member is stored via integer register,
>>> >> then loading it as integer mode first, and converting it to floating
>>> >> mode.
>>> >> 
>>> >> I also thought of a method: before storing the register to stack,
>>> >> convert it to float mode first. While there are some cases that may
>>> >> still prefer to keep an integer register store.
>>> >> 
>>> >> Bootstrap and regtest passes on ppc64{,le}.
>>> >> I would ask for help to review for comments and if this patch is
>>> >> acceptable for the trunk.
>>> >> 
>>> >> 
>>> >> BR,
>>> >> Jeff (Jiufu)
>>> >> 
>>> >>  PR target/108073
>>> >> 
>>> >> gcc/ChangeLog:
>>> >> 
>>> >>  * config/rs6000/rs6000.cc (TARGET_LOADING_INT_CONVERT_TO_FLOAT): New
>>> >>  macro definition.
>>> >>  (rs6000_loading_int_convert_to_float): New hook implement.
>>> >>  * doc/tm.texi: Regenerated.
>>> >>  * doc/tm.texi.in (loading_int_convert_to_float): New hook.
>>> >>  * expr.cc (expand_expr_real_1): Updated to use the new hook.
>>> >>  * target.def (loading_int_convert_to_float): New hook.
>>> >> 
>>> >> gcc/testsuite/ChangeLog:
>>> >> 
>>> >>  * g++.target/powerpc/pr102024.C: Update.
>>> >>  * gcc.target/powerpc/pr108073.c: New test.
>>> >> 
>>> >> ---
>>> >> gcc/config/rs6000/rs6000.cc                 | 70 +++++++++++++++++++++
>>> >>  gcc/doc/tm.texi                             |  6 ++
>>> >>  gcc/doc/tm.texi.in                          |  2 +
>>> >>  gcc/expr.cc                                 | 15 +++++
>>> >>  gcc/target.def                              | 11 ++++
>>> >>  gcc/testsuite/g++.target/powerpc/pr102024.C |  2 +-
>>> >>  gcc/testsuite/gcc.target/powerpc/pr108073.c | 24 +++++++
>>> >>  7 files changed, 129 insertions(+), 1 deletion(-)
>>> >>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr108073.c
>>> >> 
>>> >> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
>>> >> index b3a609f3aa3..af676eea276 100644
>>> >> --- a/gcc/config/rs6000/rs6000.cc
>>> >> +++ b/gcc/config/rs6000/rs6000.cc
>>> >> @@ -1559,6 +1559,9 @@ static const struct attribute_spec 
>>> >> rs6000_attribute_table[] =
>>> >>  #undef TARGET_INVALID_ARG_FOR_UNPROTOTYPED_FN
>>> >>  #define TARGET_INVALID_ARG_FOR_UNPROTOTYPED_FN 
>>> >> invalid_arg_for_unprototyped_fn
>>> >> 
>>> >> +#undef TARGET_LOADING_INT_CONVERT_TO_FLOAT
>>> >> +#define TARGET_LOADING_INT_CONVERT_TO_FLOAT
>>> >> rs6000_loading_int_convert_to_float
>>> >> +
>>> >>  #undef TARGET_MD_ASM_ADJUST
>>> >>  #define TARGET_MD_ASM_ADJUST rs6000_md_asm_adjust
>>> >> 
>>> >> @@ -24018,6 +24021,73 @@ invalid_arg_for_unprototyped_fn (const_tree 
>>> >> typelist, const_tree funcdecl, const
>>> >>     : NULL;
>>> >>  }
>>> >> 
>>> >> +/* Implement the TARGET_LOADING_INT_CONVERT_TO_FLOAT. */
>>> >> +static rtx
>>> >> +rs6000_loading_int_convert_to_float (machine_mode mode, rtx source, rtx
>>> >> base)
>>> >> +{
>>> >> +  rtx src_base = XEXP (source, 0);
>>> >> +  poly_uint64 offset = MEM_OFFSET (source);
>>> >> +
>>> >> +  if (GET_CODE (src_base) == PLUS && CONSTANT_P (XEXP (src_base, 1)))
>>> >> +    {
>>> >> +      offset += INTVAL (XEXP (src_base, 1));
>>> >> +      src_base = XEXP (src_base, 0);
>>> >> +    }
>>> >> +
>>> >> +  if (!rtx_equal_p (XEXP (base, 0), src_base))
>>> >> +    return NULL_RTX;
>>> >> +
>>> >> +  rtx temp_reg = gen_reg_rtx (word_mode);
>>> >> +  rtx temp_mem = copy_rtx (source);
>>> >> +  PUT_MODE (temp_mem, word_mode);
>>> >> +
>>> >> +  /* DI->DF */
>>> >> +  if (word_mode == DImode && mode == DFmode)
>>> >> +    {
>>> >> +      if (multiple_p (offset, GET_MODE_SIZE (word_mode)))
>>> >> +	{
>>> >> +	  emit_move_insn (temp_reg, temp_mem);
>>> >> +	  rtx float_subreg = simplify_gen_subreg (mode, temp_reg, word_mode,
>>> >> 0);
>>> >> +	  rtx target_reg = gen_reg_rtx (mode);
>>> >> +	  emit_move_insn (target_reg, float_subreg);
>>> >> +	  return target_reg;
>>> >> +	}
>>> >> +      return NULL_RTX;
>>> >> +    }
>>> >> +
>>> >> +  /* Sub DI#->SF */
>>> >> +  if (word_mode == DImode && mode == SFmode)
>>> >> +    {
>>> >> +      poly_uint64 byte_off = 0;
>>> >> +      if (multiple_p (offset, GET_MODE_SIZE (word_mode)))
>>> >> +	byte_off = 0;
>>> >> +      else if (multiple_p (offset - GET_MODE_SIZE (mode),
>>> >> +			   GET_MODE_SIZE (word_mode)))
>>> >> +	byte_off = GET_MODE_SIZE (mode);
>>> >> +      else
>>> >> +	return NULL_RTX;
>>> >> +
>>> >> +      temp_mem = adjust_address (temp_mem, word_mode, -byte_off);
>>> >> +      emit_move_insn (temp_reg, temp_mem);
>>> >> +
>>> >> +      /* little endia only? */
>>> >> +      poly_uint64 high_off = subreg_highpart_offset (SImode, word_mode);
>>> >> +      if (known_eq (byte_off, high_off))
>>> >> +	{
>>> >> +	  temp_reg = expand_shift (RSHIFT_EXPR, word_mode, temp_reg,
>>> >> +				   GET_MODE_PRECISION (SImode), temp_reg, 0);
>>> >> +	}
>>> >> +      rtx subreg_si = gen_reg_rtx (SImode);
>>> >> +      emit_move_insn (subreg_si,  gen_lowpart (SImode, temp_reg));
>>> >> +      rtx float_subreg = simplify_gen_subreg (mode, subreg_si, SImode, 0);
>>> >> +      rtx target_reg = gen_reg_rtx (mode);
>>> >> +      emit_move_insn (target_reg, float_subreg);
>>> >> +      return target_reg;
>>> >> +    }
>>> >> +
>>> >> +  return NULL_RTX;
>>> >> +}
>>> >> +
>>> >>  /* For TARGET_SECURE_PLT 32-bit PIC code we can save PIC register
>>> >>     setup by using __stack_chk_fail_local hidden function instead of
>>> >>     calling __stack_chk_fail directly.  Otherwise it is better to call
>>> >> diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
>>> >> index 8fe49c2ba3d..10f94af553d 100644
>>> >> --- a/gcc/doc/tm.texi
>>> >> +++ b/gcc/doc/tm.texi
>>> >> @@ -11933,6 +11933,12 @@ or when the back end is in a 
>>> >> partially-initialized state.
>>> >>  outside of any function scope.
>>> >>  @end deftypefn
>>> >> 
>>> >> +@deftypefn {Target Hook} rtx TARGET_LOADING_INT_CONVERT_TO_FLOAT
>>> >> (machine_mode @var{mode}, rtx @var{source}, rtx @var{base})
>>> >> +If the target is protifiable to load an integer in word_mode from
>>> >> +@var{source} which is based on @var{base}, then convert to a floating
>>> >> +point value in @var{mode}.
>>> >> +@end deftypefn
>>> >> +
>>> >>  @defmac TARGET_OBJECT_SUFFIX
>>> >>  Define this macro to be a C string representing the suffix for object
>>> >>  files on your target machine.  If you do not define this macro, GCC 
>>> >> will
>>> >> diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
>>> >> index 62c49ac46de..1ca6a671d86 100644
>>> >> --- a/gcc/doc/tm.texi.in
>>> >> +++ b/gcc/doc/tm.texi.in
>>> >> @@ -7756,6 +7756,8 @@ to by @var{ce_info}.
>>> >> 
>>> >>  @hook TARGET_SET_CURRENT_FUNCTION
>>> >> 
>>> >> +@hook TARGET_LOADING_INT_CONVERT_TO_FLOAT
>>> >> +
>>> >>  @defmac TARGET_OBJECT_SUFFIX
>>> >>  Define this macro to be a C string representing the suffix for object
>>> >>  files on your target machine.  If you do not define this macro, GCC 
>>> >> will
>>> >> diff --git a/gcc/expr.cc b/gcc/expr.cc
>>> >> index d9407432ea5..466079220e7 100644
>>> >> --- a/gcc/expr.cc
>>> >> +++ b/gcc/expr.cc
>>> >> @@ -11812,6 +11812,21 @@ expand_expr_real_1 (tree exp, rtx target, 
>>> >> machine_mode tmode,
>>> >>       && modifier != EXPAND_WRITE)
>>> >>     op0 = flip_storage_order (mode1, op0);
>>> >> 
>>> >> +	/* Accessing float field of struct parameter which passed via integer
>>> >> +	   registers.  */
>>> >> +	if (targetm.loading_int_convert_to_float && mode == mode1
>>> >> +	    && GET_MODE_CLASS (mode) == MODE_FLOAT
>>> >> +	    && TREE_CODE (tem) == PARM_DECL && DECL_INCOMING_RTL (tem)
>>> >> +	    && REG_P (DECL_INCOMING_RTL (tem))
>>> >> +	    && GET_MODE (DECL_INCOMING_RTL (tem)) == BLKmode && MEM_P (op0)
>>> >> +	    && MEM_OFFSET_KNOWN_P (op0))
>>> >> +	  {
>>> >> +	    rtx res = targetm.loading_int_convert_to_float (mode, op0,
>>> >> +							    DECL_RTL (tem));
>>> >> +	    if (res)
>>> >> +	      op0 = res;
>>> >> +	  }
>>> >> +
>>> >>   if (mode == mode1 || mode1 == BLKmode || mode1 == tmode
>>> >>       || modifier == EXPAND_CONST_ADDRESS
>>> >>       || modifier == EXPAND_INITIALIZER)
>>> >> diff --git a/gcc/target.def b/gcc/target.def
>>> >> index 082a7c62f34..837ce902489 100644
>>> >> --- a/gcc/target.def
>>> >> +++ b/gcc/target.def
>>> >> @@ -4491,6 +4491,17 @@ original and the returned modes should be 
>>> >> @code{MODE_INT}.",
>>> >>   (machine_mode mode),
>>> >>   default_preferred_doloop_mode)
>>> >> 
>>> >> +
>>> >> +/* Loading an integer value from memory first, then convert (bitcast)
>>> >> to\n\n
>>> >> +   floating point value, if target is able to support such behavior.  */
>>> >> +DEFHOOK
>>> >> +(loading_int_convert_to_float,
>>> >> +"If the target is protifiable to load an integer in word_mode from\n\
>>> >> +@var{source} which is based on @var{base}, then convert to a floating\n\
>>> >> +point value in @var{mode}.",
>>> >> + rtx, (machine_mode mode, rtx source, rtx base),
>>> >> + NULL)
>>> >> +
>>> >>  /* Returns true for a legitimate combined insn.  */
>>> >>  DEFHOOK
>>> >>  (legitimate_combined_insn,
>>> >> diff --git a/gcc/testsuite/g++.target/powerpc/pr102024.C
>>> >> b/gcc/testsuite/g++.target/powerpc/pr102024.C
>>> >> index 769585052b5..c8995cae707 100644
>>> >> --- a/gcc/testsuite/g++.target/powerpc/pr102024.C
>>> >> +++ b/gcc/testsuite/g++.target/powerpc/pr102024.C
>>> >> @@ -5,7 +5,7 @@
>>> >> // Test that a zero-width bit field in an otherwise homogeneous aggregate
>>> >>  // generates a psabi warning and passes arguments in GPRs.
>>> >> 
>>> >> -// { dg-final { scan-assembler-times {\mstd\M} 4 } }
>>> >> +// { dg-final { scan-assembler-times {\mmtvsrd\M} 4 } }
>>> >> 
>>> >>  struct a_thing
>>> >>  {
>>> >> diff --git a/gcc/testsuite/gcc.target/powerpc/pr108073.c
>>> >> b/gcc/testsuite/gcc.target/powerpc/pr108073.c
>>> >> new file mode 100644
>>> >> index 00000000000..aa02de56405
>>> >> --- /dev/null
>>> >> +++ b/gcc/testsuite/gcc.target/powerpc/pr108073.c
>>> >> @@ -0,0 +1,24 @@
>>> >> +/* { dg-do run } */
>>> >> +/* { dg-options "-O2 -save-temps" } */
>>> >> +
>>> >> +typedef struct DF {double a[4]; long l; } DF;
>>> >> +typedef struct SF {float a[4];short l; } SF;
>>> >> +
>>> >> +/* Each of below function contains one mtvsrd.  */
>>> >> +/* { dg-final { scan-assembler-times {\mmtvsrd\M} 3 {target {
>>> >> has_arch_ppc64 && has_arch_pwr8 } } } } */
>>> >> +double __attribute__ ((noipa)) foo_df (DF arg){return arg.a[3];}
>>> >> +float  __attribute__ ((noipa)) foo_sf (SF arg){return arg.a[2];}
>>> >> +float  __attribute__ ((noipa)) foo_sf1 (SF arg){return arg.a[1];}
>>> >> +
>>> >> +double gd = 4.0;
>>> >> +float gf1 = 3.0f, gf2 = 2.0f;
>>> >> +DF gdf = {{1.0,2.0,3.0,4.0}, 1L};
>>> >> +SF gsf = {{1.0f,2.0f,3.0f,4.0f}, 1L};
>>> >> +
>>> >> +int main()
>>> >> +{
>>> >> +  if (!(foo_df (gdf) == gd && foo_sf (gsf) == gf1 && foo_sf1 (gsf) ==
>>> >> gf2))
>>> >> +    __builtin_abort ();
>>> >> +  return 0;
>>> >> +}
>>> >> +
>>> >> 
>>> 
>>>
  
Segher Boessenkool Dec. 23, 2022, 2:45 p.m. UTC | #9
Hi!

On Fri, Dec 23, 2022 at 08:36:36PM +0800, Jiufu Guo wrote:
> It seems some limitations there. e.g. 1. "subreg:DF on DI register"
> may not work well on pseudo,

It is perfectly normal:
  A hard register may be accessed in various modes throughout one
  function, but each pseudo register is given a natural mode
  and is accessed only in that mode.  When it is necessary to describe
  an access to a pseudo register using a nonnatural mode, a @code{subreg}
  expression is used.

and:
  @code{subreg} expressions are used to refer to a register in a machine
  mode other than its natural one, or to refer to one register of
  a multi-part @code{reg} that actually refers to several registers.

  Each pseudo register has a natural mode.  If it is necessary to
  operate on it in a different mode, the register must be
  enclosed in a @code{subreg}.

and we even have:
  @item hard registers
  It is seldom necessary to wrap hard registers in @code{subreg}s; such
  registers would normally reduce to a single @code{reg} rtx.  This use of
  @code{subreg}s is discouraged and may not be supported in the future.

> and 2. to convert high-part:DI to SF,
> a "shift/rotate" is needed, and then we need to "emit shift insn"
> in cse. I may need to update this patch.

Hrm.  The machine insns to do this is just mtvsrd;xscvspdpn, but for
converting the lowpart it is mtvsrws;xscvspdpn (this needs p9 or
later).  We should arrive at those patterns, and we should try to not
go via the more expensive formulations with shifts, which don't describe
the hardware well, and which overestimate the cost of it.

None of this belongs in generic code at all imo.  At expand time it
should be expanded to something that works and can be optimised well,
so not anything with :BLK (which has to be put in memory, something with
unbounded size cannot be put in registers), not anything specifically
tailored to any cpu, something nice and regular.  Using a subreg (of a
pseudo!) is the standard way of writing a bitcast.

So generic code would do a  (subreg:SF (reg:SI) 0)  to express a 32-bit
integer bitcast to an IEEE SP number, and our machine description should
make it work nicely.


Segher
  
Richard Biener Dec. 23, 2022, 4:20 p.m. UTC | #10
> Am 23.12.2022 um 15:48 schrieb Segher Boessenkool <segher@kernel.crashing.org>:
> 
> Hi!
> 
>> On Fri, Dec 23, 2022 at 08:36:36PM +0800, Jiufu Guo wrote:
>> It seems some limitations there. e.g. 1. "subreg:DF on DI register"
>> may not work well on pseudo,
> 
> It is perfectly normal:
>  A hard register may be accessed in various modes throughout one
>  function, but each pseudo register is given a natural mode
>  and is accessed only in that mode.  When it is necessary to describe
>  an access to a pseudo register using a nonnatural mode, a @code{subreg}
>  expression is used.
> 
> and:
>  @code{subreg} expressions are used to refer to a register in a machine
>  mode other than its natural one, or to refer to one register of
>  a multi-part @code{reg} that actually refers to several registers.
> 
>  Each pseudo register has a natural mode.  If it is necessary to
>  operate on it in a different mode, the register must be
>  enclosed in a @code{subreg}.
> 
> and we even have:
>  @item hard registers
>  It is seldom necessary to wrap hard registers in @code{subreg}s; such
>  registers would normally reduce to a single @code{reg} rtx.  This use of
>  @code{subreg}s is discouraged and may not be supported in the future.
> 
>> and 2. to convert high-part:DI to SF,
>> a "shift/rotate" is needed, and then we need to "emit shift insn"
>> in cse. I may need to update this patch.
> 
> Hrm.  The machine insns to do this is just mtvsrd;xscvspdpn, but for
> converting the lowpart it is mtvsrws;xscvspdpn (this needs p9 or
> later).  We should arrive at those patterns, and we should try to not
> go via the more expensive formulations with shifts, which don't describe
> the hardware well, and which overestimate the cost of it.
> 
> None of this belongs in generic code at all imo.  At expand time it
> should be expanded to something that works and can be optimised well,
> so not anything with :BLK (which has to be put in memory, something with
> unbounded size cannot be put in registers), not anything specifically
> tailored to any cpu, something nice and regular.  Using a subreg (of a
> pseudo!) is the standard way of writing a bitcast.
> 
> So generic code would do a  (subreg:SF (reg:SI) 0)  to express a 32-bit
> integer bitcast to an IEEE SP number, and our machine description should
> make it work nicely.

There’s also a byte offset in subreg, so (subreg:sf (reg:di) 4) is a Highpart bitcast.  Note whether targets actually support subreg operations needs to be queried and I’m not sure how subreg with offset validation should work there.

Richard 

> 
> 
> Segher
  
Segher Boessenkool Dec. 23, 2022, 4:52 p.m. UTC | #11
On Fri, Dec 23, 2022 at 05:20:09PM +0100, Richard Biener wrote:
> > Am 23.12.2022 um 15:48 schrieb Segher Boessenkool <segher@kernel.crashing.org>:
> > None of this belongs in generic code at all imo.  At expand time it
> > should be expanded to something that works and can be optimised well,
> > so not anything with :BLK (which has to be put in memory, something with
> > unbounded size cannot be put in registers), not anything specifically
> > tailored to any cpu, something nice and regular.  Using a subreg (of a
> > pseudo!) is the standard way of writing a bitcast.
> > 
> > So generic code would do a  (subreg:SF (reg:SI) 0)  to express a 32-bit
> > integer bitcast to an IEEE SP number, and our machine description should
> > make it work nicely.
> 
> There’s also a byte offset in subreg, so (subreg:sf (reg:di) 4) is a Highpart bitcast.

There are at least six very different kinds of subreg:

0) Lvalue subregs.  Most archs have no use for it, and it can be
   expressed much more clearly and cleanly always.
1) Subregs of mem.  Do not use, deprecated.  When old reload goes away
   this will go away.
2) Subregs of hard registers.  Do not use, there are much better ways to
   write subregs of a non-zero byte offset, and for zero offset this is
   non-canonical RTL.
3) Bitcast subregs.  In principle they go from one mode to another mode
   of the same size (but read on).
4) Paradoxical subregs.  A concept completely separate from the rest,
   different rules for everything, it has to be special cased almost
   everywhere, it would be better if it was a separate rtx_code imo.
5) Finally, normal subregs, taking a contiguous span of bits from some
   value.

Now, it is invalid to have a subreg of a subreg, so a 3) of a 5) is
written as just one subreg, as you say.  And a 4) of a 5) is just
invalid afaics (and let's not talk about 0)..2) anymore :-) )

> Note whether targets actually support subreg operations needs to be queried and I’m not sure how subreg with offset validation should work there.

But 3) is always valid, no?  On pseudos.


Segher
  
Richard Biener Dec. 23, 2022, 7:13 p.m. UTC | #12
> Am 23.12.2022 um 17:55 schrieb Segher Boessenkool <segher@kernel.crashing.org>:
> 
> On Fri, Dec 23, 2022 at 05:20:09PM +0100, Richard Biener wrote:
>>>> Am 23.12.2022 um 15:48 schrieb Segher Boessenkool <segher@kernel.crashing.org>:
>>> None of this belongs in generic code at all imo.  At expand time it
>>> should be expanded to something that works and can be optimised well,
>>> so not anything with :BLK (which has to be put in memory, something with
>>> unbounded size cannot be put in registers), not anything specifically
>>> tailored to any cpu, something nice and regular.  Using a subreg (of a
>>> pseudo!) is the standard way of writing a bitcast.
>>> 
>>> So generic code would do a  (subreg:SF (reg:SI) 0)  to express a 32-bit
>>> integer bitcast to an IEEE SP number, and our machine description should
>>> make it work nicely.
>> 
>> There’s also a byte offset in subreg, so (subreg:sf (reg:di) 4) is a Highpart bitcast.
> 
> There are at least six very different kinds of subreg:
> 
> 0) Lvalue subregs.  Most archs have no use for it, and it can be
>   expressed much more clearly and cleanly always.
> 1) Subregs of mem.  Do not use, deprecated.  When old reload goes away
>   this will go away.
> 2) Subregs of hard registers.  Do not use, there are much better ways to
>   write subregs of a non-zero byte offset, and for zero offset this is
>   non-canonical RTL.
> 3) Bitcast subregs.  In principle they go from one mode to another mode
>   of the same size (but read on).
> 4) Paradoxical subregs.  A concept completely separate from the rest,
>   different rules for everything, it has to be special cased almost
>   everywhere, it would be better if it was a separate rtx_code imo.
> 5) Finally, normal subregs, taking a contiguous span of bits from some
>   value.
> 
> Now, it is invalid to have a subreg of a subreg, so a 3) of a 5) is
> written as just one subreg, as you say.  And a 4) of a 5) is just
> invalid afaics (and let's not talk about 0)..2) anymore :-) )
> 
>> Note whether targets actually support subreg operations needs to be queried and I’m not sure how subreg with offset validation should work there.
> 
> But 3) is always valid, no?  On pseudos.

Yes, but it will eventually result in a spill/reload which is undesirable when we created this from CSE from a load.  So I think for CSE we do want to know whether a spill will definitely not occur.

Richard 
> 
> Segher
  
Segher Boessenkool Dec. 23, 2022, 7:52 p.m. UTC | #13
On Fri, Dec 23, 2022 at 08:13:48PM +0100, Richard Biener wrote:
> > Am 23.12.2022 um 17:55 schrieb Segher Boessenkool <segher@kernel.crashing.org>:
> > There are at least six very different kinds of subreg:
> > 
> > 0) Lvalue subregs.  Most archs have no use for it, and it can be
> >   expressed much more clearly and cleanly always.
> > 1) Subregs of mem.  Do not use, deprecated.  When old reload goes away
> >   this will go away.
> > 2) Subregs of hard registers.  Do not use, there are much better ways to
> >   write subregs of a non-zero byte offset, and for zero offset this is
> >   non-canonical RTL.
> > 3) Bitcast subregs.  In principle they go from one mode to another mode
> >   of the same size (but read on).
> > 4) Paradoxical subregs.  A concept completely separate from the rest,
> >   different rules for everything, it has to be special cased almost
> >   everywhere, it would be better if it was a separate rtx_code imo.
> > 5) Finally, normal subregs, taking a contiguous span of bits from some
> >   value.
> > 
> > Now, it is invalid to have a subreg of a subreg, so a 3) of a 5) is
> > written as just one subreg, as you say.  And a 4) of a 5) is just
> > invalid afaics (and let's not talk about 0)..2) anymore :-) )
> > 
> >> Note whether targets actually support subreg operations needs to be queried and I’m not sure how subreg with offset validation should work there.
> > 
> > But 3) is always valid, no?  On pseudos.
> 
> Yes, but it will eventually result in a spill/reload which is undesirable when we created this from CSE from a load.  So I think for CSE we do want to know whether a spill will definitely not occur.

Does it cause reloads though?  On any sane backend?  If no movsf pattern
allows integer registers, can things work at all?

Anyway, the normal way to test if some RTL is valid is to just generate
it (using validate_change) and then do apply_change_group, which then
cancels the changes if they do not work.  CSE already does some of this.

(I am doubtful doing any of this in CSE is a good idea fwiw).


Segher
  
Jiufu Guo Dec. 27, 2022, 2:15 a.m. UTC | #14
Hi,

Segher Boessenkool <segher@kernel.crashing.org> writes:

> Hi!
>
> On Fri, Dec 23, 2022 at 08:36:36PM +0800, Jiufu Guo wrote:
>> It seems some limitations there. e.g. 1. "subreg:DF on DI register"
>> may not work well on pseudo,
>
> It is perfectly normal:
>   A hard register may be accessed in various modes throughout one
>   function, but each pseudo register is given a natural mode
>   and is accessed only in that mode.  When it is necessary to describe
>   an access to a pseudo register using a nonnatural mode, a @code{subreg}
>   expression is used.
>
> and:
>   @code{subreg} expressions are used to refer to a register in a machine
>   mode other than its natural one, or to refer to one register of
>   a multi-part @code{reg} that actually refers to several registers.
>
>   Each pseudo register has a natural mode.  If it is necessary to
>   operate on it in a different mode, the register must be
>   enclosed in a @code{subreg}.
>
> and we even have:
>   @item hard registers
>   It is seldom necessary to wrap hard registers in @code{subreg}s; such
>   registers would normally reduce to a single @code{reg} rtx.  This use of
>   @code{subreg}s is discouraged and may not be supported in the future.
>
Thanks so much for detailed explaination!

>> and 2. to convert high-part:DI to SF,
>> a "shift/rotate" is needed, and then we need to "emit shift insn"
>> in cse. I may need to update this patch.
>
> Hrm.  The machine insns to do this is just mtvsrd;xscvspdpn, but for
> converting the lowpart it is mtvsrws;xscvspdpn (this needs p9 or
> later).  We should arrive at those patterns, and we should try to not
> go via the more expensive formulations with shifts, which don't describe
> the hardware well, and which overestimate the cost of it.
Yes, understant!
>
> None of this belongs in generic code at all imo.  At expand time it
> should be expanded to something that works and can be optimised well,
> so not anything with :BLK (which has to be put in memory, something with
> unbounded size cannot be put in registers), not anything specifically
> tailored to any cpu, something nice and regular.  Using a subreg (of a
> pseudo!) is the standard way of writing a bitcast.
>
> So generic code would do a  (subreg:SF (reg:SI) 0)  to express a 32-bit
> integer bitcast to an IEEE SP number, and our machine description should
> make it work nicely.
Right!  So, I'm thinking a way: in generic code, we may generated
"shift+(subreg:SF (reg:SI) 0)"; and at somewhere (maybe in combiner),
using "mtvsr.." to replace the "shift+subreg".

BR,
Jeff (Jiufu)

>
>
> Segher
  
Jiufu Guo Dec. 27, 2022, 3:03 a.m. UTC | #15
Hi,

Segher Boessenkool <segher@kernel.crashing.org> writes:

> On Fri, Dec 23, 2022 at 08:13:48PM +0100, Richard Biener wrote:
>> > Am 23.12.2022 um 17:55 schrieb Segher Boessenkool <segher@kernel.crashing.org>:
>> > There are at least six very different kinds of subreg:
>> > 
>> > 0) Lvalue subregs.  Most archs have no use for it, and it can be
>> >   expressed much more clearly and cleanly always.
>> > 1) Subregs of mem.  Do not use, deprecated.  When old reload goes away
>> >   this will go away.
>> > 2) Subregs of hard registers.  Do not use, there are much better ways to
>> >   write subregs of a non-zero byte offset, and for zero offset this is
>> >   non-canonical RTL.
>> > 3) Bitcast subregs.  In principle they go from one mode to another mode
>> >   of the same size (but read on).
>> > 4) Paradoxical subregs.  A concept completely separate from the rest,
>> >   different rules for everything, it has to be special cased almost
>> >   everywhere, it would be better if it was a separate rtx_code imo.
>> > 5) Finally, normal subregs, taking a contiguous span of bits from some
>> >   value.
>> > 
>> > Now, it is invalid to have a subreg of a subreg, so a 3) of a 5) is
>> > written as just one subreg, as you say.  And a 4) of a 5) is just
>> > invalid afaics (and let's not talk about 0)..2) anymore :-) )
>> > 
>> >> Note whether targets actually support subreg operations needs to be queried and I’m not sure how subreg with offset validation should work there.
>> > 
>> > But 3) is always valid, no?  On pseudos
I also has similar question: do we need to query/recog if "SF(SI#0)" is
valid on the target, or it would always work (even through reload)?
I also hit this during debugging on ppc64le: "SF(SI#0)" is valid,
and "SF(DI#4)" is not valid. 
>> 
>> Yes, but it will eventually result in a spill/reload which is
>> undesirable when we created this from CSE from a load.  So I think
>> for CSE we do want to know whether a spill will definitely not
>> occur.
>
> Does it cause reloads though?  On any sane backend?  If no movsf pattern
> allows integer registers, can things work at all?
>
> Anyway, the normal way to test if some RTL is valid is to just generate
> it (using validate_change) and then do apply_change_group, which then
> cancels the changes if they do not work.  CSE already does some of
> this.
validate_change seems ok. Thanks!
>
> (I am doubtful doing any of this in CSE is a good idea fwiw).
Understand your concern! Especially when we need to emit additional
inns in CSE.
While CSE does some similar work. It transforms
"[sf:DI]=%x:DI; %y:DI=[sf:DI]" to "%y:DI=%x:DI".
and "see if a MEM has already been loaded with a widening operation;
if it has, we can use a subreg of that." (only for int modes).
So, it may be acceptable to do this in CSE (maybe still seems
hacking).

Thanks for so great comments!

BR,
Jeff (Jiufu)

>
>
> Segher
  
Jiufu Guo Dec. 27, 2022, 2:16 p.m. UTC | #16
Hi,

Jiufu Guo via Gcc-patches <gcc-patches@gcc.gnu.org> writes:

> Hi,
>
> Segher Boessenkool <segher@kernel.crashing.org> writes:
>
>> On Fri, Dec 23, 2022 at 08:13:48PM +0100, Richard Biener wrote:
>>> > Am 23.12.2022 um 17:55 schrieb Segher Boessenkool <segher@kernel.crashing.org>:
>>> > There are at least six very different kinds of subreg:
>>> > 
>>> > 0) Lvalue subregs.  Most archs have no use for it, and it can be
>>> >   expressed much more clearly and cleanly always.
>>> > 1) Subregs of mem.  Do not use, deprecated.  When old reload goes away
>>> >   this will go away.
>>> > 2) Subregs of hard registers.  Do not use, there are much better ways to
>>> >   write subregs of a non-zero byte offset, and for zero offset this is
>>> >   non-canonical RTL.
>>> > 3) Bitcast subregs.  In principle they go from one mode to another mode
>>> >   of the same size (but read on).
>>> > 4) Paradoxical subregs.  A concept completely separate from the rest,
>>> >   different rules for everything, it has to be special cased almost
>>> >   everywhere, it would be better if it was a separate rtx_code imo.
>>> > 5) Finally, normal subregs, taking a contiguous span of bits from some
>>> >   value.
>>> > 
>>> > Now, it is invalid to have a subreg of a subreg, so a 3) of a 5) is
>>> > written as just one subreg, as you say.  And a 4) of a 5) is just
>>> > invalid afaics (and let's not talk about 0)..2) anymore :-) )
>>> > 
>>> >> Note whether targets actually support subreg operations needs to be queried and I’m not sure how subreg with offset validation should work there.
>>> > 
>>> > But 3) is always valid, no?  On pseudos
> I also has similar question: do we need to query/recog if "SF(SI#0)" is
> valid on the target, or it would always work (even through reload)?
> I also hit this during debugging on ppc64le: "SF(SI#0)" is valid,
> and "SF(DI#4)" is not valid. 
>>> 
>>> Yes, but it will eventually result in a spill/reload which is
>>> undesirable when we created this from CSE from a load.  So I think
>>> for CSE we do want to know whether a spill will definitely not
>>> occur.
>>
>> Does it cause reloads though?  On any sane backend?  If no movsf pattern
>> allows integer registers, can things work at all?
>>
>> Anyway, the normal way to test if some RTL is valid is to just generate
>> it (using validate_change) and then do apply_change_group, which then
>> cancels the changes if they do not work.  CSE already does some of
>> this.
> validate_change seems ok. Thanks!
>>
>> (I am doubtful doing any of this in CSE is a good idea fwiw).
> Understand your concern! Especially when we need to emit additional
> inns in CSE.
> While CSE does some similar work. It transforms
> "[sf:DI]=%x:DI; %y:DI=[sf:DI]" to "%y:DI=%x:DI".
> and "see if a MEM has already been loaded with a widening operation;
> if it has, we can use a subreg of that." (only for int modes).
> So, it may be acceptable to do this in CSE (maybe still seems
> hacking).

This maybe works for "DI to DF", because "mode converting
subreg:DF(x:DI,0)" is cheaper than "mem load DF([sf:DI])". Then
"y:DF=[sf:DI]" can be replaced by "y:DF=x:DI#0".

While for "subreg:SF(x:SI,0)", in CSE, it may not cheaper.
So, it may be doubtful for "convert DI to SF" in CSE.

Any comments or suggestions?


BR,
Jeff (Jiufu)

>
> Thanks for so great comments!
>
> BR,
> Jeff (Jiufu)
>
>>
>>
>> Segher
  
Jiufu Guo Dec. 30, 2022, 2:22 a.m. UTC | #17
Hi,

Jiufu Guo via Gcc-patches <gcc-patches@gcc.gnu.org> writes:

> Hi,
>
> Jiufu Guo via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>
>> Hi,
>>
>> Segher Boessenkool <segher@kernel.crashing.org> writes:
>>
>>> On Fri, Dec 23, 2022 at 08:13:48PM +0100, Richard Biener wrote:
>>>> > Am 23.12.2022 um 17:55 schrieb Segher Boessenkool <segher@kernel.crashing.org>:
>>>> > There are at least six very different kinds of subreg:
>>>> > 
>>>> > 0) Lvalue subregs.  Most archs have no use for it, and it can be
>>>> >   expressed much more clearly and cleanly always.
>>>> > 1) Subregs of mem.  Do not use, deprecated.  When old reload goes away
>>>> >   this will go away.
>>>> > 2) Subregs of hard registers.  Do not use, there are much better ways to
>>>> >   write subregs of a non-zero byte offset, and for zero offset this is
>>>> >   non-canonical RTL.
>>>> > 3) Bitcast subregs.  In principle they go from one mode to another mode
>>>> >   of the same size (but read on).
>>>> > 4) Paradoxical subregs.  A concept completely separate from the rest,
>>>> >   different rules for everything, it has to be special cased almost
>>>> >   everywhere, it would be better if it was a separate rtx_code imo.
>>>> > 5) Finally, normal subregs, taking a contiguous span of bits from some
>>>> >   value.
>>>> > 
>>>> > Now, it is invalid to have a subreg of a subreg, so a 3) of a 5) is
>>>> > written as just one subreg, as you say.  And a 4) of a 5) is just
>>>> > invalid afaics (and let's not talk about 0)..2) anymore :-) )
>>>> > 
>>>> >> Note whether targets actually support subreg operations needs to be queried and I’m not sure how subreg with offset validation should work there.
>>>> > 
>>>> > But 3) is always valid, no?  On pseudos
>> I also has similar question: do we need to query/recog if "SF(SI#0)" is
>> valid on the target, or it would always work (even through reload)?
>> I also hit this during debugging on ppc64le: "SF(SI#0)" is valid,
>> and "SF(DI#4)" is not valid. 
>>>> 
>>>> Yes, but it will eventually result in a spill/reload which is
>>>> undesirable when we created this from CSE from a load.  So I think
>>>> for CSE we do want to know whether a spill will definitely not
>>>> occur.
>>>
>>> Does it cause reloads though?  On any sane backend?  If no movsf pattern
>>> allows integer registers, can things work at all?
>>>
>>> Anyway, the normal way to test if some RTL is valid is to just generate
>>> it (using validate_change) and then do apply_change_group, which then
>>> cancels the changes if they do not work.  CSE already does some of
>>> this.
>> validate_change seems ok. Thanks!
>>>
>>> (I am doubtful doing any of this in CSE is a good idea fwiw).
>> Understand your concern! Especially when we need to emit additional
>> inns in CSE.
>> While CSE does some similar work. It transforms
>> "[sf:DI]=%x:DI; %y:DI=[sf:DI]" to "%y:DI=%x:DI".
>> and "see if a MEM has already been loaded with a widening operation;
>> if it has, we can use a subreg of that." (only for int modes).
>> So, it may be acceptable to do this in CSE (maybe still seems
>> hacking).
>
> This maybe works for "DI to DF", because "mode converting
> subreg:DF(x:DI,0)" is cheaper than "mem load DF([sf:DI])". Then
> "y:DF=[sf:DI]" can be replaced by "y:DF=x:DI#0".
>
> While for "subreg:SF(x:SI,0)", in CSE, it may not cheaper.
> So, it may be doubtful for "convert DI to SF" in CSE.

Considering the limitations of CSE, I try to find other places
to handle this issue, and notice DSE can optimize below code:
"[sfp:DI]=x:DI ; y:SI=[sfp:DI]" to "y:SI=x:DI#0".

So, I drafted a patch to update DSE to handle DI->DF/SF.
The patch updates "extract_low_bits" to get mode change 
with subreg.

diff --git a/gcc/expmed.cc b/gcc/expmed.cc
index b12b0e000c2..5e36331082c 100644
--- a/gcc/expmed.cc
+++ b/gcc/expmed.cc
@@ -2439,7 +2439,10 @@ extract_low_bits (machine_mode mode, machine_mode src_mode, rtx src)
 
   if (!targetm.modes_tieable_p (src_int_mode, src_mode))
     return NULL_RTX;
-  if (!targetm.modes_tieable_p (int_mode, mode))
+  if (!targetm.modes_tieable_p (int_mode, mode)
+      && !(known_le (GET_MODE_BITSIZE (mode), GET_MODE_BITSIZE (src_mode))
+	   && GET_MODE_CLASS (mode) == MODE_FLOAT
+	   && GET_MODE_CLASS (src_mode) == MODE_INT))
     return NULL_RTX;
 
   src = gen_lowpart (src_int_mode, src);
-----------

While I am aware that DSE is not good at cross basic-block.
e.g. for code, it was not optimized.

typedef struct SF {float a[4];int i1; int i2; } SF;
int foo_si (SF a, int flag)
{
  if (flag == 2)
    return a.i1 + a.i2;
  return 0;
}


So, we may back to previous ideas which we discussed in
early of this thread.
https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608872.html


BR,
Jeff (Jiufu)

>
> Any comments or suggestions?
>
>
> BR,
> Jeff (Jiufu)
>
>>
>> Thanks for so great comments!
>>
>> BR,
>> Jeff (Jiufu)
>>
>>>
>>>
>>> Segher
  
Segher Boessenkool Dec. 30, 2022, 7:44 a.m. UTC | #18
Hi!

On Fri, Dec 30, 2022 at 10:22:31AM +0800, Jiufu Guo wrote:
> Considering the limitations of CSE, I try to find other places
> to handle this issue, and notice DSE can optimize below code:
> "[sfp:DI]=x:DI ; y:SI=[sfp:DI]" to "y:SI=x:DI#0".
> 
> So, I drafted a patch to update DSE to handle DI->DF/SF.
> The patch updates "extract_low_bits" to get mode change 
> with subreg.
> 
> diff --git a/gcc/expmed.cc b/gcc/expmed.cc
> index b12b0e000c2..5e36331082c 100644
> --- a/gcc/expmed.cc
> +++ b/gcc/expmed.cc
> @@ -2439,7 +2439,10 @@ extract_low_bits (machine_mode mode, machine_mode src_mode, rtx src)
>  
>    if (!targetm.modes_tieable_p (src_int_mode, src_mode))
>      return NULL_RTX;
> -  if (!targetm.modes_tieable_p (int_mode, mode))
> +  if (!targetm.modes_tieable_p (int_mode, mode)
> +      && !(known_le (GET_MODE_BITSIZE (mode), GET_MODE_BITSIZE (src_mode))
> +	   && GET_MODE_CLASS (mode) == MODE_FLOAT
> +	   && GET_MODE_CLASS (src_mode) == MODE_INT))
>      return NULL_RTX;
>  
>    src = gen_lowpart (src_int_mode, src);

Ah!  This simply shows rs6000_modes_tieable_p is decidedly non-optimal:
it does not allow tying a scalar float to anything else.  No such thing
is required, or good apparently.  I wonder why we have such restrictions
at all in rs6000; is it just unfortunate history, was it good at one
point in time?


Segher
  
Andrew Pinski Dec. 30, 2022, 8:30 a.m. UTC | #19
On Thu, Dec 29, 2022 at 11:45 PM Segher Boessenkool
<segher@kernel.crashing.org> wrote:
>
> Hi!
>
> On Fri, Dec 30, 2022 at 10:22:31AM +0800, Jiufu Guo wrote:
> > Considering the limitations of CSE, I try to find other places
> > to handle this issue, and notice DSE can optimize below code:
> > "[sfp:DI]=x:DI ; y:SI=[sfp:DI]" to "y:SI=x:DI#0".
> >
> > So, I drafted a patch to update DSE to handle DI->DF/SF.
> > The patch updates "extract_low_bits" to get mode change
> > with subreg.
> >
> > diff --git a/gcc/expmed.cc b/gcc/expmed.cc
> > index b12b0e000c2..5e36331082c 100644
> > --- a/gcc/expmed.cc
> > +++ b/gcc/expmed.cc
> > @@ -2439,7 +2439,10 @@ extract_low_bits (machine_mode mode, machine_mode src_mode, rtx src)
> >
> >    if (!targetm.modes_tieable_p (src_int_mode, src_mode))
> >      return NULL_RTX;
> > -  if (!targetm.modes_tieable_p (int_mode, mode))
> > +  if (!targetm.modes_tieable_p (int_mode, mode)
> > +      && !(known_le (GET_MODE_BITSIZE (mode), GET_MODE_BITSIZE (src_mode))
> > +        && GET_MODE_CLASS (mode) == MODE_FLOAT
> > +        && GET_MODE_CLASS (src_mode) == MODE_INT))
> >      return NULL_RTX;
> >
> >    src = gen_lowpart (src_int_mode, src);
>
> Ah!  This simply shows rs6000_modes_tieable_p is decidedly non-optimal:
> it does not allow tying a scalar float to anything else.  No such thing
> is required, or good apparently.  I wonder why we have such restrictions
> at all in rs6000; is it just unfortunate history, was it good at one
> point in time?

The documentation for TARGET_MODES_TIEABLE_P says the following:
If TARGET_HARD_REGNO_MODE_OK (r, mode1) and TARGET_HARD_REGNO_MODE_OK
(r, mode2) are always the same for any r, then TARGET_MODES_TIEABLE_P
(mode1, mode2) should be true. If they differ for any r, you should
define this hook to return false unless some other mechanism ensures
the accessibility of the value in a narrower mode.

even though rs6000_hard_regno_mode_ok_uncached's comment has the following:
  /* The float registers (except for VSX vector modes) can only hold floating
     modes and DImode.  */

TARGET_P8_VECTOR and TARGET_P9_VECTOR has special cased different modes now:
          if (TARGET_P8_VECTOR && (mode == SImode))
            return 1;

          if (TARGET_P9_VECTOR && (mode == QImode || mode == HImode))
            return 1;
Which I suspect that means rs6000_modes_tieable_p should return true
for SImode and SFmode if TARGET_P8_VECTOR is true. Likewise for
TARGET_P9_VECTOR and SFmode and QImode/HImode too.


Thanks,
Andrew Pinski

>
>
> Segher
  
Jiufu Guo Jan. 3, 2023, 3:28 a.m. UTC | #20
Hi,

Andrew Pinski <pinskia@gmail.com> writes:

> On Thu, Dec 29, 2022 at 11:45 PM Segher Boessenkool
> <segher@kernel.crashing.org> wrote:
>>
>> Hi!
>>
>> On Fri, Dec 30, 2022 at 10:22:31AM +0800, Jiufu Guo wrote:
>> > Considering the limitations of CSE, I try to find other places
>> > to handle this issue, and notice DSE can optimize below code:
>> > "[sfp:DI]=x:DI ; y:SI=[sfp:DI]" to "y:SI=x:DI#0".
>> >
>> > So, I drafted a patch to update DSE to handle DI->DF/SF.
>> > The patch updates "extract_low_bits" to get mode change
>> > with subreg.
>> >
>> > diff --git a/gcc/expmed.cc b/gcc/expmed.cc
>> > index b12b0e000c2..5e36331082c 100644
>> > --- a/gcc/expmed.cc
>> > +++ b/gcc/expmed.cc
>> > @@ -2439,7 +2439,10 @@ extract_low_bits (machine_mode mode, machine_mode src_mode, rtx src)
>> >
>> >    if (!targetm.modes_tieable_p (src_int_mode, src_mode))
>> >      return NULL_RTX;
>> > -  if (!targetm.modes_tieable_p (int_mode, mode))
>> > +  if (!targetm.modes_tieable_p (int_mode, mode)
>> > +      && !(known_le (GET_MODE_BITSIZE (mode), GET_MODE_BITSIZE (src_mode))
>> > +        && GET_MODE_CLASS (mode) == MODE_FLOAT
>> > +        && GET_MODE_CLASS (src_mode) == MODE_INT))
>> >      return NULL_RTX;
>> >
>> >    src = gen_lowpart (src_int_mode, src);
>>
>> Ah!  This simply shows rs6000_modes_tieable_p is decidedly non-optimal:
>> it does not allow tying a scalar float to anything else.  No such thing
>> is required, or good apparently.  I wonder why we have such restrictions
>> at all in rs6000; is it just unfortunate history, was it good at one
>> point in time?
>
> The documentation for TARGET_MODES_TIEABLE_P says the following:
> If TARGET_HARD_REGNO_MODE_OK (r, mode1) and TARGET_HARD_REGNO_MODE_OK
> (r, mode2) are always the same for any r, then TARGET_MODES_TIEABLE_P
> (mode1, mode2) should be true. If they differ for any r, you should
> define this hook to return false unless some other mechanism ensures
> the accessibility of the value in a narrower mode.
>
> even though rs6000_hard_regno_mode_ok_uncached's comment has the following:
>   /* The float registers (except for VSX vector modes) can only hold floating
>      modes and DImode.  */
>
> TARGET_P8_VECTOR and TARGET_P9_VECTOR has special cased different modes now:
>           if (TARGET_P8_VECTOR && (mode == SImode))
>             return 1;
>
>           if (TARGET_P9_VECTOR && (mode == QImode || mode == HImode))
>             return 1;
> Which I suspect that means rs6000_modes_tieable_p should return true
> for SImode and SFmode if TARGET_P8_VECTOR is true. Likewise for
> TARGET_P9_VECTOR and SFmode and QImode/HImode too.
>
Thanks for your great comments!

modes_tieable_p is invoked by a few places besides extract_low_bits, so
updating this hook to relax the restriction may benefit more passes.

We may update modes_tieable_p for more cases as possible.
A hacked patch for "float vs. int" is listed at the end of this mail.

While back to the issue of this PR: optimize float loading which is
stored from the int register.  DSE works more on basicblock, so updating
modes_tieable_p (or extract_low_bits) can not handle some cases like:

double __attribute__ ((noipa)) foo_df (DF arg, int flag)
{
  if (flag == 2)
     return arg.a[3];
  return 0.0;
}

I'm thinking a way to handle this case.


BR,
Jeff (Jiufu)

>
> Thanks,
> Andrew Pinski
>
>>
>>
>> Segher

(To be refined.)
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index b3a609f3aa3..8088a608be6 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -1959,6 +1959,17 @@ rs6000_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
 static bool
 rs6000_modes_tieable_p (machine_mode mode1, machine_mode mode2)
 {
+  
+  if ((GET_MODE_CLASS (mode1) == MODE_FLOAT
+       && (GET_MODE_SIZE (mode2) == UNITS_PER_FP_WORD
+	   || (TARGET_P8_VECTOR && (mode2 == SImode))
+	   || (TARGET_P9_VECTOR && (mode2 == QImode || mode2 == HImode))))
+      || (GET_MODE_CLASS (mode2) == MODE_FLOAT
+	  && (GET_MODE_SIZE (mode1) == UNITS_PER_FP_WORD
+	      || (TARGET_P8_VECTOR && (mode1 == SImode))
+	      || (TARGET_P9_VECTOR && (mode1 == QImode || mode1 == HImode)))))
+    return true;
+
   if (mode1 == PTImode || mode1 == OOmode || mode1 == XOmode
       || mode2 == PTImode || mode2 == OOmode || mode2 == XOmode)
     return mode1 == mode2;
-------
  
Segher Boessenkool Jan. 3, 2023, 8:59 a.m. UTC | #21
Hi!

On Fri, Dec 30, 2022 at 12:30:04AM -0800, Andrew Pinski wrote:
> On Thu, Dec 29, 2022 at 11:45 PM Segher Boessenkool
> <segher@kernel.crashing.org> wrote:
> > Ah!  This simply shows rs6000_modes_tieable_p is decidedly non-optimal:
> > it does not allow tying a scalar float to anything else.  No such thing
> > is required, or good apparently.  I wonder why we have such restrictions
> > at all in rs6000; is it just unfortunate history, was it good at one
> > point in time?
> 
> The documentation for TARGET_MODES_TIEABLE_P says the following:
> If TARGET_HARD_REGNO_MODE_OK (r, mode1) and TARGET_HARD_REGNO_MODE_OK
> (r, mode2) are always the same for any r, then TARGET_MODES_TIEABLE_P
> (mode1, mode2) should be true. If they differ for any r, you should
> define this hook to return false unless some other mechanism ensures
> the accessibility of the value in a narrower mode.
> 
> even though rs6000_hard_regno_mode_ok_uncached's comment has the following:
>   /* The float registers (except for VSX vector modes) can only hold floating
>      modes and DImode.  */

That comment is incorrect.  See fctiw for example, which defines only
the SImode part of the result (the other bits are undefined).

> TARGET_P8_VECTOR and TARGET_P9_VECTOR has special cased different modes now:
>           if (TARGET_P8_VECTOR && (mode == SImode))
>             return 1;
> 
>           if (TARGET_P9_VECTOR && (mode == QImode || mode == HImode))
>             return 1;
> Which I suspect that means rs6000_modes_tieable_p should return true
> for SImode and SFmode if TARGET_P8_VECTOR is true. Likewise for
> TARGET_P9_VECTOR and SFmode and QImode/HImode too.

It means that older CPUs do not have as many instructions to do scalar
integer operations in vector registers, making it (almost) always a
losing proposition to put scalar integers there.  On newer CPUs it is
not quite as bad, there is a full(er) complement of instructions to do
such things in vector regs, just a bit slower than on GPRs.

But yeah we might need to fix hard_regno_mode_ok if we change tieable.


Segher
  
Li, Pan2 via Gcc-patches Jan. 3, 2023, 9:10 a.m. UTC | #22
Sorry for send this mail. I enter the wrong command line.

-----Original Message-----
From: Gcc-patches <gcc-patches-bounces+lin1.hu=intel.com@gcc.gnu.org> On Behalf Of Segher Boessenkool
Sent: Tuesday, January 3, 2023 5:00 PM
To: Andrew Pinski <pinskia@gmail.com>
Cc: Jiufu Guo <guojiufu@linux.ibm.com>; Jiufu Guo via Gcc-patches <gcc-patches@gcc.gnu.org>; Richard Biener <richard.guenther@gmail.com>; Richard Biener <rguenther@suse.de>; dje.gcc@gmail.com; linkw@gcc.gnu.org; jeffreyalaw@gmail.com
Subject: Re: [PATCH] loading float member of parameter stored via int registers

Hi!

On Fri, Dec 30, 2022 at 12:30:04AM -0800, Andrew Pinski wrote:
> On Thu, Dec 29, 2022 at 11:45 PM Segher Boessenkool 
> <segher@kernel.crashing.org> wrote:
> > Ah!  This simply shows rs6000_modes_tieable_p is decidedly non-optimal:
> > it does not allow tying a scalar float to anything else.  No such 
> > thing is required, or good apparently.  I wonder why we have such 
> > restrictions at all in rs6000; is it just unfortunate history, was 
> > it good at one point in time?
> 
> The documentation for TARGET_MODES_TIEABLE_P says the following:
> If TARGET_HARD_REGNO_MODE_OK (r, mode1) and TARGET_HARD_REGNO_MODE_OK 
> (r, mode2) are always the same for any r, then TARGET_MODES_TIEABLE_P 
> (mode1, mode2) should be true. If they differ for any r, you should 
> define this hook to return false unless some other mechanism ensures 
> the accessibility of the value in a narrower mode.
> 
> even though rs6000_hard_regno_mode_ok_uncached's comment has the following:
>   /* The float registers (except for VSX vector modes) can only hold floating
>      modes and DImode.  */

That comment is incorrect.  See fctiw for example, which defines only the SImode part of the result (the other bits are undefined).

> TARGET_P8_VECTOR and TARGET_P9_VECTOR has special cased different modes now:
>           if (TARGET_P8_VECTOR && (mode == SImode))
>             return 1;
> 
>           if (TARGET_P9_VECTOR && (mode == QImode || mode == HImode))
>             return 1;
> Which I suspect that means rs6000_modes_tieable_p should return true 
> for SImode and SFmode if TARGET_P8_VECTOR is true. Likewise for 
> TARGET_P9_VECTOR and SFmode and QImode/HImode too.

It means that older CPUs do not have as many instructions to do scalar integer operations in vector registers, making it (almost) always a losing proposition to put scalar integers there.  On newer CPUs it is not quite as bad, there is a full(er) complement of instructions to do such things in vector regs, just a bit slower than on GPRs.

But yeah we might need to fix hard_regno_mode_ok if we change tieable.


Segher
  

Patch

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index b3a609f3aa3..af676eea276 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -1559,6 +1559,9 @@  static const struct attribute_spec rs6000_attribute_table[] =
 #undef TARGET_INVALID_ARG_FOR_UNPROTOTYPED_FN
 #define TARGET_INVALID_ARG_FOR_UNPROTOTYPED_FN invalid_arg_for_unprototyped_fn
 
+#undef TARGET_LOADING_INT_CONVERT_TO_FLOAT
+#define TARGET_LOADING_INT_CONVERT_TO_FLOAT rs6000_loading_int_convert_to_float
+
 #undef TARGET_MD_ASM_ADJUST
 #define TARGET_MD_ASM_ADJUST rs6000_md_asm_adjust
 
@@ -24018,6 +24021,73 @@  invalid_arg_for_unprototyped_fn (const_tree typelist, const_tree funcdecl, const
 	  : NULL;
 }
 
+/* Implement the TARGET_LOADING_INT_CONVERT_TO_FLOAT. */
+static rtx
+rs6000_loading_int_convert_to_float (machine_mode mode, rtx source, rtx base)
+{
+  rtx src_base = XEXP (source, 0);
+  poly_uint64 offset = MEM_OFFSET (source);
+
+  if (GET_CODE (src_base) == PLUS && CONSTANT_P (XEXP (src_base, 1)))
+    {
+      offset += INTVAL (XEXP (src_base, 1));
+      src_base = XEXP (src_base, 0);
+    }
+
+  if (!rtx_equal_p (XEXP (base, 0), src_base))
+    return NULL_RTX;
+
+  rtx temp_reg = gen_reg_rtx (word_mode);
+  rtx temp_mem = copy_rtx (source);
+  PUT_MODE (temp_mem, word_mode);
+
+  /* DI->DF */
+  if (word_mode == DImode && mode == DFmode)
+    {
+      if (multiple_p (offset, GET_MODE_SIZE (word_mode)))
+	{
+	  emit_move_insn (temp_reg, temp_mem);
+	  rtx float_subreg = simplify_gen_subreg (mode, temp_reg, word_mode, 0);
+	  rtx target_reg = gen_reg_rtx (mode);
+	  emit_move_insn (target_reg, float_subreg);
+	  return target_reg;
+	}
+      return NULL_RTX;
+    }
+
+  /* Sub DI#->SF */
+  if (word_mode == DImode && mode == SFmode)
+    {
+      poly_uint64 byte_off = 0;
+      if (multiple_p (offset, GET_MODE_SIZE (word_mode)))
+	byte_off = 0;
+      else if (multiple_p (offset - GET_MODE_SIZE (mode),
+			   GET_MODE_SIZE (word_mode)))
+	byte_off = GET_MODE_SIZE (mode);
+      else
+	return NULL_RTX;
+
+      temp_mem = adjust_address (temp_mem, word_mode, -byte_off);
+      emit_move_insn (temp_reg, temp_mem);
+
+      /* little endia only? */
+      poly_uint64 high_off = subreg_highpart_offset (SImode, word_mode);
+      if (known_eq (byte_off, high_off))
+	{
+	  temp_reg = expand_shift (RSHIFT_EXPR, word_mode, temp_reg,
+				   GET_MODE_PRECISION (SImode), temp_reg, 0);
+	}
+      rtx subreg_si = gen_reg_rtx (SImode);
+      emit_move_insn (subreg_si,  gen_lowpart (SImode, temp_reg));
+      rtx float_subreg = simplify_gen_subreg (mode, subreg_si, SImode, 0);
+      rtx target_reg = gen_reg_rtx (mode);
+      emit_move_insn (target_reg, float_subreg);
+      return target_reg;
+    }
+
+  return NULL_RTX;
+}
+
 /* For TARGET_SECURE_PLT 32-bit PIC code we can save PIC register
    setup by using __stack_chk_fail_local hidden function instead of
    calling __stack_chk_fail directly.  Otherwise it is better to call
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 8fe49c2ba3d..10f94af553d 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -11933,6 +11933,12 @@  or when the back end is in a partially-initialized state.
 outside of any function scope.
 @end deftypefn
 
+@deftypefn {Target Hook} rtx TARGET_LOADING_INT_CONVERT_TO_FLOAT (machine_mode @var{mode}, rtx @var{source}, rtx @var{base})
+If the target is protifiable to load an integer in word_mode from
+@var{source} which is based on @var{base}, then convert to a floating
+point value in @var{mode}.
+@end deftypefn
+
 @defmac TARGET_OBJECT_SUFFIX
 Define this macro to be a C string representing the suffix for object
 files on your target machine.  If you do not define this macro, GCC will
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 62c49ac46de..1ca6a671d86 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -7756,6 +7756,8 @@  to by @var{ce_info}.
 
 @hook TARGET_SET_CURRENT_FUNCTION
 
+@hook TARGET_LOADING_INT_CONVERT_TO_FLOAT
+
 @defmac TARGET_OBJECT_SUFFIX
 Define this macro to be a C string representing the suffix for object
 files on your target machine.  If you do not define this macro, GCC will
diff --git a/gcc/expr.cc b/gcc/expr.cc
index d9407432ea5..466079220e7 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -11812,6 +11812,21 @@  expand_expr_real_1 (tree exp, rtx target, machine_mode tmode,
 	    && modifier != EXPAND_WRITE)
 	  op0 = flip_storage_order (mode1, op0);
 
+	/* Accessing float field of struct parameter which passed via integer
+	   registers.  */
+	if (targetm.loading_int_convert_to_float && mode == mode1
+	    && GET_MODE_CLASS (mode) == MODE_FLOAT
+	    && TREE_CODE (tem) == PARM_DECL && DECL_INCOMING_RTL (tem)
+	    && REG_P (DECL_INCOMING_RTL (tem))
+	    && GET_MODE (DECL_INCOMING_RTL (tem)) == BLKmode && MEM_P (op0)
+	    && MEM_OFFSET_KNOWN_P (op0))
+	  {
+	    rtx res = targetm.loading_int_convert_to_float (mode, op0,
+							    DECL_RTL (tem));
+	    if (res)
+	      op0 = res;
+	  }
+
 	if (mode == mode1 || mode1 == BLKmode || mode1 == tmode
 	    || modifier == EXPAND_CONST_ADDRESS
 	    || modifier == EXPAND_INITIALIZER)
diff --git a/gcc/target.def b/gcc/target.def
index 082a7c62f34..837ce902489 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -4491,6 +4491,17 @@  original and the returned modes should be @code{MODE_INT}.",
  (machine_mode mode),
  default_preferred_doloop_mode)
 
+
+/* Loading an integer value from memory first, then convert (bitcast) to\n\n
+   floating point value, if target is able to support such behavior.  */
+DEFHOOK
+(loading_int_convert_to_float,
+"If the target is protifiable to load an integer in word_mode from\n\
+@var{source} which is based on @var{base}, then convert to a floating\n\
+point value in @var{mode}.",
+ rtx, (machine_mode mode, rtx source, rtx base),
+ NULL)
+
 /* Returns true for a legitimate combined insn.  */
 DEFHOOK
 (legitimate_combined_insn,
diff --git a/gcc/testsuite/g++.target/powerpc/pr102024.C b/gcc/testsuite/g++.target/powerpc/pr102024.C
index 769585052b5..c8995cae707 100644
--- a/gcc/testsuite/g++.target/powerpc/pr102024.C
+++ b/gcc/testsuite/g++.target/powerpc/pr102024.C
@@ -5,7 +5,7 @@ 
 // Test that a zero-width bit field in an otherwise homogeneous aggregate
 // generates a psabi warning and passes arguments in GPRs.
 
-// { dg-final { scan-assembler-times {\mstd\M} 4 } }
+// { dg-final { scan-assembler-times {\mmtvsrd\M} 4 } }
 
 struct a_thing
 {
diff --git a/gcc/testsuite/gcc.target/powerpc/pr108073.c b/gcc/testsuite/gcc.target/powerpc/pr108073.c
new file mode 100644
index 00000000000..aa02de56405
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr108073.c
@@ -0,0 +1,24 @@ 
+/* { dg-do run } */
+/* { dg-options "-O2 -save-temps" } */
+
+typedef struct DF {double a[4]; long l; } DF;
+typedef struct SF {float a[4];short l; } SF;
+
+/* Each of below function contains one mtvsrd.  */
+/* { dg-final { scan-assembler-times {\mmtvsrd\M} 3 {target { has_arch_ppc64 && has_arch_pwr8 } } } } */
+double __attribute__ ((noipa)) foo_df (DF arg){return arg.a[3];}
+float  __attribute__ ((noipa)) foo_sf (SF arg){return arg.a[2];}
+float  __attribute__ ((noipa)) foo_sf1 (SF arg){return arg.a[1];}
+
+double gd = 4.0;
+float gf1 = 3.0f, gf2 = 2.0f;
+DF gdf = {{1.0,2.0,3.0,4.0}, 1L};
+SF gsf = {{1.0f,2.0f,3.0f,4.0f}, 1L};
+
+int main()
+{
+  if (!(foo_df (gdf) == gd && foo_sf (gsf) == gf1 && foo_sf1 (gsf) == gf2))
+    __builtin_abort ();
+  return 0;
+}
+