aarch64, builtins: Include PR registers in FUNCTION_ARG_REGNO_P etc. [PR109254]

Message ID ZB4dSDShmhmy6Y6k@tucnak
State New
Headers
Series aarch64, builtins: Include PR registers in FUNCTION_ARG_REGNO_P etc. [PR109254] |

Commit Message

Jakub Jelinek March 24, 2023, 9:59 p.m. UTC
  Hi!

The testcase in the PR (which unfortunately because of my lack of experience
with SVE I'm not able to turn into a runtime testcase that verifies it)
is miscompiled on aarch64-linux in the regname pass, because while the
function takes arguments in the p0 register, FUNCTION_ARG_REGNO_P doesn't
reflect that, so DF doesn't know the register is used in register passing.
It sees 2 chains with p1 register and wants to replace the second one and
as DF doesn't know p0 is live at the start of the function, it will happily
use p0 register even when it is used in subsequent instructions.

The following patch fixes that.  FUNCTION_ARG_REGNO_P returns non-zero
for p0-p3 (unconditionally, seems for the floating/vector registers it
doesn't conditionalize them on TARGET_FLOAT either, but if you want,
I can conditionalize p0-p3 on TARGET_SVE), similarly
targetm.calls.function_value_regno_p returns true for p0 register
if TARGET_SVE (again for consistency, that function conditionalizes
the float/vector on TARGET_FLOAT); looking at the AAPCS, seems p1-p3
could be also used to return values in case of homogenous aggregates,
but it doesn't seem GCC supports putting svbool_t as a member of a
structure.

Now, that change broke bootstrap in libobjc and some
__builtin_apply_args/__builtin_apply/__builtin_return tests.  The
aarch64_get_reg_raw_mode hook already documents that SVE scalable arg/return
passing is fundamentally incompatible with those builtins, but unlike
the floating/vector regs where it forces a fixed vector mode, I think
there is no fixed mode which could be used for p0-p3.  So, I have tweaked
the generic code so that it uses VOIDmode return from that hook to signal
that a register shouldn't be touched by
__builtin_apply_args/__builtin_apply/__builtin_return
despite being mentioned in FUNCTION_ARG_REGNO_P or
targetm.calls.function_value_regno_p.

Bootstrapped/regtested on aarch64-linux, ok for trunk?

Could somebody please turn the testcase from the PR into something that
can be included into the testsuite?

2023-03-24  Jakub Jelinek  <jakub@redhat.com>

	PR target/109254
	* builtins.cc (apply_args_size): If targetm.calls.get_raw_arg_mode
	returns VOIDmode, handle it like if the register isn't used for
	passing arguments at all.
	(apply_result_size): If targetm.calls.get_raw_result_mode returns
	VOIDmode, handle it like if the register isn't used for returning
	results at all.
	* target.def (get_raw_result_mode, get_raw_arg_mode): Document what it
	means to return VOIDmode.
	* doc/tm.texi: Regenerated.
	* config/aarch64/aarch64.cc (aarch64_function_value_regno_p): Return
	TARGET_SVE for P0_REGNUM.
	(aarch64_function_arg_regno_p): Also return true for p0-p3.
	(aarch64_get_reg_raw_mode): Return VOIDmode for PR_REGNUM_P regs.


	Jakub
  

Comments

Jakub Jelinek March 31, 2023, 2:20 p.m. UTC | #1
Hi!

On Fri, Mar 24, 2023 at 10:59:36PM +0100, Jakub Jelinek via Gcc-patches wrote:
> 2023-03-24  Jakub Jelinek  <jakub@redhat.com>
> 
> 	PR target/109254
> 	* builtins.cc (apply_args_size): If targetm.calls.get_raw_arg_mode
> 	returns VOIDmode, handle it like if the register isn't used for
> 	passing arguments at all.
> 	(apply_result_size): If targetm.calls.get_raw_result_mode returns
> 	VOIDmode, handle it like if the register isn't used for returning
> 	results at all.
> 	* target.def (get_raw_result_mode, get_raw_arg_mode): Document what it
> 	means to return VOIDmode.
> 	* doc/tm.texi: Regenerated.
> 	* config/aarch64/aarch64.cc (aarch64_function_value_regno_p): Return
> 	TARGET_SVE for P0_REGNUM.
> 	(aarch64_function_arg_regno_p): Also return true for p0-p3.
> 	(aarch64_get_reg_raw_mode): Return VOIDmode for PR_REGNUM_P regs.

I'd like to ping this patch.
https://gcc.gnu.org/pipermail/gcc-patches/2023-March/614594.html

Thanks

	Jakub
  
Jeff Law March 31, 2023, 2:35 p.m. UTC | #2
On 3/24/23 15:59, Jakub Jelinek via Gcc-patches wrote:
> Hi!
> 
> The testcase in the PR (which unfortunately because of my lack of experience
> with SVE I'm not able to turn into a runtime testcase that verifies it)
> is miscompiled on aarch64-linux in the regname pass, because while the
> function takes arguments in the p0 register, FUNCTION_ARG_REGNO_P doesn't
> reflect that, so DF doesn't know the register is used in register passing.
> It sees 2 chains with p1 register and wants to replace the second one and
> as DF doesn't know p0 is live at the start of the function, it will happily
> use p0 register even when it is used in subsequent instructions.
> 
> The following patch fixes that.  FUNCTION_ARG_REGNO_P returns non-zero
> for p0-p3 (unconditionally, seems for the floating/vector registers it
> doesn't conditionalize them on TARGET_FLOAT either, but if you want,
> I can conditionalize p0-p3 on TARGET_SVE), similarly
> targetm.calls.function_value_regno_p returns true for p0 register
> if TARGET_SVE (again for consistency, that function conditionalizes
> the float/vector on TARGET_FLOAT); looking at the AAPCS, seems p1-p3
> could be also used to return values in case of homogenous aggregates,
> but it doesn't seem GCC supports putting svbool_t as a member of a
> structure.
> 
> Now, that change broke bootstrap in libobjc and some
> __builtin_apply_args/__builtin_apply/__builtin_return tests.  The
> aarch64_get_reg_raw_mode hook already documents that SVE scalable arg/return
> passing is fundamentally incompatible with those builtins, but unlike
> the floating/vector regs where it forces a fixed vector mode, I think
> there is no fixed mode which could be used for p0-p3.  So, I have tweaked
> the generic code so that it uses VOIDmode return from that hook to signal
> that a register shouldn't be touched by
> __builtin_apply_args/__builtin_apply/__builtin_return
> despite being mentioned in FUNCTION_ARG_REGNO_P or
> targetm.calls.function_value_regno_p.
> 
> Bootstrapped/regtested on aarch64-linux, ok for trunk?
> 
> Could somebody please turn the testcase from the PR into something that
> can be included into the testsuite?
> 
> 2023-03-24  Jakub Jelinek  <jakub@redhat.com>
> 
> 	PR target/109254
> 	* builtins.cc (apply_args_size): If targetm.calls.get_raw_arg_mode
> 	returns VOIDmode, handle it like if the register isn't used for
> 	passing arguments at all.
> 	(apply_result_size): If targetm.calls.get_raw_result_mode returns
> 	VOIDmode, handle it like if the register isn't used for returning
> 	results at all.
> 	* target.def (get_raw_result_mode, get_raw_arg_mode): Document what it
> 	means to return VOIDmode.
> 	* doc/tm.texi: Regenerated.
> 	* config/aarch64/aarch64.cc (aarch64_function_value_regno_p): Return
> 	TARGET_SVE for P0_REGNUM.
> 	(aarch64_function_arg_regno_p): Also return true for p0-p3.
> 	(aarch64_get_reg_raw_mode): Return VOIDmode for PR_REGNUM_P regs.
Generic bits are OK by me.  The aarch64 bits looks sensible, but I'd 
like to give the aarch folks one more chance to chime in.

So OK for the trunk Monday if you haven't heard otherwise.

jeff
  
Richard Sandiford March 31, 2023, 3:36 p.m. UTC | #3
Thanks for the patch and sorry for the slow reply.

Jakub Jelinek <jakub@redhat.com> writes:
> Hi!
>
> The testcase in the PR (which unfortunately because of my lack of experience
> with SVE I'm not able to turn into a runtime testcase that verifies it)
> is miscompiled on aarch64-linux in the regname pass, because while the
> function takes arguments in the p0 register, FUNCTION_ARG_REGNO_P doesn't
> reflect that, so DF doesn't know the register is used in register passing.
> It sees 2 chains with p1 register and wants to replace the second one and
> as DF doesn't know p0 is live at the start of the function, it will happily
> use p0 register even when it is used in subsequent instructions.
>
> The following patch fixes that.  FUNCTION_ARG_REGNO_P returns non-zero
> for p0-p3 (unconditionally, seems for the floating/vector registers it
> doesn't conditionalize them on TARGET_FLOAT either, but if you want,
> I can conditionalize p0-p3 on TARGET_SVE),

I agree doing it unconditionally makes sense.

> similarly targetm.calls.function_value_regno_p returns true for p0
> register if TARGET_SVE (again for consistency, that function
> conditionalizes the float/vector on TARGET_FLOAT); looking at the
> AAPCS, seems p1-p3 could be also used to return values in case of
> homogenous aggregates, but it doesn't seem GCC supports putting
> svbool_t as a member of a structure.

One testcase that uses p1-p3 for return values is:

typedef __SVBool_t fixed_bool_t __attribute__((arm_sve_vector_bits(256)));
struct s { fixed_bool_t x[4]; };
struct s f (struct s *ptr) { return *ptr; }

compiled with -msve-vector-bits=256.

> Now, that change broke bootstrap in libobjc and some
> __builtin_apply_args/__builtin_apply/__builtin_return tests.  The
> aarch64_get_reg_raw_mode hook already documents that SVE scalable arg/return
> passing is fundamentally incompatible with those builtins, but unlike
> the floating/vector regs where it forces a fixed vector mode, I think
> there is no fixed mode which could be used for p0-p3.  So, I have tweaked
> the generic code so that it uses VOIDmode return from that hook to signal
> that a register shouldn't be touched by
> __builtin_apply_args/__builtin_apply/__builtin_return
> despite being mentioned in FUNCTION_ARG_REGNO_P or
> targetm.calls.function_value_regno_p.
>
> Bootstrapped/regtested on aarch64-linux, ok for trunk?
>
> Could somebody please turn the testcase from the PR into something that
> can be included into the testsuite?

This seems to work:

/* { dg-do run { target aarch64_sve_hw } } */
/* { dg-options "-O2 -funroll-loops" } */

#include <stdio.h>
#include <arm_sve.h>

svfloat32_t __attribute__((noipa))
func_demo(svfloat32_t x, svfloat32_t y, svbool_t pg)
{
  svfloat32_t z = svadd_f32_x(pg, x, svdup_f32(0x1.800fep19f));
  svbool_t cmp = svcmplt_f32(pg, z, svdup_f32(0.0f));
  svfloat32_t zM1 = svsub_f32_x(pg, z, svdup_n_f32(1.0f));
  z = svsel_f32(cmp, zM1, z);
  svfloat32_t sum = svadd_f32_x(pg, z, y);
  return sum;
}

int
main()
{
  float arr[2];
  svfloat32_t x = svinsr_n_f32(svdup_f32(-0x1.880fep19f), 2.0f);
  svfloat32_t res = func_demo(x, svdup_f32(0.5f), svptrue_b32());
  svst1_f32(svptrue_pat_b32(SV_VL2), arr, res);
  if (arr[0] != 786561.500000 || arr[1] != -16384.500000)
    __builtin_abort ();
  return 0;
}

> 2023-03-24  Jakub Jelinek  <jakub@redhat.com>
>
> 	PR target/109254
> 	* builtins.cc (apply_args_size): If targetm.calls.get_raw_arg_mode
> 	returns VOIDmode, handle it like if the register isn't used for
> 	passing arguments at all.
> 	(apply_result_size): If targetm.calls.get_raw_result_mode returns
> 	VOIDmode, handle it like if the register isn't used for returning
> 	results at all.
> 	* target.def (get_raw_result_mode, get_raw_arg_mode): Document what it
> 	means to return VOIDmode.
> 	* doc/tm.texi: Regenerated.
> 	* config/aarch64/aarch64.cc (aarch64_function_value_regno_p): Return
> 	TARGET_SVE for P0_REGNUM.
> 	(aarch64_function_arg_regno_p): Also return true for p0-p3.
> 	(aarch64_get_reg_raw_mode): Return VOIDmode for PR_REGNUM_P regs.
>
> @@ -7995,6 +7999,10 @@ aarch64_get_reg_raw_mode (int regno)
>         for SVE types are fundamentally incompatible with the
>         __builtin_return/__builtin_apply interface.  */
>      return as_a <fixed_size_mode> (V16QImode);
> +  if (PR_REGNUM_P (regno))
> +    /* For SVE PR regs, indicate that they should be ignored for
> +       __builtin_apply/__builtin_return.  */
> +    return as_a <fixed_size_mode> (VOIDmode);   

Reached me with trailing whitespace, but maybe that's a mailerism.

It's an interesting philosophical question whether VOIDmode is a
fixed-size mode, given that VOIDmode doesn't really have a size. :-)
But I'm sure in practice we already treat it like that elsewhere.
It's just the first time I remember it being so explicit.

OK for the aarch64 bits with aarch64_function_value_regno_p
changed to use the same range as aarch64_function_arg_regno_p.

Thanks,
Richard
  

Patch

--- gcc/builtins.cc.jj	2023-03-24 10:38:40.185097837 +0100
+++ gcc/builtins.cc	2023-03-24 11:06:49.781725290 +0100
@@ -1446,18 +1446,19 @@  apply_args_size (void)
 	  {
 	    fixed_size_mode mode = targetm.calls.get_raw_arg_mode (regno);
 
-	    gcc_assert (mode != VOIDmode);
-
-	    align = GET_MODE_ALIGNMENT (mode) / BITS_PER_UNIT;
-	    if (size % align != 0)
-	      size = CEIL (size, align) * align;
-	    size += GET_MODE_SIZE (mode);
-	    apply_args_mode[regno] = mode;
+	    if (mode != VOIDmode)
+	      {
+		align = GET_MODE_ALIGNMENT (mode) / BITS_PER_UNIT;
+		if (size % align != 0)
+		  size = CEIL (size, align) * align;
+		size += GET_MODE_SIZE (mode);
+		apply_args_mode[regno] = mode;
+	      }
+	    else
+	      apply_args_mode[regno] = as_a <fixed_size_mode> (VOIDmode);
 	  }
 	else
-	  {
-	    apply_args_mode[regno] = as_a <fixed_size_mode> (VOIDmode);
-	  }
+	  apply_args_mode[regno] = as_a <fixed_size_mode> (VOIDmode);
     }
   return size;
 }
@@ -1481,13 +1482,16 @@  apply_result_size (void)
 	  {
 	    fixed_size_mode mode = targetm.calls.get_raw_result_mode (regno);
 
-	    gcc_assert (mode != VOIDmode);
-
-	    align = GET_MODE_ALIGNMENT (mode) / BITS_PER_UNIT;
-	    if (size % align != 0)
-	      size = CEIL (size, align) * align;
-	    size += GET_MODE_SIZE (mode);
-	    apply_result_mode[regno] = mode;
+	    if (mode != VOIDmode)
+	      {
+		align = GET_MODE_ALIGNMENT (mode) / BITS_PER_UNIT;
+		if (size % align != 0)
+		  size = CEIL (size, align) * align;
+		size += GET_MODE_SIZE (mode);
+		apply_result_mode[regno] = mode;
+	      }
+	    else
+	      apply_result_mode[regno] = as_a <fixed_size_mode> (VOIDmode);
 	  }
 	else
 	  apply_result_mode[regno] = as_a <fixed_size_mode> (VOIDmode);
--- gcc/target.def.jj	2023-03-23 10:00:58.722094571 +0100
+++ gcc/target.def	2023-03-24 11:12:46.978585647 +0100
@@ -5324,7 +5324,8 @@  DEFHOOK
 (get_raw_result_mode,
  "This target hook returns the mode to be used when accessing raw return\n\
 registers in @code{__builtin_return}.  Define this macro if the value\n\
-in @var{reg_raw_mode} is not correct.",
+in @var{reg_raw_mode} is not correct.  Use @code{VOIDmode} if a register\n\
+should be ignored for @code{__builtin_return} purposes.",
  fixed_size_mode, (int regno),
  default_get_reg_raw_mode)
 
@@ -5334,7 +5335,8 @@  DEFHOOK
 (get_raw_arg_mode,
  "This target hook returns the mode to be used when accessing raw argument\n\
 registers in @code{__builtin_apply_args}.  Define this macro if the value\n\
-in @var{reg_raw_mode} is not correct.",
+in @var{reg_raw_mode} is not correct.  Use @code{VOIDmode} if a register\n\
+should be ignored for @code{__builtin_apply_args} purposes.",
  fixed_size_mode, (int regno),
  default_get_reg_raw_mode)
 
--- gcc/doc/tm.texi.jj	2023-03-23 10:00:58.631095885 +0100
+++ gcc/doc/tm.texi	2023-03-24 11:12:52.062512496 +0100
@@ -4820,13 +4820,15 @@  nothing when you use @option{-freg-struc
 @deftypefn {Target Hook} fixed_size_mode TARGET_GET_RAW_RESULT_MODE (int @var{regno})
 This target hook returns the mode to be used when accessing raw return
 registers in @code{__builtin_return}.  Define this macro if the value
-in @var{reg_raw_mode} is not correct.
+in @var{reg_raw_mode} is not correct.  Use @code{VOIDmode} if a register
+should be ignored for @code{__builtin_return} purposes.
 @end deftypefn
 
 @deftypefn {Target Hook} fixed_size_mode TARGET_GET_RAW_ARG_MODE (int @var{regno})
 This target hook returns the mode to be used when accessing raw argument
 registers in @code{__builtin_apply_args}.  Define this macro if the value
-in @var{reg_raw_mode} is not correct.
+in @var{reg_raw_mode} is not correct.  Use @code{VOIDmode} if a register
+should be ignored for @code{__builtin_apply_args} purposes.
 @end deftypefn
 
 @deftypefn {Target Hook} bool TARGET_EMPTY_RECORD_P (const_tree @var{type})
--- gcc/config/aarch64/aarch64.cc.jj	2023-03-23 19:50:46.766715343 +0100
+++ gcc/config/aarch64/aarch64.cc	2023-03-24 11:10:29.166568603 +0100
@@ -7388,6 +7388,9 @@  aarch64_function_value_regno_p (const un
   if (regno >= V0_REGNUM && regno < V0_REGNUM + HA_MAX_NUM_FLDS)
     return TARGET_FLOAT;
 
+  if (regno == P0_REGNUM)
+    return TARGET_SVE;
+
   return false;
 }
 
@@ -7959,7 +7962,8 @@  bool
 aarch64_function_arg_regno_p (unsigned regno)
 {
   return ((GP_REGNUM_P (regno) && regno < R0_REGNUM + NUM_ARG_REGS)
-	  || (FP_REGNUM_P (regno) && regno < V0_REGNUM + NUM_FP_ARG_REGS));
+	  || (FP_REGNUM_P (regno) && regno < V0_REGNUM + NUM_FP_ARG_REGS)
+	  || (PR_REGNUM_P (regno) && regno < P0_REGNUM + NUM_PR_ARG_REGS));
 }
 
 /* Implement FUNCTION_ARG_BOUNDARY.  Every parameter gets at least
@@ -7995,6 +7999,10 @@  aarch64_get_reg_raw_mode (int regno)
        for SVE types are fundamentally incompatible with the
        __builtin_return/__builtin_apply interface.  */
     return as_a <fixed_size_mode> (V16QImode);
+  if (PR_REGNUM_P (regno))
+    /* For SVE PR regs, indicate that they should be ignored for
+       __builtin_apply/__builtin_return.  */
+    return as_a <fixed_size_mode> (VOIDmode);   
   return default_get_reg_raw_mode (regno);
 }