[take,2] middle-end: Support ABIs that pass FP values as wider integers.

Message ID 028601d8895b$f60bfe40$e223fac0$@nextmovesoftware.com
State New
Headers
Series [take,2] middle-end: Support ABIs that pass FP values as wider integers. |

Commit Message

Roger Sayle June 26, 2022, 12:55 p.m. UTC
  Hi Jeff,
Sorry for the long delay getting back to this, but after deeper
investigation, it turns out that your tingling spider senses that
the original patch wasn't updating everywhere that was required
were spot on.  Although my nvptx testing showed no problems with -O2,
compiling the same tests with -O0 found several additional assertion
ICEs (exactly where you'd predicted they should/would be).

Here's a revised patch that updates five locations (up from the
previous two).  Finding any remaining locations (if any) might be
easier once folks are able to test things on their targets.


This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check with no new failures, and on nvptx-none, where it is
the middle-end portion of a pair of patches to allow the default ISA to
be advanced.  Ok for mainline?


2022-06-26  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	PR target/104489
	* calls.cc (precompute_register_parameters): Allow promotion
	of floating point values to be passed in wider integer modes.
	(expand_call): Allow floating point results to be returned in
	wider integer modes.
	* cfgexpand.cc (expand_value_return): Allow backends to promote
	a scalar floating point return value to a wider integer mode.
	* expr.cc (expand_expr_real_1) <expand_decl_rtl>: Likewise, allow
	backends to promote scalar FP PARM_DECLs to wider integer modes.
	* function.cc (assign_parm_setup_stack): Allow floating point
	values to be passed on the stack as wider integer modes.

Thanks in advance,
Roger
--

> -----Original Message-----
> From: Jeff Law <jeffreyalaw@gmail.com>
> Sent: 14 March 2022 15:30
> To: Roger Sayle <roger@nextmovesoftware.com>; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] middle-end: Support ABIs that pass FP values as wider
> integers.
> 
> 
> 
> On 2/9/2022 1:12 PM, Roger Sayle wrote:
> > This patch adds middle-end support for target ABIs that pass/return
> > floating point values in integer registers with precision wider than
> > the original FP mode.  An example, is the nvptx backend where 16-bit
> > HFmode registers are passed/returned as (promoted to) SImode registers.
> > Unfortunately, this currently falls foul of the various (recent?)
> > sanity checks that (very sensibly) prevent creating paradoxical
> > SUBREGs of floating point registers.  The approach below is to
> > explicitly perform the conversion/promotion in two steps, via an
> > integer mode of same precision as the floating point value.  So on
> > nvptx, 16-bit HFmode is initially converted to 16-bit HImode (using
> > SUBREG), then zero-extended to SImode, and likewise when going the
> > other way, parameters truncated to HImode then converted to HFmode
> > (using SUBREG).  These changes are localized to expand_value_return
> > and expanding DECL_RTL to support strange ABIs, rather than inside
> > convert_modes or gen_lowpart, as mismatched precision integer/FP
> > conversions should be explicit in the RTL, and these semantics not generally
> visible/implicit in user code.
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check with no new failures, and on nvptx-none, where it is
> > the middle-end portion of a pair of patches to allow the default ISA
> > to be advanced.  Ok for mainline?
> >
> > 2022-02-09  Roger Sayle  <roger@nextmovesoftware.com>
> >
> > gcc/ChangeLog
> >         * cfgexpand.cc (expand_value_return): Allow backends to promote
> >         a scalar floating point return value to a wider integer mode.
> >         * expr.cc (expand_expr_real_1) [expand_decl_rtl]: Likewise, allow
> >         backends to promote scalar FP PARM_DECLs to wider integer modes.
> 
> Buried somewhere in our calling conventions code is the ability to pass around
> BLKmode objects in registers along with the ability to tune left vs right padding
> adjustments.   Much of this support grew out of the PA
> 32 bit SOM ABI.
> 
> While I think we could probably make those bits do what we want, I suspect the
> result will actually be uglier than what you've done here and I wouldn't be
> surprised if there was a performance hit as the code to handle those cases was
> pretty dumb in its implementation.
> 
> What I find rather surprising is the location of your changes -- they feel
> incomplete.  For example, you fix the callee side of returns in
> expand_value_return, but I don't analogous code for the caller side.
> 
> Similarly while you fix things for arguments in expand_expr_real_1, that's again
> just the callee side.  Don't you need to so something on the caller side too?
> 
> Jeff
>
  

Comments

Jeff Law July 2, 2022, 5:15 p.m. UTC | #1
On 6/26/2022 6:55 AM, Roger Sayle wrote:
> Hi Jeff,
> Sorry for the long delay getting back to this, but after deeper
> investigation, it turns out that your tingling spider senses that
> the original patch wasn't updating everywhere that was required
> were spot on.  Although my nvptx testing showed no problems with -O2,
> compiling the same tests with -O0 found several additional assertion
> ICEs (exactly where you'd predicted they should/would be).
>
> Here's a revised patch that updates five locations (up from the
> previous two).  Finding any remaining locations (if any) might be
> easier once folks are able to test things on their targets.
>
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check with no new failures, and on nvptx-none, where it is
> the middle-end portion of a pair of patches to allow the default ISA to
> be advanced.  Ok for mainline?
>
>
> 2022-06-26  Roger Sayle  <roger@nextmovesoftware.com>
>
> gcc/ChangeLog
> 	PR target/104489
> 	* calls.cc (precompute_register_parameters): Allow promotion
> 	of floating point values to be passed in wider integer modes.
> 	(expand_call): Allow floating point results to be returned in
> 	wider integer modes.
> 	* cfgexpand.cc (expand_value_return): Allow backends to promote
> 	a scalar floating point return value to a wider integer mode.
> 	* expr.cc (expand_expr_real_1) <expand_decl_rtl>: Likewise, allow
> 	backends to promote scalar FP PARM_DECLs to wider integer modes.
> 	* function.cc (assign_parm_setup_stack): Allow floating point
> 	values to be passed on the stack as wider integer modes.
OK.  Though please consider factoring the code which reinterprets the 
original into a new mode.  It looks to me like we've got 5 copies of it 
now :-)



Jeff
  

Patch

diff --git a/gcc/calls.cc b/gcc/calls.cc
index f4e1299..06d8a95 100644
--- a/gcc/calls.cc
+++ b/gcc/calls.cc
@@ -992,11 +992,24 @@  precompute_register_parameters (int num_actuals, struct arg_data *args,
 	/* If we are to promote the function arg to a wider mode,
 	   do it now.  */
 
-	if (args[i].mode != TYPE_MODE (TREE_TYPE (args[i].tree_value)))
-	  args[i].value
-	    = convert_modes (args[i].mode,
-			     TYPE_MODE (TREE_TYPE (args[i].tree_value)),
-			     args[i].value, args[i].unsignedp);
+	machine_mode old_mode = TYPE_MODE (TREE_TYPE (args[i].tree_value));
+
+	/* Some ABIs require scalar floating point modes to be passed
+	   in a wider scalar integer mode.  We need to explicitly
+	   reinterpret to an integer mode of the correct precision
+	   before extending to the desired result.  */
+	if (SCALAR_INT_MODE_P (args[i].mode)
+	    && SCALAR_FLOAT_MODE_P (old_mode)
+	    && known_gt (GET_MODE_SIZE (args[i].mode),
+			 GET_MODE_SIZE (old_mode)))
+	  {
+	    scalar_int_mode imode = int_mode_for_mode (old_mode).require ();
+	    rtx tmp = force_reg (imode, gen_lowpart (imode, args[i].value));
+	    args[i].value = convert_modes (args[i].mode, imode, tmp, 1);
+	  }
+	else if (args[i].mode != old_mode)
+	  args[i].value = convert_modes (args[i].mode, old_mode,
+					 args[i].value, args[i].unsignedp);
 
 	/* If the value is a non-legitimate constant, force it into a
 	   pseudo now.  TLS symbols sometimes need a call to resolve.  */
@@ -3825,18 +3838,28 @@  expand_call (tree exp, rtx target, int ignore)
 	{
 	  tree type = rettype;
 	  int unsignedp = TYPE_UNSIGNED (type);
+	  machine_mode ret_mode = TYPE_MODE (type);
 	  machine_mode pmode;
 
 	  /* Ensure we promote as expected, and get the new unsignedness.  */
-	  pmode = promote_function_mode (type, TYPE_MODE (type), &unsignedp,
+	  pmode = promote_function_mode (type, ret_mode, &unsignedp,
 					 funtype, 1);
 	  gcc_assert (GET_MODE (target) == pmode);
 
-	  poly_uint64 offset = subreg_lowpart_offset (TYPE_MODE (type),
-						      GET_MODE (target));
-	  target = gen_rtx_SUBREG (TYPE_MODE (type), target, offset);
-	  SUBREG_PROMOTED_VAR_P (target) = 1;
-	  SUBREG_PROMOTED_SET (target, unsignedp);
+	  if (SCALAR_INT_MODE_P (pmode)
+	      && SCALAR_FLOAT_MODE_P (ret_mode)
+	      && known_gt (GET_MODE_SIZE (pmode), GET_MODE_SIZE (ret_mode)))
+	    {
+	      scalar_int_mode imode = int_mode_for_mode (ret_mode).require ();
+	      target = force_reg (imode, gen_lowpart (imode, target));
+	      target = gen_lowpart_SUBREG (ret_mode, target);
+	    }
+	  else
+	    {
+	      target = gen_lowpart_SUBREG (ret_mode, target);
+	      SUBREG_PROMOTED_VAR_P (target) = 1;
+	      SUBREG_PROMOTED_SET (target, unsignedp);
+	    }
 	}
 
       /* If size of args is variable or this was a constructor call for a stack
diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
index bb33c1b..a9afec6 100644
--- a/gcc/cfgexpand.cc
+++ b/gcc/cfgexpand.cc
@@ -3720,7 +3720,22 @@  expand_value_return (rtx val)
         mode = promote_function_mode (type, old_mode, &unsignedp, funtype, 1);
 
       if (mode != old_mode)
-	val = convert_modes (mode, old_mode, val, unsignedp);
+	{
+	  /* Some ABIs require scalar floating point modes to be returned
+	     in a wider scalar integer mode.  We need to explicitly
+	     reinterpret to an integer mode of the correct precision
+	     before extending to the desired result.  */
+	  if (SCALAR_INT_MODE_P (mode)
+	      && SCALAR_FLOAT_MODE_P (old_mode)
+	      && known_gt (GET_MODE_SIZE (mode), GET_MODE_SIZE (old_mode)))
+	    {
+	      scalar_int_mode imode = int_mode_for_mode (old_mode).require ();
+	      val = force_reg (imode, gen_lowpart (imode, val));
+	      val = convert_modes (mode, imode, val, 1);
+	    }
+	  else
+	    val = convert_modes (mode, old_mode, val, unsignedp);
+	}
 
       if (GET_CODE (return_reg) == PARALLEL)
 	emit_group_load (return_reg, val, type, int_size_in_bytes (type));
diff --git a/gcc/expr.cc b/gcc/expr.cc
index c90cde3..c1ce49e 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -10771,6 +10771,19 @@  expand_expr_real_1 (tree exp, rtx target, machine_mode tmode,
 	    pmode = promote_ssa_mode (ssa_name, &unsignedp);
 	  gcc_assert (GET_MODE (decl_rtl) == pmode);
 
+	  /* Some ABIs require scalar floating point modes to be passed
+	     in a wider scalar integer mode.  We need to explicitly
+	     truncate to an integer mode of the correct precision before
+	     using a SUBREG to reinterpret as a floating point value.  */
+	  if (SCALAR_FLOAT_MODE_P (mode)
+	      && SCALAR_INT_MODE_P (pmode)
+	      && known_lt (GET_MODE_SIZE (mode), GET_MODE_SIZE (pmode)))
+	    {
+	      scalar_int_mode imode = int_mode_for_mode (mode).require ();
+	      temp = force_reg (imode, gen_lowpart (imode, decl_rtl));
+	      return gen_lowpart_SUBREG (mode, temp);
+	    }
+
 	  temp = gen_lowpart_SUBREG (mode, decl_rtl);
 	  SUBREG_PROMOTED_VAR_P (temp) = 1;
 	  SUBREG_PROMOTED_SET (temp, unsignedp);
diff --git a/gcc/function.cc b/gcc/function.cc
index ad0096a..657b95d 100644
--- a/gcc/function.cc
+++ b/gcc/function.cc
@@ -3472,6 +3472,21 @@  assign_parm_setup_stack (struct assign_parm_data_all *all, tree parm,
 
       emit_move_insn (tempreg, validize_mem (copy_rtx (data->entry_parm)));
 
+      /* Some ABIs require scalar floating point modes to be passed
+	 in a wider scalar integer mode.  We need to explicitly
+	 truncate to an integer mode of the correct precision before
+	 using a SUBREG to reinterpret as a floating point value.  */
+      if (SCALAR_FLOAT_MODE_P (data->nominal_mode)
+	  && SCALAR_INT_MODE_P (data->arg.mode)
+	  && known_lt (GET_MODE_SIZE (data->nominal_mode),
+		       GET_MODE_SIZE (data->arg.mode)))
+	{
+	  scalar_int_mode imode
+	    = int_mode_for_mode (data->nominal_mode).require ();
+	  tempreg = force_reg (imode, gen_lowpart (imode, tempreg));
+	  tempreg = gen_lowpart_SUBREG (data->nominal_mode, tempreg);
+	}
+
       push_to_sequence2 (all->first_conversion_insn, all->last_conversion_insn);
       to_conversion = true;