[RFC] aarch64: Add support for __BitInt

Message ID 289f93f5-fe3a-4e04-bd49-390ba7a84a94@arm.com
State New
Series [RFC] aarch64: Add support for __BitInt

Checks

Context Check Description
linaro-tcwg-bot/tcwg_gcc_build--master-arm success Testing passed
linaro-tcwg-bot/tcwg_gcc_build--master-aarch64 success Testing passed
linaro-tcwg-bot/tcwg_gcc_check--master-arm success Testing passed
linaro-tcwg-bot/tcwg_gcc_check--master-aarch64 fail Testing failed

Commit Message

Andre Vieira (lists) Jan. 10, 2024, 7:05 p.m. UTC
  Hi,

This patch is still a work in progress, but I am posting it to show a
failure in the bitint-7 test, where handle_stmt, called from
lower_mergeable_stmt, ICEs because the idx (3) is out of range for
__BitInt(135) with a limb_prec of 64.
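
As a quick illustration of that arithmetic (purely a restatement of the
failure, not part of the patch; CEIL is the macro from system.h that the
lowering pass also uses):

  #define CEIL(x, y) (((x) + (y) - 1) / (y))
  /* With 64-bit limbs, _BitInt(135) is lowered to CEIL (135, 64) == 3
     limbs, so the valid limb indices are 0, 1 and 2; an access with
     idx == 3 is therefore out of range.  */
  int limbs_135 = CEIL (135, 64);   /* == 3 */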

I hacked GCC locally to work around this issue and still have one
outstanding failure, so I will look to resolve that failure before posting
a new version.

Kind Regards,
Andre
  

Comments

Andrew Pinski Jan. 10, 2024, 8:59 p.m. UTC | #1
On Wed, Jan 10, 2024 at 11:06 AM Andre Vieira (lists)
<andre.simoesdiasvieira@arm.com> wrote:
>
> Hi,
>
> This patch is still work in progress, but posting to show failure with
> bitint-7 test where handle_stmt called from lower_mergeable_stmt ICE's
> because the idx (3) is out of range for the __BitInt(135) with a
> limb_prec of 64.

Looks like the same issue can happen on x86_64, see
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113315 .

Thanks,
Andrew

>
> I hacked gcc locally to work around this issue and still have one
> outstanding failure, so will look to resolve that failure before posting
> a new version.
>
> Kind Regards,
> Andre
  
Jakub Jelinek Jan. 11, 2024, 8:53 a.m. UTC | #2
On Wed, Jan 10, 2024 at 07:05:39PM +0000, Andre Vieira (lists) wrote:
> This patch is still work in progress, but posting to show failure with
> bitint-7 test where handle_stmt called from lower_mergeable_stmt ICE's
> because the idx (3) is out of range for the __BitInt(135) with a limb_prec
> of 64.

I can reproduce it, will debug it momentarily.

> I hacked gcc locally to work around this issue and still have one
> outstanding failure, so will look to resolve that failure before posting a
> new version.

I see 2 issues in your patch:
1) as written earlier, the bitint lowering pass doesn't yet support big
   endian, so I think at least temporarily we shouldn't pretend it works
   until it is fixed;
   https://gcc.gnu.org/pipermail/gcc-patches/2023-August/626847.html
   mentions the big-endian issues
2) for n in [65, 128] I believe the aarch64 ABI says that _BitInt should be
   treated the same as __int128; so for those cases we want limb_mode =
   abi_limb_mode = TImode; those cases are just lowered to TImode
   arithmetic, there is no attempt to treat those as arrays of smaller
   limbs.
   It is just the [129, 65535] cases where we want limb_mode DImode (for
   efficiency and implementability) and abi_limb_mode TImode (for the ABI
   reasons, alignment, sizing, ...), i.e. the cases where the ABI says
   the type is treated as an array of 128-bit limbs
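
A rough stand-alone sketch of the selection described in 2), illustrative
only (expressed as limb widths in bits rather than machine modes, and
assuming the abi_limb_mode handling from the full patch at the bottom of
this page):

  /* Hypothetical model of the limb choice for _BitInt(n) on
     little-endian aarch64.  */
  static void
  pick_limb_widths (int n, int *limb_bits, int *abi_limb_bits)
  {
    if (n <= 8)        *limb_bits = 8;
    else if (n <= 16)  *limb_bits = 16;
    else if (n <= 32)  *limb_bits = 32;
    else if (n <= 64)  *limb_bits = 64;
    else if (n <= 128) *limb_bits = 128; /* [65, 128]: treated like __int128.  */
    else               *limb_bits = 64;  /* [129, 65535]: array of 64-bit limbs.  */
    /* The ABI sizes and aligns anything wider than 128 bits in 128-bit units.  */
    *abi_limb_bits = n > 128 ? 128 : *limb_bits;
  }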

--- gcc/config/aarch64/aarch64.cc.jj	2024-01-11 09:37:28.000000000 +0100
+++ gcc/config/aarch64/aarch64.cc	2024-01-11 09:41:03.440202903 +0100
@@ -28279,12 +28279,20 @@ aarch64_excess_precision (enum excess_pr
 bool
 aarch64_bitint_type_info (int n, struct bitint_info *info)
 {
+  /* For now, the middle-end bitint lowering doesn't handle big endian.  */
+  if (TARGET_BIG_END)
+    return false;
+
   if (n <= 8)
     info->limb_mode = QImode;
   else if (n <= 16)
     info->limb_mode = HImode;
   else if (n <= 32)
     info->limb_mode = SImode;
+  else if (n <= 64)
+    info->limb_mode = DImode;
+  else if (n <= 128)
+    info->limb_mode = TImode;
   else
     info->limb_mode = DImode;
 


	Jakub
  
Jakub Jelinek Jan. 11, 2024, 11:12 a.m. UTC | #3
On Thu, Jan 11, 2024 at 09:53:33AM +0100, Jakub Jelinek wrote:
> On Wed, Jan 10, 2024 at 07:05:39PM +0000, Andre Vieira (lists) wrote:
> > This patch is still work in progress, but posting to show failure with
> > bitint-7 test where handle_stmt called from lower_mergeable_stmt ICE's
> > because the idx (3) is out of range for the __BitInt(135) with a limb_prec
> > of 64.
> 
> I can reproduce it, will debug it momentarily.

So, the problem was that in 2 spots I was comparing TYPE_SIZE of large/huge
BITINT_TYPEs to determine whether a case can be handled cheaply.
On x86_64 with limb_mode == abi_limb_mode (both DImode) that works fine:
if TYPE_SIZE is equal, the types have the same number of limbs.
But on aarch64 the TYPE_SIZE of, say, _BitInt(135) and _BitInt(193) is the
same, both are 256-bit storage, but because DImode is used as limb_mode,
the former actually needs just 3 limbs while the latter needs 4.
And limb_access_type was asserting that we don't try to access the 4th limb
on types whose precision needs just 3 limbs.
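
To make the mismatch concrete (an illustrative aside; CEIL is the macro
from system.h):

  /* On aarch64 both types below get 256 bits of storage, because the ABI
     sizes them in 128-bit units, so their TYPE_SIZE compares equal; their
     internal 64-bit limb counts nevertheless differ.  */
  #define CEIL(x, y) (((x) + (y) - 1) / (y))
  int limbs_135 = CEIL (135, 64);   /* 3 limbs */
  int limbs_193 = CEIL (193, 64);   /* 4 limbs */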

The following patch (so far tested on x86_64 with all the bitint tests plus
on the bitint-7.c testcase in a cross to aarch64) should fix that.

Note, for the info.extended targets (currently none, but I think the arm
32-bit ABI is meant like that), we'll need to do something different,
because the upper bits aren't just padding and should be zero/sign extended,
so if we, say, have limb_mode SImode and abi_limb_mode DImode, we'll need to
treat _BitInt(135) not as 5 SImode limbs, but 6.  For !info.extended targets
I think treating _BitInt(135) as 3 DImode limbs rather than 4 is fine.
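
Put in numbers (again just an illustrative aside, under the SImode/DImode
assumption above):

  #define CEIL(x, y) (((x) + (y) - 1) / (y))
  /* Hypothetical extended target, 32-bit limbs, 64-bit ABI limbs: the bits
     above the 135-bit precision are part of the value (kept zero/sign
     extended), so every limb of the ABI-sized storage has to be processed.  */
  int ext_limbs    = CEIL (135, 64) * (64 / 32);   /* 6 SImode limbs */
  /* !extended target as on aarch64, 64-bit limbs, 128-bit ABI limbs: the
     bits above the precision are padding, so only the precision matters.  */
  int nonext_limbs = CEIL (135, 64);               /* 3 DImode limbs, not 4 */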

2024-01-11  Jakub Jelinek  <jakub@redhat.com>

	* gimple-lower-bitint.cc (mergeable_op): Instead of comparing
	TYPE_SIZE (t) of large/huge BITINT_TYPEs, compare
	CEIL (TYPE_PRECISION (t), limb_prec).
	(bitint_large_huge::handle_cast): Likewise.

--- gcc/gimple-lower-bitint.cc.jj	2024-01-08 13:58:21.448176859 +0100
+++ gcc/gimple-lower-bitint.cc	2024-01-11 11:46:49.147779946 +0100
@@ -231,7 +231,8 @@ mergeable_op (gimple *stmt)
 	    && TREE_CODE (rhs_type) == BITINT_TYPE
 	    && bitint_precision_kind (lhs_type) >= bitint_prec_large
 	    && bitint_precision_kind (rhs_type) >= bitint_prec_large
-	    && tree_int_cst_equal (TYPE_SIZE (lhs_type), TYPE_SIZE (rhs_type)))
+	    && (CEIL (TYPE_PRECISION (lhs_type), limb_prec)
+		== CEIL (TYPE_PRECISION (rhs_type), limb_prec)))
 	  {
 	    if (TYPE_PRECISION (rhs_type) >= TYPE_PRECISION (lhs_type))
 	      return true;
@@ -1263,8 +1264,8 @@ bitint_large_huge::handle_cast (tree lhs
 	     if m_upwards_2limb * limb_prec is equal to
 	     lhs precision that is not the case.  */
 	  || (!m_var_msb
-	      && tree_int_cst_equal (TYPE_SIZE (rhs_type),
-				     TYPE_SIZE (lhs_type))
+	      && (CEIL (TYPE_PRECISION (lhs_type), limb_prec)
+		  == CEIL (TYPE_PRECISION (rhs_type), limb_prec))
 	      && (!m_upwards_2limb
 		  || (m_upwards_2limb * limb_prec
 		      < TYPE_PRECISION (lhs_type)))))

	Jakub
  
Jakub Jelinek Jan. 12, 2024, 9:13 a.m. UTC | #4
On Thu, Jan 11, 2024 at 12:12:59PM +0100, Jakub Jelinek wrote:
> So, the problem was that in 2 spots I was comparing TYPE_SIZE of large/huge
> BITINT_TYPEs to determine if it can be handled cheaply.
> On x86_64 with limb_mode == abi_limb_mode (both DImode) that works fine,
> if TYPE_SIZE is equal, it means it has the same number of limbs.
> But on aarch64 TYPE_SIZE of say _BitInt(135) and _BitInt(193) is the same,
> both are 256-bit storage, but because DImode is used as limb_mode, the
> former actually needs just 3 limbs, while the latter needs 4 limbs.
> And limb_access_type was asserting that we don't try to access 4th limb
> on types which actually have a precision which needs just 3 limbs.
> 
> The following patch (so far tested on x86_64 with all the bitint tests plus
> on the bitint-7.c testcase in a cross to aarch64) should fix that.
> 
> Note, for the info.extended targets (currently none, but I think arm 32-bit
> in the ABI is meant like that), we'll need to do something different,
> because the upper bits aren't just padding and should be zero/sign extended,
> so if we say have limb_mode SImode, abi_limb_mode DImode, we'll need to
> treat _BitInt(135) not as 5 SImode limbs, but 6.  For !info.extended targets
> I think treating _BitInt(135) as 3 DImode limbs rather than 4 is fine.
> 
> 2024-01-11  Jakub Jelinek  <jakub@redhat.com>
> 
> 	* gimple-lower-bitint.cc (mergeable_op): Instead of comparing
> 	TYPE_SIZE (t) of large/huge BITINT_TYPEs, compare
> 	CEIL (TYPE_PRECISION (t), limb_prec).
> 	(bitint_large_huge::handle_cast): Likewise.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

> --- gcc/gimple-lower-bitint.cc.jj	2024-01-08 13:58:21.448176859 +0100
> +++ gcc/gimple-lower-bitint.cc	2024-01-11 11:46:49.147779946 +0100
> @@ -231,7 +231,8 @@ mergeable_op (gimple *stmt)
>  	    && TREE_CODE (rhs_type) == BITINT_TYPE
>  	    && bitint_precision_kind (lhs_type) >= bitint_prec_large
>  	    && bitint_precision_kind (rhs_type) >= bitint_prec_large
> -	    && tree_int_cst_equal (TYPE_SIZE (lhs_type), TYPE_SIZE (rhs_type)))
> +	    && (CEIL (TYPE_PRECISION (lhs_type), limb_prec)
> +		== CEIL (TYPE_PRECISION (rhs_type), limb_prec)))
>  	  {
>  	    if (TYPE_PRECISION (rhs_type) >= TYPE_PRECISION (lhs_type))
>  	      return true;
> @@ -1263,8 +1264,8 @@ bitint_large_huge::handle_cast (tree lhs
>  	     if m_upwards_2limb * limb_prec is equal to
>  	     lhs precision that is not the case.  */
>  	  || (!m_var_msb
> -	      && tree_int_cst_equal (TYPE_SIZE (rhs_type),
> -				     TYPE_SIZE (lhs_type))
> +	      && (CEIL (TYPE_PRECISION (lhs_type), limb_prec)
> +		  == CEIL (TYPE_PRECISION (rhs_type), limb_prec))
>  	      && (!m_upwards_2limb
>  		  || (m_upwards_2limb * limb_prec
>  		      < TYPE_PRECISION (lhs_type)))))
> 
> 	Jakub

	Jakub
  
Richard Biener Jan. 12, 2024, 9:42 a.m. UTC | #5
On Fri, 12 Jan 2024, Jakub Jelinek wrote:

> On Thu, Jan 11, 2024 at 12:12:59PM +0100, Jakub Jelinek wrote:
> > So, the problem was that in 2 spots I was comparing TYPE_SIZE of large/huge
> > BITINT_TYPEs to determine if it can be handled cheaply.
> > On x86_64 with limb_mode == abi_limb_mode (both DImode) that works fine,
> > if TYPE_SIZE is equal, it means it has the same number of limbs.
> > But on aarch64 TYPE_SIZE of say _BitInt(135) and _BitInt(193) is the same,
> > both are 256-bit storage, but because DImode is used as limb_mode, the
> > former actually needs just 3 limbs, while the latter needs 4 limbs.
> > And limb_access_type was asserting that we don't try to access 4th limb
> > on types which actually have a precision which needs just 3 limbs.
> > 
> > The following patch (so far tested on x86_64 with all the bitint tests plus
> > on the bitint-7.c testcase in a cross to aarch64) should fix that.
> > 
> > Note, for the info.extended targets (currently none, but I think arm 32-bit
> > in the ABI is meant like that), we'll need to do something different,
> > because the upper bits aren't just padding and should be zero/sign extended,
> > so if we say have limb_mode SImode, abi_limb_mode DImode, we'll need to
> > treat _BitInt(135) not as 5 SImode limbs, but 6.  For !info.extended targets
> > I think treating _BitInt(135) as 3 DImode limbs rather than 4 is fine.
> > 
> > 2024-01-11  Jakub Jelinek  <jakub@redhat.com>
> > 
> > 	* gimple-lower-bitint.cc (mergeable_op): Instead of comparing
> > 	TYPE_SIZE (t) of large/huge BITINT_TYPEs, compare
> > 	CEIL (TYPE_PRECISION (t), limb_prec).
> > 	(bitint_large_huge::handle_cast): Likewise.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK

> > --- gcc/gimple-lower-bitint.cc.jj	2024-01-08 13:58:21.448176859 +0100
> > +++ gcc/gimple-lower-bitint.cc	2024-01-11 11:46:49.147779946 +0100
> > @@ -231,7 +231,8 @@ mergeable_op (gimple *stmt)
> >  	    && TREE_CODE (rhs_type) == BITINT_TYPE
> >  	    && bitint_precision_kind (lhs_type) >= bitint_prec_large
> >  	    && bitint_precision_kind (rhs_type) >= bitint_prec_large
> > -	    && tree_int_cst_equal (TYPE_SIZE (lhs_type), TYPE_SIZE (rhs_type)))
> > +	    && (CEIL (TYPE_PRECISION (lhs_type), limb_prec)
> > +		== CEIL (TYPE_PRECISION (rhs_type), limb_prec)))
> >  	  {
> >  	    if (TYPE_PRECISION (rhs_type) >= TYPE_PRECISION (lhs_type))
> >  	      return true;
> > @@ -1263,8 +1264,8 @@ bitint_large_huge::handle_cast (tree lhs
> >  	     if m_upwards_2limb * limb_prec is equal to
> >  	     lhs precision that is not the case.  */
> >  	  || (!m_var_msb
> > -	      && tree_int_cst_equal (TYPE_SIZE (rhs_type),
> > -				     TYPE_SIZE (lhs_type))
> > +	      && (CEIL (TYPE_PRECISION (lhs_type), limb_prec)
> > +		  == CEIL (TYPE_PRECISION (rhs_type), limb_prec))
> >  	      && (!m_upwards_2limb
> >  		  || (m_upwards_2limb * limb_prec
> >  		      < TYPE_PRECISION (lhs_type)))))
> > 
> > 	Jakub
> 
> 	Jakub
> 
>
  

Patch

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index a5a6b52730d6c5013346d128e89915883f1707ae..15fb0ece5256f25c2ca8bb5cb82fc61488d0393e 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -6534,7 +6534,7 @@  aarch64_return_in_memory_1 (const_tree type)
   machine_mode ag_mode;
   int count;
 
-  if (!AGGREGATE_TYPE_P (type)
+  if (!(AGGREGATE_TYPE_P (type) || TREE_CODE (type) == BITINT_TYPE)
       && TREE_CODE (type) != COMPLEX_TYPE
       && TREE_CODE (type) != VECTOR_TYPE)
     /* Simple scalar types always returned in registers.  */
@@ -6618,6 +6618,10 @@  aarch64_function_arg_alignment (machine_mode mode, const_tree type,
 
   gcc_assert (TYPE_MODE (type) == mode);
 
+  if (TREE_CODE (type) == BITINT_TYPE
+      && int_size_in_bytes (type) > 16)
+    return GET_MODE_ALIGNMENT (TImode);
+
   if (!AGGREGATE_TYPE_P (type))
     {
       /* The ABI alignment is the natural alignment of the type, without
@@ -21773,6 +21777,11 @@  aarch64_composite_type_p (const_tree type,
   if (type && (AGGREGATE_TYPE_P (type) || TREE_CODE (type) == COMPLEX_TYPE))
     return true;
 
+  if (type
+      && TREE_CODE (type) == BITINT_TYPE
+      && int_size_in_bytes (type) > 16)
+    return true;
+
   if (mode == BLKmode
       || GET_MODE_CLASS (mode) == MODE_COMPLEX_FLOAT
       || GET_MODE_CLASS (mode) == MODE_COMPLEX_INT)
@@ -28265,6 +28274,29 @@  aarch64_excess_precision (enum excess_precision_type type)
   return FLT_EVAL_METHOD_UNPREDICTABLE;
 }
 
+/* Implement TARGET_C_BITINT_TYPE_INFO.
+   Return true if _BitInt(N) is supported and fill its details into *INFO.  */
+bool
+aarch64_bitint_type_info (int n, struct bitint_info *info)
+{
+  if (n <= 8)
+    info->limb_mode = QImode;
+  else if (n <= 16)
+    info->limb_mode = HImode;
+  else if (n <= 32)
+    info->limb_mode = SImode;
+  else
+    info->limb_mode = DImode;
+
+  if (n > 128)
+    info->abi_limb_mode = TImode;
+  else
+    info->abi_limb_mode = info->limb_mode;
+  info->big_endian = TARGET_BIG_END;
+  info->extended = false;
+  return true;
+}
+
 /* Implement TARGET_SCHED_CAN_SPECULATE_INSN.  Return true if INSN can be
    scheduled for speculative execution.  Reject the long-running division
    and square-root instructions.  */
@@ -30374,6 +30406,9 @@  aarch64_run_selftests (void)
 #undef TARGET_C_EXCESS_PRECISION
 #define TARGET_C_EXCESS_PRECISION aarch64_excess_precision
 
+#undef TARGET_C_BITINT_TYPE_INFO
+#define TARGET_C_BITINT_TYPE_INFO aarch64_bitint_type_info
+
 #undef  TARGET_EXPAND_BUILTIN
 #define TARGET_EXPAND_BUILTIN aarch64_expand_builtin
 
diff --git a/libgcc/config/aarch64/t-softfp b/libgcc/config/aarch64/t-softfp
index 2e32366f891361e2056c680b2e36edb1871c7670..4302ad52eb881825d0fb65b9ebd21031781781f5 100644
--- a/libgcc/config/aarch64/t-softfp
+++ b/libgcc/config/aarch64/t-softfp
@@ -4,7 +4,8 @@  softfp_extensions := sftf dftf hftf bfsf
 softfp_truncations := tfsf tfdf tfhf tfbf dfbf sfbf hfbf
 softfp_exclude_libgcc2 := n
 softfp_extras += fixhfti fixunshfti floattihf floatuntihf \
-		 floatdibf floatundibf floattibf floatuntibf
+		 floatdibf floatundibf floattibf floatuntibf \
+		 fixtfbitint floatbitinttf
 
 TARGET_LIBGCC2_CFLAGS += -Wno-missing-prototypes