[RFC] aarch64: Add support for __BitInt
Checks
Context                                        | Check   | Description
linaro-tcwg-bot/tcwg_gcc_build--master-arm     | success | Testing passed
linaro-tcwg-bot/tcwg_gcc_build--master-aarch64 | success | Testing passed
linaro-tcwg-bot/tcwg_gcc_check--master-arm     | success | Testing passed
linaro-tcwg-bot/tcwg_gcc_check--master-aarch64 | fail    | Testing failed
Commit Message
Hi,
This patch is still a work in progress, but I am posting it to show the
failure with the bitint-7 test, where handle_stmt called from
lower_mergeable_stmt ICEs because the idx (3) is out of range for
__BitInt(135) with a limb_prec of 64.
I hacked GCC locally to work around this issue and still have one
outstanding failure, so I will look to resolve that failure before posting
a new version.
Kind Regards,
Andre
Comments
On Wed, Jan 10, 2024 at 11:06 AM Andre Vieira (lists)
<andre.simoesdiasvieira@arm.com> wrote:
>
> Hi,
>
> This patch is still work in progress, but posting to show failure with
> bitint-7 test where handle_stmt called from lower_mergeable_stmt ICE's
> because the idx (3) is out of range for the __BitInt(135) with a
> limb_prec of 64.
Looks like the same issue can happen on x86_64, see
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113315 .
Thanks,
Andrew
>
> I hacked gcc locally to work around this issue and still have one
> outstanding failure, so will look to resolve that failure before posting
> a new version.
>
> Kind Regards,
> Andre
On Wed, Jan 10, 2024 at 07:05:39PM +0000, Andre Vieira (lists) wrote:
> This patch is still work in progress, but posting to show failure with
> bitint-7 test where handle_stmt called from lower_mergeable_stmt ICE's
> because the idx (3) is out of range for the __BitInt(135) with a limb_prec
> of 64.
I can reproduce it, will debug it momentarily.
> I hacked gcc locally to work around this issue and still have one
> outstanding failure, so will look to resolve that failure before posting a
> new version.
I see 2 issues in your patch:
1) as written earlier, the bitint lowering pass doesn't yet support big
endian, so I think at least temporarily we shouldn't pretend it works
until it is fixed;
https://gcc.gnu.org/pipermail/gcc-patches/2023-August/626847.html
mentions the big-endian issues
2) for n in [65, 128] I believe the aarch64 ABI says that _BitInt should be
treated the same as __int128; so for those cases we want limb_mode =
abi_limb_mode = TImode; those cases are just lowered to TImode
arithmetic; there is no attempt to treat those as arrays of smaller
limbs.
It is just the [129, 65535] cases where we want limb_mode DImode (for
efficiency and implementability) and abi_limb_mode TImode (for the ABI
reasons, alignment, sizing, ...), i.e. the cases where the ABI says
it is treated as an array of 128-bit limbs.
--- gcc/config/aarch64/aarch64.cc.jj 2024-01-11 09:37:28.000000000 +0100
+++ gcc/config/aarch64/aarch64.cc 2024-01-11 09:41:03.440202903 +0100
@@ -28279,12 +28279,20 @@ aarch64_excess_precision (enum excess_pr
bool
aarch64_bitint_type_info (int n, struct bitint_info *info)
{
+ /* For now, the middle-end bitint lowering doesn't handle big endian. */
+ if (TARGET_BIG_END)
+ return false;
+
if (n <= 8)
info->limb_mode = QImode;
else if (n <= 16)
info->limb_mode = HImode;
else if (n <= 32)
info->limb_mode = SImode;
+ else if (n <= 64)
+ info->limb_mode = DImode;
+ else if (n <= 128)
+ info->limb_mode = TImode;
else
info->limb_mode = DImode;
Jakub
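A minimal standalone sketch (not GCC code; all names below are ad hoc, and the
sub-64-bit cases are simplified) of the limb-count arithmetic implied by the
scheme described above: [65, 128] is handled as a single 128-bit limb like
__int128, while larger widths are lowered on 64-bit limbs even though 128-bit
ABI limbs still determine size and alignment.

/* Ad-hoc illustration of the proposed aarch64 _BitInt limb layout;
   not part of the patch.  */
#include <stdio.h>

#define CEIL(x, y) (((x) + (y) - 1) / (y))

struct layout { unsigned limb_bits, abi_limb_bits, nlimbs, storage_bits; };

static struct layout bitint_layout (unsigned n)
{
  struct layout l;
  if (n <= 64)
    /* Simplified; the patch picks QI/HI/SI/DImode for these.  */
    l.limb_bits = l.abi_limb_bits = 64;
  else if (n <= 128)
    /* Treated like __int128: one 128-bit limb.  */
    l.limb_bits = l.abi_limb_bits = 128;
  else
    {
      l.limb_bits = 64;        /* what the lowering pass works on */
      l.abi_limb_bits = 128;   /* what sizing and alignment use */
    }
  l.nlimbs = CEIL (n, l.limb_bits);
  l.storage_bits = CEIL (n, l.abi_limb_bits) * l.abi_limb_bits;
  return l;
}

int main (void)
{
  unsigned tests[] = { 65, 128, 135, 193 };
  for (unsigned i = 0; i < 4; i++)
    {
      struct layout l = bitint_layout (tests[i]);
      printf ("_BitInt(%u): %u limbs of %u bits, %u bits of storage\n",
              tests[i], l.nlimbs, l.limb_bits, l.storage_bits);
    }
  return 0;
}

Note in the output that _BitInt(135) and _BitInt(193) get the same 256 bits of
storage but 3 and 4 limbs respectively, which is exactly the distinction the
follow-up fix below relies on.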
On Thu, Jan 11, 2024 at 09:53:33AM +0100, Jakub Jelinek wrote:
> On Wed, Jan 10, 2024 at 07:05:39PM +0000, Andre Vieira (lists) wrote:
> > This patch is still work in progress, but posting to show failure with
> > bitint-7 test where handle_stmt called from lower_mergeable_stmt ICE's
> > because the idx (3) is out of range for the __BitInt(135) with a limb_prec
> > of 64.
>
> I can reproduce it, will debug it momentarily.
So, the problem was that in 2 spots I was comparing TYPE_SIZE of large/huge
BITINT_TYPEs to determine if it can be handled cheaply.
On x86_64 with limb_mode == abi_limb_mode (both DImode) that works fine,
if TYPE_SIZE is equal, it means it has the same number of limbs.
But on aarch64 TYPE_SIZE of say _BitInt(135) and _BitInt(193) is the same,
both are 256-bit storage, but because DImode is used as limb_mode, the
former actually needs just 3 limbs, while the latter needs 4 limbs.
And limb_access_type was asserting that we don't try to access 4th limb
on types which actually have a precision which needs just 3 limbs.
The following patch (so far tested on x86_64 with all the bitint tests plus
on the bitint-7.c testcase in a cross to aarch64) should fix that.
Note, for the info.extended targets (currently none, but I think arm 32-bit
in the ABI is meant like that), we'll need to do something different,
because the upper bits aren't just padding and should be zero/sign extended,
so if we, say, have limb_mode SImode, abi_limb_mode DImode, we'll need to
treat _BitInt(135) not as 5 SImode limbs, but 6. For !info.extended targets
I think treating _BitInt(135) as 3 DImode limbs rather than 4 is fine.
2024-01-11 Jakub Jelinek <jakub@redhat.com>
* gimple-lower-bitint.cc (mergeable_op): Instead of comparing
TYPE_SIZE (t) of large/huge BITINT_TYPEs, compare
CEIL (TYPE_PRECISION (t), limb_prec).
(bitint_large_huge::handle_cast): Likewise.
--- gcc/gimple-lower-bitint.cc.jj 2024-01-08 13:58:21.448176859 +0100
+++ gcc/gimple-lower-bitint.cc 2024-01-11 11:46:49.147779946 +0100
@@ -231,7 +231,8 @@ mergeable_op (gimple *stmt)
&& TREE_CODE (rhs_type) == BITINT_TYPE
&& bitint_precision_kind (lhs_type) >= bitint_prec_large
&& bitint_precision_kind (rhs_type) >= bitint_prec_large
- && tree_int_cst_equal (TYPE_SIZE (lhs_type), TYPE_SIZE (rhs_type)))
+ && (CEIL (TYPE_PRECISION (lhs_type), limb_prec)
+ == CEIL (TYPE_PRECISION (rhs_type), limb_prec)))
{
if (TYPE_PRECISION (rhs_type) >= TYPE_PRECISION (lhs_type))
return true;
@@ -1263,8 +1264,8 @@ bitint_large_huge::handle_cast (tree lhs
if m_upwards_2limb * limb_prec is equal to
lhs precision that is not the case. */
|| (!m_var_msb
- && tree_int_cst_equal (TYPE_SIZE (rhs_type),
- TYPE_SIZE (lhs_type))
+ && (CEIL (TYPE_PRECISION (lhs_type), limb_prec)
+ == CEIL (TYPE_PRECISION (rhs_type), limb_prec))
&& (!m_upwards_2limb
|| (m_upwards_2limb * limb_prec
< TYPE_PRECISION (lhs_type)))))
Jakub
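To make the info.extended distinction above concrete, here is a small
standalone sketch (ad-hoc names, using the hypothetical SImode-limb /
DImode-ABI-limb target mentioned above): when the target extends, the padding
bits up to the ABI limb boundary carry the zero/sign extension and therefore
have to be processed as limbs too.

/* Ad-hoc illustration of limb counting for extended vs. non-extended
   targets; not GCC code.  */
#include <stdio.h>

#define CEIL(x, y) (((x) + (y) - 1) / (y))

static unsigned nlimbs (unsigned prec, unsigned limb_bits,
                        unsigned abi_limb_bits, int extended)
{
  if (!extended)
    /* Only the bits holding the value matter; padding is ignored.  */
    return CEIL (prec, limb_bits);
  /* With extension, everything up to the ABI limb boundary is significant.  */
  return CEIL (prec, abi_limb_bits) * (abi_limb_bits / limb_bits);
}

int main (void)
{
  /* _BitInt(135) with 32-bit limbs and 64-bit ABI limbs.  */
  printf ("!extended: %u limbs\n", nlimbs (135, 32, 64, 0));  /* 5 */
  printf ("extended:  %u limbs\n", nlimbs (135, 32, 64, 1));  /* 6 */
  return 0;
}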
On Thu, Jan 11, 2024 at 12:12:59PM +0100, Jakub Jelinek wrote:
> So, the problem was that in 2 spots I was comparing TYPE_SIZE of large/huge
> BITINT_TYPEs to determine if it can be handled cheaply.
> On x86_64 with limb_mode == abi_limb_mode (both DImode) that works fine,
> if TYPE_SIZE is equal, it means it has the same number of limbs.
> But on aarch64 TYPE_SIZE of say _BitInt(135) and _BitInt(193) is the same,
> both are 256-bit storage, but because DImode is used as limb_mode, the
> former actually needs just 3 limbs, while the latter needs 4 limbs.
> And limb_access_type was asserting that we don't try to access 4th limb
> on types which actually have a precision which needs just 3 limbs.
>
> The following patch (so far tested on x86_64 with all the bitint tests plus
> on the bitint-7.c testcase in a cross to aarch64) should fix that.
>
> Note, for the info.extended targets (currently none, but I think arm 32-bit
> in the ABI is meant like that), we'll need to do something different,
> because the upper bits aren't just padding and should be zero/sign extended,
> so if we say have limb_mode SImode, abi_limb_mode DImode, we'll need to
> treat _BitInt(135) not as 5 SImode limbs, but 6. For !info.extended targets
> I think treating _BitInt(135) as 3 DImode limbs rather than 4 is fine.
>
> 2024-01-11 Jakub Jelinek <jakub@redhat.com>
>
> * gimple-lower-bitint.cc (mergeable_op): Instead of comparing
> TYPE_SIZE (t) of large/huge BITINT_TYPEs, compare
> CEIL (TYPE_PRECISION (t), limb_prec).
> (bitint_large_huge::handle_cast): Likewise.
Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> --- gcc/gimple-lower-bitint.cc.jj 2024-01-08 13:58:21.448176859 +0100
> +++ gcc/gimple-lower-bitint.cc 2024-01-11 11:46:49.147779946 +0100
> @@ -231,7 +231,8 @@ mergeable_op (gimple *stmt)
> && TREE_CODE (rhs_type) == BITINT_TYPE
> && bitint_precision_kind (lhs_type) >= bitint_prec_large
> && bitint_precision_kind (rhs_type) >= bitint_prec_large
> - && tree_int_cst_equal (TYPE_SIZE (lhs_type), TYPE_SIZE (rhs_type)))
> + && (CEIL (TYPE_PRECISION (lhs_type), limb_prec)
> + == CEIL (TYPE_PRECISION (rhs_type), limb_prec)))
> {
> if (TYPE_PRECISION (rhs_type) >= TYPE_PRECISION (lhs_type))
> return true;
> @@ -1263,8 +1264,8 @@ bitint_large_huge::handle_cast (tree lhs
> if m_upwards_2limb * limb_prec is equal to
> lhs precision that is not the case. */
> || (!m_var_msb
> - && tree_int_cst_equal (TYPE_SIZE (rhs_type),
> - TYPE_SIZE (lhs_type))
> + && (CEIL (TYPE_PRECISION (lhs_type), limb_prec)
> + == CEIL (TYPE_PRECISION (rhs_type), limb_prec))
> && (!m_upwards_2limb
> || (m_upwards_2limb * limb_prec
> < TYPE_PRECISION (lhs_type)))))
>
> Jakub
Jakub
On Fri, 12 Jan 2024, Jakub Jelinek wrote:
> On Thu, Jan 11, 2024 at 12:12:59PM +0100, Jakub Jelinek wrote:
> > So, the problem was that in 2 spots I was comparing TYPE_SIZE of large/huge
> > BITINT_TYPEs to determine if it can be handled cheaply.
> > On x86_64 with limb_mode == abi_limb_mode (both DImode) that works fine,
> > if TYPE_SIZE is equal, it means it has the same number of limbs.
> > But on aarch64 TYPE_SIZE of say _BitInt(135) and _BitInt(193) is the same,
> > both are 256-bit storage, but because DImode is used as limb_mode, the
> > former actually needs just 3 limbs, while the latter needs 4 limbs.
> > And limb_access_type was asserting that we don't try to access 4th limb
> > on types which actually have a precision which needs just 3 limbs.
> >
> > The following patch (so far tested on x86_64 with all the bitint tests plus
> > on the bitint-7.c testcase in a cross to aarch64) should fix that.
> >
> > Note, for the info.extended targets (currently none, but I think arm 32-bit
> > in the ABI is meant like that), we'll need to do something different,
> > because the upper bits aren't just padding and should be zero/sign extended,
> > so if we say have limb_mode SImode, abi_limb_mode DImode, we'll need to
> > treat _BitInt(135) not as 5 SImode limbs, but 6. For !info.extended targets
> > I think treating _BitInt(135) as 3 DImode limbs rather than 4 is fine.
> >
> > 2024-01-11 Jakub Jelinek <jakub@redhat.com>
> >
> > * gimple-lower-bitint.cc (mergeable_op): Instead of comparing
> > TYPE_SIZE (t) of large/huge BITINT_TYPEs, compare
> > CEIL (TYPE_PRECISION (t), limb_prec).
> > (bitint_large_huge::handle_cast): Likewise.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
OK
> > --- gcc/gimple-lower-bitint.cc.jj 2024-01-08 13:58:21.448176859 +0100
> > +++ gcc/gimple-lower-bitint.cc 2024-01-11 11:46:49.147779946 +0100
> > @@ -231,7 +231,8 @@ mergeable_op (gimple *stmt)
> > && TREE_CODE (rhs_type) == BITINT_TYPE
> > && bitint_precision_kind (lhs_type) >= bitint_prec_large
> > && bitint_precision_kind (rhs_type) >= bitint_prec_large
> > - && tree_int_cst_equal (TYPE_SIZE (lhs_type), TYPE_SIZE (rhs_type)))
> > + && (CEIL (TYPE_PRECISION (lhs_type), limb_prec)
> > + == CEIL (TYPE_PRECISION (rhs_type), limb_prec)))
> > {
> > if (TYPE_PRECISION (rhs_type) >= TYPE_PRECISION (lhs_type))
> > return true;
> > @@ -1263,8 +1264,8 @@ bitint_large_huge::handle_cast (tree lhs
> > if m_upwards_2limb * limb_prec is equal to
> > lhs precision that is not the case. */
> > || (!m_var_msb
> > - && tree_int_cst_equal (TYPE_SIZE (rhs_type),
> > - TYPE_SIZE (lhs_type))
> > + && (CEIL (TYPE_PRECISION (lhs_type), limb_prec)
> > + == CEIL (TYPE_PRECISION (rhs_type), limb_prec))
> > && (!m_upwards_2limb
> > || (m_upwards_2limb * limb_prec
> > < TYPE_PRECISION (lhs_type)))))
> >
> > Jakub
>
> Jakub
>
>
Patch
@@ -6534,7 +6534,7 @@ aarch64_return_in_memory_1 (const_tree type)
machine_mode ag_mode;
int count;
- if (!AGGREGATE_TYPE_P (type)
+ if (!(AGGREGATE_TYPE_P (type) || TREE_CODE (type) == BITINT_TYPE)
&& TREE_CODE (type) != COMPLEX_TYPE
&& TREE_CODE (type) != VECTOR_TYPE)
/* Simple scalar types always returned in registers. */
@@ -6618,6 +6618,10 @@ aarch64_function_arg_alignment (machine_mode mode, const_tree type,
gcc_assert (TYPE_MODE (type) == mode);
+ if (TREE_CODE (type) == BITINT_TYPE
+ && int_size_in_bytes (type) > 16)
+ return GET_MODE_ALIGNMENT (TImode);
+
if (!AGGREGATE_TYPE_P (type))
{
/* The ABI alignment is the natural alignment of the type, without
@@ -21773,6 +21777,11 @@ aarch64_composite_type_p (const_tree type,
if (type && (AGGREGATE_TYPE_P (type) || TREE_CODE (type) == COMPLEX_TYPE))
return true;
+ if (type
+ && TREE_CODE (type) == BITINT_TYPE
+ && int_size_in_bytes (type) > 16)
+ return true;
+
if (mode == BLKmode
|| GET_MODE_CLASS (mode) == MODE_COMPLEX_FLOAT
|| GET_MODE_CLASS (mode) == MODE_COMPLEX_INT)
@@ -28265,6 +28274,29 @@ aarch64_excess_precision (enum excess_precision_type type)
return FLT_EVAL_METHOD_UNPREDICTABLE;
}
+/* Implement TARGET_C_BITINT_TYPE_INFO.
+ Return true if _BitInt(N) is supported and fill its details into *INFO. */
+bool
+aarch64_bitint_type_info (int n, struct bitint_info *info)
+{
+ if (n <= 8)
+ info->limb_mode = QImode;
+ else if (n <= 16)
+ info->limb_mode = HImode;
+ else if (n <= 32)
+ info->limb_mode = SImode;
+ else
+ info->limb_mode = DImode;
+
+ if (n > 128)
+ info->abi_limb_mode = TImode;
+ else
+ info->abi_limb_mode = info->limb_mode;
+ info->big_endian = TARGET_BIG_END;
+ info->extended = false;
+ return true;
+}
+
/* Implement TARGET_SCHED_CAN_SPECULATE_INSN. Return true if INSN can be
scheduled for speculative execution. Reject the long-running division
and square-root instructions. */
@@ -30374,6 +30406,9 @@ aarch64_run_selftests (void)
#undef TARGET_C_EXCESS_PRECISION
#define TARGET_C_EXCESS_PRECISION aarch64_excess_precision
+#undef TARGET_C_BITINT_TYPE_INFO
+#define TARGET_C_BITINT_TYPE_INFO aarch64_bitint_type_info
+
#undef TARGET_EXPAND_BUILTIN
#define TARGET_EXPAND_BUILTIN aarch64_expand_builtin
@@ -4,7 +4,8 @@ softfp_extensions := sftf dftf hftf bfsf
softfp_truncations := tfsf tfdf tfhf tfbf dfbf sfbf hfbf
softfp_exclude_libgcc2 := n
softfp_extras += fixhfti fixunshfti floattihf floatuntihf \
- floatdibf floatundibf floattibf floatuntibf
+ floatdibf floatundibf floattibf floatuntibf \
+ fixtfbitint floatbitinttf
TARGET_LIBGCC2_CFLAGS += -Wno-missing-prototypes
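For illustration, a small self-contained usage example (not the actual
bitint-7 test from the GCC testsuite) that exercises the shape discussed in
this thread on a compiler with C23 _BitInt support: a width that needs three
64-bit limbs for lowering but only two 128-bit ABI limbs for storage.

/* Usage sketch only; requires _BitInt support in the compiler.  */
#include <stdio.h>

typedef unsigned _BitInt(135) u135;

int main (void)
{
  u135 a = 1;
  a <<= 134;          /* set the top bit: it lives in the third 64-bit limb */
  a |= 0xdeadbeefu;   /* touch the lowest limb as well */
  printf ("top bit: %d, low word: %lx\n",
          (int) (a >> 134), (unsigned long) (a & 0xffffffffu));
  return 0;
}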