match.pd: Avoid (view_convert (convert@0 @1)) optimization for extended _BitInts with padding bits [PR114469]
Checks

Context                                          Check     Description
linaro-tcwg-bot/tcwg_gcc_build--master-arm       success   Testing passed
linaro-tcwg-bot/tcwg_gcc_build--master-aarch64   success   Testing passed
linaro-tcwg-bot/tcwg_gcc_check--master-arm       success   Testing passed

Commit Message
Hi!
The following patch attempts to fix the (view_convert (convert@0 @1))
optimization.  If TREE_TYPE (@0) is a _BitInt type with padding bits
and @0 has the same precision as @1 but a different sign, and _BitInt
with padding bits are extended on the target (x86 doesn't, aarch64
doesn't, but arm plans to do that), then optimizing it to just
(view_convert @1) is wrong: the padding bits wouldn't be what they
should be.
E.g. the bitint-64.c test with -O2 has
  _5 = (unsigned _BitInt(5)) _4;
  _7 = (unsigned _BitInt(5)) e.0_1;
  _8 = _5 + _7;
  _9 = (_BitInt(5)) _8;
  _10 = VIEW_CONVERT_EXPR<unsigned char>(_9);
and forwprop1 changes that to just
  _5 = (unsigned _BitInt(5)) _4;
  _7 = (unsigned _BitInt(5)) e.0_1;
  _8 = _5 + _7;
  _10 = VIEW_CONVERT_EXPR<unsigned char>(_8);
The former makes the padding bits well defined (at least on arm in
the future), while the latter has those bits zero extended.
Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-03-26  Jakub Jelinek  <jakub@redhat.com>

	PR tree-optimization/114469
	* match.pd ((view_convert (convert@0 @1))): Don't optimize if
	TREE_TYPE (@0) is a BITINT_TYPE with padding bits which are supposed
	to be extended by the ABI.

	Jakub
Comments
On Wed, 27 Mar 2024, Jakub Jelinek wrote:
> Hi!
>
> The following patch attempts to fix the (view_convert (convert@0 @1))
> optimization.  If TREE_TYPE (@0) is a _BitInt type with padding bits
> and @0 has the same precision as @1 but a different sign, and _BitInt
> with padding bits are extended on the target (x86 doesn't, aarch64
> doesn't, but arm plans to do that), then optimizing it to just
> (view_convert @1) is wrong: the padding bits wouldn't be what they
> should be.
> E.g. the bitint-64.c test with -O2 has
>   _5 = (unsigned _BitInt(5)) _4;
>   _7 = (unsigned _BitInt(5)) e.0_1;
>   _8 = _5 + _7;
>   _9 = (_BitInt(5)) _8;
>   _10 = VIEW_CONVERT_EXPR<unsigned char>(_9);
> and forwprop1 changes that to just
>   _5 = (unsigned _BitInt(5)) _4;
>   _7 = (unsigned _BitInt(5)) e.0_1;
>   _8 = _5 + _7;
>   _10 = VIEW_CONVERT_EXPR<unsigned char>(_8);
> The former makes the padding bits well defined (at least on arm in
> the future), while the latter has those bits zero extended.
So I think the existing rule, when extended to

     || (TYPE_PRECISION (TREE_TYPE (@0)) > TYPE_PRECISION (TREE_TYPE (@1))
	 && TYPE_UNSIGNED (TREE_TYPE (@1)))

assumes padding is extended according to the signedness.  But the
original

     && (TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE (@1))

rule has the padding undefined.  I think that's inconsistent at least.

I'll note that we do not constrain 'type', so the V_C_E could
reinterpret the integer (with or without padding) as floating-point.

Given that the previous

  /* For integral conversions with the same precision or pointer
     conversions use a NOP_EXPR instead.  */
  (simplify
   (view_convert @0)

should match first, we should only ever see non-integer/pointer here
or cases where the V_C_E looks at padding.  IMO we should change
this to

     && ((TYPE_PRECISION (TREE_TYPE (@0)) >= TYPE_PRECISION (TREE_TYPE (@1))
	  && TYPE_SIGN (TREE_TYPE (@0)) == TYPE_SIGN (TREE_TYPE (@1)))
	 || (TYPE_PRECISION (TREE_TYPE (@0)) > TYPE_PRECISION (TREE_TYPE (@1))
	     && TYPE_UNSIGNED (TREE_TYPE (@1))))

?  Having this inconsistent treatment of padding is bad.

In your case the precision is equal, so this should fix it, right?

Not sure if it's worth adding an INTEGRAL_TYPE_P (type) case with
TYPE_PRECISION (type) == TYPE_PRECISION (TREE_TYPE (@1)), see the
previous pattern.
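Written out in full, the condition Richard suggests would make the pattern read roughly as follows — a sketch of the proposed change against the pattern quoted below, not committed code:

```
(simplify
 (view_convert (convert@0 @1))
 (if ((INTEGRAL_TYPE_P (TREE_TYPE (@0)) || POINTER_TYPE_P (TREE_TYPE (@0)))
      && (INTEGRAL_TYPE_P (TREE_TYPE (@1)) || POINTER_TYPE_P (TREE_TYPE (@1)))
      && TYPE_SIZE (TREE_TYPE (@0)) == TYPE_SIZE (TREE_TYPE (@1))
      && ((TYPE_PRECISION (TREE_TYPE (@0)) >= TYPE_PRECISION (TREE_TYPE (@1))
	   && TYPE_SIGN (TREE_TYPE (@0)) == TYPE_SIGN (TREE_TYPE (@1)))
	  || (TYPE_PRECISION (TREE_TYPE (@0)) > TYPE_PRECISION (TREE_TYPE (@1))
	      && TYPE_UNSIGNED (TREE_TYPE (@1)))))
  (view_convert @1)))
```

The first arm of the disjunction requires matching signedness whenever the precisions could differ only in padding, which is what makes the treatment of padding consistent between the two cases.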
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2024-03-26  Jakub Jelinek  <jakub@redhat.com>
>
> 	PR tree-optimization/114469
> 	* match.pd ((view_convert (convert@0 @1))): Don't optimize if
> 	TREE_TYPE (@0) is a BITINT_TYPE with padding bits which are supposed
> 	to be extended by the ABI.
>
> --- gcc/match.pd.jj	2024-03-15 11:04:24.672914747 +0100
> +++ gcc/match.pd	2024-03-26 15:49:44.177864509 +0100
> @@ -4699,13 +4699,38 @@ (define_operator_list SYNC_FETCH_AND_AND
>     zero-extend while keeping the same size (for bool-to-char).  */
>  (simplify
>   (view_convert (convert@0 @1))
> - (if ((INTEGRAL_TYPE_P (TREE_TYPE (@0)) || POINTER_TYPE_P (TREE_TYPE (@0)))
> -      && (INTEGRAL_TYPE_P (TREE_TYPE (@1)) || POINTER_TYPE_P (TREE_TYPE (@1)))
> -      && TYPE_SIZE (TREE_TYPE (@0)) == TYPE_SIZE (TREE_TYPE (@1))
> -      && (TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE (@1))
> -	  || (TYPE_PRECISION (TREE_TYPE (@0)) > TYPE_PRECISION (TREE_TYPE (@1))
> -	      && TYPE_UNSIGNED (TREE_TYPE (@1)))))
> -  (view_convert @1)))
> + (with { tree type0 = TREE_TYPE (@0);
> +	 tree type1 = TREE_TYPE (@1);
> +	 bool ok = false;
> +	 if ((INTEGRAL_TYPE_P (type0) || POINTER_TYPE_P (type0))
> +	     && (INTEGRAL_TYPE_P (type1) || POINTER_TYPE_P (type1))
> +	     && TYPE_SIZE (type0) == TYPE_SIZE (type1))
> +	   {
> +	     if (TYPE_PRECISION (type0) == TYPE_PRECISION (type1))
> +	       {
> +		 if (TREE_CODE (type0) != BITINT_TYPE)
> +		   ok = true;
> +		 else
> +		   {
> +		     /* Avoid optimizing this if type0 is a _BitInt
> +			type with padding bits which are supposed to be
> +			extended.  */
> +		     struct bitint_info info;
> +		     targetm.c.bitint_type_info (TYPE_PRECISION (type0),
> +						 &info);
> +		     if (!info.extended)
> +		       ok = true;
> +		     else
> +		       ok = (TYPE_PRECISION (type0)
> +			     == tree_to_uhwi (TYPE_SIZE (type0)));
> +		   }
> +	       }
> +	     else if (TYPE_PRECISION (type0) > TYPE_PRECISION (type1)
> +		      && TYPE_UNSIGNED (type1))
> +	       ok = true;
> +	   } }
> +  (if (ok)
> +   (view_convert @1))))
>  
>  /* Simplify a view-converted empty or single-element constructor.  */
>  (simplify
>
> Jakub
>
>
On Wed, Mar 27, 2024 at 12:48:29PM +0100, Richard Biener wrote:
> > The following patch attempts to fix the (view_convert (convert@0 @1))
> > optimization.  If TREE_TYPE (@0) is a _BitInt type with padding bits
> > and @0 has the same precision as @1 but a different sign, and _BitInt
> > with padding bits are extended on the target (x86 doesn't, aarch64
> > doesn't, but arm plans to do that), then optimizing it to just
> > (view_convert @1) is wrong: the padding bits wouldn't be what they
> > should be.
> > E.g. the bitint-64.c test with -O2 has
> >   _5 = (unsigned _BitInt(5)) _4;
> >   _7 = (unsigned _BitInt(5)) e.0_1;
> >   _8 = _5 + _7;
> >   _9 = (_BitInt(5)) _8;
> >   _10 = VIEW_CONVERT_EXPR<unsigned char>(_9);
> > and forwprop1 changes that to just
> >   _5 = (unsigned _BitInt(5)) _4;
> >   _7 = (unsigned _BitInt(5)) e.0_1;
> >   _8 = _5 + _7;
> >   _10 = VIEW_CONVERT_EXPR<unsigned char>(_8);
> > The former makes the padding bits well defined (at least on arm in
> > the future), while the latter has those bits zero extended.
>
> So I think the existing rule, when extended to
>
>      || (TYPE_PRECISION (TREE_TYPE (@0)) > TYPE_PRECISION (TREE_TYPE (@1))
>	   && TYPE_UNSIGNED (TREE_TYPE (@1)))
>
> assumes padding is extended according to the signedness.  But the
> original
>
>      && (TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE (@1))
>
> rule has the padding undefined.  I think that's inconsistent at least.
The whole simplification is just weird, all it tests are the precisions
of TREE_TYPE (@0) and TREE_TYPE (@1), but it doesn't actually test if there
are padding bits (i.e. TYPE_SIZE (type) vs. those precisions).

I've tried to construct a non-_BitInt testcase with
struct S { signed long long a : 37; unsigned long long b : 37; } s;

unsigned long long
foo (long long x, long long y)
{
  __typeof (s.a + 0) c = x, d = y;
  __typeof (s.b + 0) e = (__typeof (s.b + 0)) (c + d);
  union U { __typeof (s.b + 0) f; unsigned long long g; } u;
  u.f = e;
  return u.g;
}

unsigned long long
bar (unsigned long long x, unsigned long long y)
{
  __typeof (s.b + 0) c = x, d = y;
  __typeof (s.a + 0) e = (__typeof (s.a + 0)) (c + d);
  union U { __typeof (s.a + 0) f; unsigned long long g; } u;
  u.f = e;
  return u.g;
}

int
main ()
{
  unsigned long long x = foo (-53245722422LL, -5719971797LL);
  unsigned long long y = bar (136917222900, 5719971797LL);
  __builtin_printf ("%016llx %016llx\n", x, y);
  return 0;
}
which seems to print
00000012455ee8f5 0000000135d6ddc9
at -O0 and
fffffff2455ee8f5 0000000135d6ddc9
at -O2/-O3; dunno if in that case we consider those padding bits
uninitialized/anything can happen, then it is fine, or something else.
> I'll note that we do not constrain 'type' so the V_C_E could
> reinterpret the integer (with or without padding) as floatingpoint.
Anyway, if we have partial int types with undefined padding bits
(the above __typeof of a bit-field type, or _BitInt on x86/aarch64),
I guess the precision0 == precision1 case is ok: we have,
say, 3 padding bits before/after, and a valid program had better mask
away those bits or it will be UB.
If precision0 == precision1 and type1 has the extended padding bits
and type0 has undefined padding bits, then I guess it is also ok.
Other cases of same precision are not ok, turning well-defined bits
into undefined or changing sign-extended to zero-extended or vice versa.
For the precision0 > precision1 && unsigned1 case on the other side,
I don't see how it is ever correct for the undefined padding bits cases,
the padding bits above type0 are undefined before/after the simplification,
but the bits in between are changed from previously zero to undefined.
For the precision0 > precision1 cases if both type0 and type1 are
extended like _BitInt on arm, then the simplification is reasonable,
all the padding bits above type0 are zeros and all the padding bits between
type1 and type0 precision are zeros too, before and after.
So, indeed my patch isn't correct.
And because we don't have the _BitInt arm support for GCC 14, perhaps
we could defer that part for GCC 15, though I wonder if we shouldn't just
kill that TYPE_PRECISION (TREE_TYPE (@0)) > TYPE_PRECISION (TREE_TYPE (@1))
&& TYPE_UNSIGNED (TREE_TYPE (@1)) case, because it is clearly wrong
for the types with undefined padding bits (everything on trunk right now).
If I have
struct S { signed long long a : 42; unsigned long long b : 37; } s;
__typeof (s.b) v1 = ...;
__typeof (s.a) v2;
unsigned long long v3;
v2 = v1; // This zero-extends the 5 bits
v3 = VCE <unsigned long long> (v2);
return v3 & 0x3ffffffffffULL;
then without the folding it used to be valid, those 5 bits were zeros;
with the optimization it is not.  Similarly for VCE to struct or double
and the like.  And similarly for unsigned _BitInt(37) / signed _BitInt(42)
on x86/aarch64.
Jakub
On Wed, 27 Mar 2024, Jakub Jelinek wrote:
> On Wed, Mar 27, 2024 at 12:48:29PM +0100, Richard Biener wrote:
> > > The following patch attempts to fix the (view_convert (convert@0 @1))
> > > optimization.  If TREE_TYPE (@0) is a _BitInt type with padding bits
> > > and @0 has the same precision as @1 but a different sign, and _BitInt
> > > with padding bits are extended on the target (x86 doesn't, aarch64
> > > doesn't, but arm plans to do that), then optimizing it to just
> > > (view_convert @1) is wrong: the padding bits wouldn't be what they
> > > should be.
> > > E.g. the bitint-64.c test with -O2 has
> > >   _5 = (unsigned _BitInt(5)) _4;
> > >   _7 = (unsigned _BitInt(5)) e.0_1;
> > >   _8 = _5 + _7;
> > >   _9 = (_BitInt(5)) _8;
> > >   _10 = VIEW_CONVERT_EXPR<unsigned char>(_9);
> > > and forwprop1 changes that to just
> > >   _5 = (unsigned _BitInt(5)) _4;
> > >   _7 = (unsigned _BitInt(5)) e.0_1;
> > >   _8 = _5 + _7;
> > >   _10 = VIEW_CONVERT_EXPR<unsigned char>(_8);
> > > The former makes the padding bits well defined (at least on arm in
> > > the future), while the latter has those bits zero extended.
> >
> > So I think the existing rule, when extended to
> >
> >      || (TYPE_PRECISION (TREE_TYPE (@0)) > TYPE_PRECISION (TREE_TYPE (@1))
> >	     && TYPE_UNSIGNED (TREE_TYPE (@1)))
> >
> > assumes padding is extended according to the signedness.  But the
> > original
> >
> >      && (TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE (@1))
> >
> > rule has the padding undefined.  I think that's inconsistent at least.
>
> The whole simplification is just weird, all it tests are the precisions
> of TREE_TYPE (@0) and TREE_TYPE (@1), but it doesn't actually test if there
> are padding bits (i.e. TYPE_SIZE (type) vs. those precisions).
>
> I've tried to construct a non-_BitInt testcase with
> struct S { signed long long a : 37; unsigned long long b : 37; } s;
>
> unsigned long long
> foo (long long x, long long y)
> {
>   __typeof (s.a + 0) c = x, d = y;
>   __typeof (s.b + 0) e = (__typeof (s.b + 0)) (c + d);
>   union U { __typeof (s.b + 0) f; unsigned long long g; } u;
>   u.f = e;
>   return u.g;
> }
>
> unsigned long long
> bar (unsigned long long x, unsigned long long y)
> {
>   __typeof (s.b + 0) c = x, d = y;
>   __typeof (s.a + 0) e = (__typeof (s.a + 0)) (c + d);
>   union U { __typeof (s.a + 0) f; unsigned long long g; } u;
>   u.f = e;
>   return u.g;
> }
>
> int
> main ()
> {
>   unsigned long long x = foo (-53245722422LL, -5719971797LL);
>   unsigned long long y = bar (136917222900, 5719971797LL);
>   __builtin_printf ("%016llx %016llx\n", x, y);
>   return 0;
> }
> which seems to print
> 00000012455ee8f5 0000000135d6ddc9
> at -O0 and
> fffffff2455ee8f5 0000000135d6ddc9
> at -O2/-O3; dunno if in that case we consider those padding bits
> uninitialized/anything can happen, then it is fine, or something else.
>
> > I'll note that we do not constrain 'type' so the V_C_E could
> > reinterpret the integer (with or without padding) as floatingpoint.
>
> Anyway, if we have partial int types with undefined padding bits
> (the above __typeof of a bit-field type, or _BitInt on x86/aarch64),
> I guess the precision0 == precision1 case is ok: we have,
> say, 3 padding bits before/after, and a valid program had better mask
> away those bits or it will be UB.
> If precision0 == precision1 and type1 has the extended padding bits
> and type0 has undefined padding bits, then I guess it is also ok.
> Other cases of same precision are not ok, turning well-defined bits
> into undefined or changing sign-extended to zero-extended or vice versa.
>
> For the precision0 > precision1 && unsigned1 case on the other side,
> I don't see how it is ever correct for the undefined padding bits cases,
> the padding bits above type0 are undefined before/after the simplification,
> but the bits in between are changed from previously zero to undefined.
>
> For the precision0 > precision1 cases if both type0 and type1 are
> extended like _BitInt on arm, then the simplification is reasonable,
> all the padding bits above type0 are zeros and all the padding bits between
> type1 and type0 precision are zeros too, before and after.
>
> So, indeed my patch isn't correct.
>
> And because we don't have the _BitInt arm support for GCC 14, perhaps
> we could defer that part for GCC 15, though I wonder if we shouldn't just
> kill that TYPE_PRECISION (TREE_TYPE (@0)) > TYPE_PRECISION (TREE_TYPE (@1))
> && TYPE_UNSIGNED (TREE_TYPE (@1)) case, because it is clearly wrong
> for the types with undefined padding bits (everything on trunk right now).
The comment says it was added for (char)_Bool; probably
gcc.dg/tree-ssa/vce-1.c will fail.  But I'm not sure this optimization
is important; it seems early SRA introduces a V_C_E here and then
phiopt the conversion to unsigned char.  But this is a V_C_E of
_Bool -> unsigned char -> _Bool.  We seem to apply both this
pattern and then drop the V_C_E.

Maybe we can reasonably optimize the case where type == TREE_TYPE (@1)
and the intermediate precision is bigger or equal?
Richard.
> If I have
> struct S { signed long long a : 42; unsigned long long b : 37; } s;
> __typeof (s.b) v1 = ...;
> __typeof (s.a) v2;
> unsigned long long v3;
> v2 = v1; // This zero-extends the 5 bits
> v3 = VCE <unsigned long long> (v2);
> return v3 & 0x3ffffffffffULL;
> then without the folding it used to be valid, those 5 bits were zeros;
> with the optimization it is not.  Similarly for VCE to struct or double
> and the like.  And similarly for unsigned _BitInt(37) / signed _BitInt(42)
> on x86/aarch64.
>
> Jakub
>
>
On Wed, Mar 27, 2024 at 01:42:03PM +0100, Richard Biener wrote:
> The comment says it was added for (char)_Bool; probably
> gcc.dg/tree-ssa/vce-1.c will fail.  But I'm not sure this optimization
> is important; it seems early SRA introduces a V_C_E here and then
> phiopt the conversion to unsigned char.  But this is a V_C_E of
> _Bool -> unsigned char -> _Bool.  We seem to apply both this
> pattern and then drop the V_C_E.
>
> Maybe we can reasonably optimize the case where type == TREE_TYPE (@1)
> and the intermediate precision is bigger or equal?
If type is INTEGRAL_TYPE_P and has TYPE_PRECISION <= that of TREE_TYPE (@1),
sure, it can be optimized whenever TREE_TYPE (@0) has precision >= that,
no matter what TYPE_SIGN TREE_TYPE (@1) has.
Anyway, will try to cook up something.
Jakub
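The safe subset Jakub describes here — the V_C_E result type is integral and narrow enough that no padding bit of the intermediate type is observable — would look roughly like this in match.pd (a sketch of the described condition only, not an actual patch):

```
(simplify
 (view_convert (convert@0 @1))
 (if (INTEGRAL_TYPE_P (type)
      && INTEGRAL_TYPE_P (TREE_TYPE (@1))
      && TYPE_PRECISION (type) <= TYPE_PRECISION (TREE_TYPE (@1))
      && TYPE_PRECISION (TREE_TYPE (@0)) >= TYPE_PRECISION (TREE_TYPE (@1)))
  (view_convert @1)))
```

Because the result's precision does not exceed that of @1, every bit the V_C_E can observe survives the intermediate conversion unchanged, regardless of TYPE_SIGN (TREE_TYPE (@1)).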