match.pd: Avoid (view_convert (convert@0 @1)) optimization for extended _BitInts with padding bits [PR114469]

Message ID ZgP2/cAVfcq8G9up@tucnak

Checks

Context Check Description
linaro-tcwg-bot/tcwg_gcc_build--master-arm success Testing passed
linaro-tcwg-bot/tcwg_gcc_build--master-aarch64 success Testing passed
linaro-tcwg-bot/tcwg_gcc_check--master-arm success Testing passed

Commit Message

Jakub Jelinek March 27, 2024, 10:37 a.m. UTC
  Hi!

The following patch attempts to fix the (view_convert (convert@0 @1))
optimization.  If TREE_TYPE (@0) is a _BitInt type with padding bits,
@0 has the same precision as @1 but a different sign, and _BitInt
padding bits are extended on the target (x86 doesn't, aarch64 doesn't,
but arm plans to do that), then optimizing it to just (view_convert @1)
is wrong: the padding bits wouldn't be what they should be.
E.g. bitint-64.c test with -O2 has
  _5 = (unsigned _BitInt(5)) _4;
  _7 = (unsigned _BitInt(5)) e.0_1;
  _8 = _5 + _7;
  _9 = (_BitInt(5)) _8;
  _10 = VIEW_CONVERT_EXPR<unsigned char>(_9);
and forwprop1 changes that to just
  _5 = (unsigned _BitInt(5)) _4;
  _7 = (unsigned _BitInt(5)) e.0_1;
  _8 = _5 + _7;
  _10 = VIEW_CONVERT_EXPR<unsigned char>(_8);
The former makes the padding bits well defined (at least on arm in
the future), while the latter has those bits zero extended.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-03-26  Jakub Jelinek  <jakub@redhat.com>

	PR tree-optimization/114469
	* match.pd ((view_convert (convert@0 @1))): Don't optimize if
	TREE_TYPE (@0) is a BITINT_TYPE with padding bits which are supposed
	to be extended by the ABI.


	Jakub
  

Comments

Richard Biener March 27, 2024, 11:48 a.m. UTC | #1
On Wed, 27 Mar 2024, Jakub Jelinek wrote:

> Hi!
> 
> The following patch attempts to fix the (view_convert (convert@0 @1))
> optimization.  If TREE_TYPE (@0) is a _BitInt type with padding bits
> and @0 has the same precision as @1 and it has a different sign
> and _BitInt with padding bits are extended on the target (x86 doesn't,
> aarch64 doesn't, but arm plans to do that), then optimizing it to
> just (view_convert @1) is wrong, the padding bits wouldn't be what
> it should be.
> E.g. bitint-64.c test with -O2 has
>   _5 = (unsigned _BitInt(5)) _4;
>   _7 = (unsigned _BitInt(5)) e.0_1;
>   _8 = _5 + _7;
>   _9 = (_BitInt(5)) _8;
>   _10 = VIEW_CONVERT_EXPR<unsigned char>(_9);
> and forwprop1 changes that to just
>   _5 = (unsigned _BitInt(5)) _4;
>   _7 = (unsigned _BitInt(5)) e.0_1;
>   _8 = _5 + _7;
>   _10 = VIEW_CONVERT_EXPR<unsigned char>(_8);
> The former makes the padding bits well defined (at least on arm in
> the future), while the latter has those bits zero extended.

So I think the existing rule, when extended to

           || (TYPE_PRECISION (TREE_TYPE (@0)) > TYPE_PRECISION (TREE_TYPE (@1))
               && TYPE_UNSIGNED (TREE_TYPE (@1))

assumes padding is extended according to the signedness.  But the
original

       && (TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE (@1))

rule has the padding undefined.  I think that's inconsistent at least.
I'll note that we do not constrain 'type' so the V_C_E could
re-interpret the integer (with or without padding) as floating-point.

Given the previous

/* For integral conversions with the same precision or pointer
   conversions use a NOP_EXPR instead.  */
(simplify 
  (view_convert @0)

should match first we should only ever see non-integer/pointer here
or cases where the V_C_E looks at padding.  IMO we should change
this to

       && ((TYPE_PRECISION (TREE_TYPE (@0)) >= TYPE_PRECISION (TREE_TYPE (@1))
            && TYPE_SIGN (TREE_TYPE (@0)) == TYPE_SIGN (TREE_TYPE (@1)))
           || (TYPE_PRECISION (TREE_TYPE (@0)) > TYPE_PRECISION (TREE_TYPE (@1))
               && TYPE_UNSIGNED (TREE_TYPE (@1))))

?  Having this inconsistent treatment of padding is bad.

In your case the precision is equal so this should fix it, right?
Not sure if it's worth adding an INTEGRAL_TYPE_P (type) case with
TYPE_PRECISION (type) == TYPE_PRECISION (TREE_TYPE (@1)), see the
previous pattern.


> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> 2024-03-26  Jakub Jelinek  <jakub@redhat.com>
> 
> 	PR tree-optimization/114469
> 	* match.pd ((view_convert (convert@0 @1))): Don't optimize if
> 	TREE_TYPE (@0) is a BITINT_TYPE with padding bits which are supposed
> 	to be extended by the ABI.
> 
> --- gcc/match.pd.jj	2024-03-15 11:04:24.672914747 +0100
> +++ gcc/match.pd	2024-03-26 15:49:44.177864509 +0100
> @@ -4699,13 +4699,38 @@ (define_operator_list SYNC_FETCH_AND_AND
>     zero-extend while keeping the same size (for bool-to-char).  */
>  (simplify
>    (view_convert (convert@0 @1))
> -  (if ((INTEGRAL_TYPE_P (TREE_TYPE (@0)) || POINTER_TYPE_P (TREE_TYPE (@0)))
> -       && (INTEGRAL_TYPE_P (TREE_TYPE (@1)) || POINTER_TYPE_P (TREE_TYPE (@1)))
> -       && TYPE_SIZE (TREE_TYPE (@0)) == TYPE_SIZE (TREE_TYPE (@1))
> -       && (TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE (@1))
> -	   || (TYPE_PRECISION (TREE_TYPE (@0)) > TYPE_PRECISION (TREE_TYPE (@1))
> -	       && TYPE_UNSIGNED (TREE_TYPE (@1)))))
> -   (view_convert @1)))
> +  (with { tree type0 = TREE_TYPE (@0);
> +	  tree type1 = TREE_TYPE (@1);
> +	  bool ok = false;
> +	  if ((INTEGRAL_TYPE_P (type0) || POINTER_TYPE_P (type0))
> +	      && (INTEGRAL_TYPE_P (type1) || POINTER_TYPE_P (type1))
> +	      && TYPE_SIZE (type0) == TYPE_SIZE (type1))
> +	    {
> +	      if (TYPE_PRECISION (type0) == TYPE_PRECISION (type1))
> +		{
> +		  if (TREE_CODE (type0) != BITINT_TYPE)
> +		    ok = true;
> +		  else
> +		    {
> +		      /* Avoid optimizing this if type0 is a _BitInt
> +			 type with padding bits which are supposed to be
> +			 extended.  */
> +		      struct bitint_info info;
> +		      targetm.c.bitint_type_info (TYPE_PRECISION (type0),
> +						  &info);
> +		      if (!info.extended)
> +			ok = true;
> +		      else
> +			ok = (TYPE_PRECISION (type0)
> +			      == tree_to_uhwi (TYPE_SIZE (type0)));
> +		    }
> +		}
> +	      else if (TYPE_PRECISION (type0) > TYPE_PRECISION (type1)
> +		       && TYPE_UNSIGNED (type1))
> +		ok = true;
> +	    } }
> +   (if (ok)
> +    (view_convert @1))))
>  
>  /* Simplify a view-converted empty or single-element constructor.  */
>  (simplify
> 
> 	Jakub
> 
>
  
Jakub Jelinek March 27, 2024, 12:18 p.m. UTC | #2
On Wed, Mar 27, 2024 at 12:48:29PM +0100, Richard Biener wrote:
> > The following patch attempts to fix the (view_convert (convert@0 @1))
> > optimization.  If TREE_TYPE (@0) is a _BitInt type with padding bits
> > and @0 has the same precision as @1 and it has a different sign
> > and _BitInt with padding bits are extended on the target (x86 doesn't,
> > aarch64 doesn't, but arm plans to do that), then optimizing it to
> > just (view_convert @1) is wrong, the padding bits wouldn't be what
> > it should be.
> > E.g. bitint-64.c test with -O2 has
> >   _5 = (unsigned _BitInt(5)) _4;
> >   _7 = (unsigned _BitInt(5)) e.0_1;
> >   _8 = _5 + _7;
> >   _9 = (_BitInt(5)) _8;
> >   _10 = VIEW_CONVERT_EXPR<unsigned char>(_9);
> > and forwprop1 changes that to just
> >   _5 = (unsigned _BitInt(5)) _4;
> >   _7 = (unsigned _BitInt(5)) e.0_1;
> >   _8 = _5 + _7;
> >   _10 = VIEW_CONVERT_EXPR<unsigned char>(_8);
> > The former makes the padding bits well defined (at least on arm in
> > the future), while the latter has those bits zero extended.
> 
> So I think the existing rule, when extended to
> 
>            || (TYPE_PRECISION (TREE_TYPE (@0)) > TYPE_PRECISION (TREE_TYPE (@1))
>                && TYPE_UNSIGNED (TREE_TYPE (@1))
> 
> assumes padding is extended according to the signedness.  But the
> original
> 
>        && (TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE (@1))
> 
> rule has the padding undefined.  I think that's inconsistent at least.

The whole simplification is just weird; all it tests is the precisions
of TREE_TYPE (@0) and TREE_TYPE (@1), but it doesn't actually test whether
there are padding bits (i.e. TYPE_SIZE (type) vs. those precisions).

I've tried to construct a non-_BitInt testcase with
struct S { signed long long a : 37; unsigned long long b : 37; } s;

unsigned long long
foo (long long x, long long y)
{
  __typeof (s.a + 0) c = x, d = y;
  __typeof (s.b + 0) e = (__typeof (s.b + 0)) (c + d);
  union U { __typeof (s.b + 0) f; unsigned long long g; } u;
  u.f = e;
  return u.g;
}

unsigned long long
bar (unsigned long long x, unsigned long long y)
{
  __typeof (s.b + 0) c = x, d = y;
  __typeof (s.a + 0) e = (__typeof (s.a + 0)) (c + d);
  union U { __typeof (s.a + 0) f; unsigned long long g; } u;
  u.f = e;
  return u.g;
}

int
main ()
{
  unsigned long long x = foo (-53245722422LL, -5719971797LL);
  unsigned long long y = bar (136917222900, 5719971797LL);
  __builtin_printf ("%016llx %016llx\n", x, y);
  return 0;
}
which seems to print
00000012455ee8f5 0000000135d6ddc9
at -O0 and
fffffff2455ee8f5 0000000135d6ddc9
at -O2/-O3, dunno if in that case we consider those padding bits
uninitialized/anything can happen, then it is fine, or something else.

> I'll note that we do not constrain 'type' so the V_C_E could
> re-interpret the integer (with or without padding) as floating-point.

Anyway, if we have the partial int types with undefined padding bits
(the above __typeof of a bitfield type or _BitInt on x86/aarch64),
I guess the precision0 == precision1 case is ok: we have,
say, 3 padding bits before/after, and a valid program had better mask
away those bits or it will be UB.
If precision0 == precision1 and type1 has the extended padding bits
and type0 has undefined padding bits, then I guess it is also ok.
Other cases of same precision are not ok, turning well defined bits
into undefined or changing sign extended to zero extended or vice versa.

For the precision0 > precision1 && unsigned1 case on the other side,
I don't see how it is ever correct for the undefined padding bits cases,
the padding bits above type0 are undefined before/after the simplification,
but the bits in between are changed from previously zero to undefined.

For the precision0 > precision1 cases if both type0 and type1 are
extended like _BitInt on arm, then the simplification is reasonable,
all the padding bits above type0 are zeros and all the padding bits between
type1 and type0 precision are zeros too, before and after.

So, indeed my patch isn't correct.

And because we don't have the _BitInt arm support for GCC 14, perhaps
we could defer that part for GCC 15, though I wonder if we shouldn't just
kill that TYPE_PRECISION (TREE_TYPE (@0)) > TYPE_PRECISION (TREE_TYPE (@1))
&& TYPE_UNSIGNED (TREE_TYPE (@1)) case, because it is clearly wrong
for the types with undefined padding bits (everything on trunk right now).

If I have
struct S { signed long long a : 42; unsigned long long b : 37; } s;
__typeof (s.b) v1 = ...;
__typeof (s.a) v2;
unsigned long long v3;
v2 = v1; // This zero extends the 5 bits
v3 = VCE <unsigned long long> (v2);
return v3 & 0x3ffffffffffULL;
then without the folding it used to be valid (those 5 bits were zeros);
with the optimization it is not.  Similarly for a VCE to a struct or double
and the like.  And similarly for unsigned _BitInt(37) / signed _BitInt(42)
on x86/aarch64.

	Jakub
  
Richard Biener March 27, 2024, 12:42 p.m. UTC | #3
On Wed, 27 Mar 2024, Jakub Jelinek wrote:

> On Wed, Mar 27, 2024 at 12:48:29PM +0100, Richard Biener wrote:
> > > The following patch attempts to fix the (view_convert (convert@0 @1))
> > > optimization.  If TREE_TYPE (@0) is a _BitInt type with padding bits
> > > and @0 has the same precision as @1 and it has a different sign
> > > and _BitInt with padding bits are extended on the target (x86 doesn't,
> > > aarch64 doesn't, but arm plans to do that), then optimizing it to
> > > just (view_convert @1) is wrong, the padding bits wouldn't be what
> > > it should be.
> > > E.g. bitint-64.c test with -O2 has
> > >   _5 = (unsigned _BitInt(5)) _4;
> > >   _7 = (unsigned _BitInt(5)) e.0_1;
> > >   _8 = _5 + _7;
> > >   _9 = (_BitInt(5)) _8;
> > >   _10 = VIEW_CONVERT_EXPR<unsigned char>(_9);
> > > and forwprop1 changes that to just
> > >   _5 = (unsigned _BitInt(5)) _4;
> > >   _7 = (unsigned _BitInt(5)) e.0_1;
> > >   _8 = _5 + _7;
> > >   _10 = VIEW_CONVERT_EXPR<unsigned char>(_8);
> > > The former makes the padding bits well defined (at least on arm in
> > > the future), while the latter has those bits zero extended.
> > 
> > So I think the existing rule, when extended to
> > 
> >            || (TYPE_PRECISION (TREE_TYPE (@0)) > TYPE_PRECISION (TREE_TYPE (@1))
> >                && TYPE_UNSIGNED (TREE_TYPE (@1))
> > 
> > assumes padding is extended according to the signedness.  But the
> > original
> > 
> >        && (TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE (@1))
> > 
> > rule has the padding undefined.  I think that's inconsistent at least.
> 
> The whole simplification is just weird, all it tests are the precisions
> of TREE_TYPE (@0) and TREE_TYPE (@1), but doesn't actually test if there
> are padding bits (i.e. the TYPE_SIZE (type) vs. those precisions).
> 
> I've tried to construct a non-_BitInt testcase with
> struct S { signed long long a : 37; unsigned long long b : 37; } s;
> 
> unsigned long long
> foo (long long x, long long y)
> {
>   __typeof (s.a + 0) c = x, d = y;
>   __typeof (s.b + 0) e = (__typeof (s.b + 0)) (c + d);
>   union U { __typeof (s.b + 0) f; unsigned long long g; } u;
>   u.f = e;
>   return u.g;
> }
> 
> unsigned long long
> bar (unsigned long long x, unsigned long long y)
> {
>   __typeof (s.b + 0) c = x, d = y;
>   __typeof (s.a + 0) e = (__typeof (s.a + 0)) (c + d);
>   union U { __typeof (s.a + 0) f; unsigned long long g; } u;
>   u.f = e;
>   return u.g;
> }
> 
> int
> main ()
> {
>   unsigned long long x = foo (-53245722422LL, -5719971797LL);
>   unsigned long long y = bar (136917222900, 5719971797LL);
>   __builtin_printf ("%016llx %016llx\n", x, y);
>   return 0;
> }
> which seems to print
> 00000012455ee8f5 0000000135d6ddc9
> at -O0 and
> fffffff2455ee8f5 0000000135d6ddc9
> at -O2/-O3, dunno if in that case we consider those padding bits
> uninitialized/anything can happen, then it is fine, or something else.
> 
> > I'll note that we do not constrain 'type' so the V_C_E could
> > re-interpret the integer (with or without padding) as floating-point.
> 
> Anyway, if we have the partial int types with undefined padding bits
> (the above __typeof of a bitfield type or _BitInt on x86/aarch64),
> I guess the precision0 == precision1 case is ok, we have
> say 3 padding bits before/after, a valid program better should mask
> away those bits or it will be UB.
> If precision0 == precision1 and type1 has the extended padding bits
> and type0 has undefined padding bits, then I guess it is also ok.
> Other cases of same precision are not ok, turning well defined bits
> into undefined or changing sign extended to zero extended or vice versa.
> 
> For the precision0 > precision1 && unsigned1 case on the other side,
> I don't see how it is ever correct for the undefined padding bits cases,
> the padding bits above type0 are undefined before/after the simplification,
> but the bits in between are changed from previously zero to undefined.
> 
> For the precision0 > precision1 cases if both type0 and type1 are
> extended like _BitInt on arm, then the simplification is reasonable,
> all the padding bits above type0 are zeros and all the padding bits between
> type1 and type0 precision are zeros too, before and after.
> 
> So, indeed my patch isn't correct.
> 
> And because we don't have the _BitInt arm support for GCC 14, perhaps
> we could defer that part for GCC 15, though I wonder if we shouldn't just
> kill that TYPE_PRECISION (TREE_TYPE (@0)) > TYPE_PRECISION (TREE_TYPE (@1))
> && TYPE_UNSIGNED (TREE_TYPE (@1)) case, because it is clearly wrong
> for the types with undefined padding bits (everything on trunk right now).

The comment says it was added for (char)_Bool, probably 
gcc.dg/tree-ssa/vce-1.c will fail.  But I'm not sure this optimization
is important, it seems early SRA introduces a V_C_E here and then
phiopt adds the conversion to unsigned char.  But this is a V_C_E of
_Bool -> unsigned char -> _Bool.  We seem to apply both this
pattern and then drop the V_C_E.

Maybe we can reasonably optimize the case where type == TREE_TYPE (@1)
and the intermediate precision is bigger or equal?

Richard.


> If I have
> struct S { signed long long a : 42; unsigned long long b : 37; } s;
> __typeof (s.b) v1 = ...;
> __typeof (s.a) v2;
> unsigned long long v3;
> v2 = v1; // This zero extends the 5 bits
> v3 = VCE <unsigned long long> (v2);
> return v3 & 0x3ffffffffffULL;
> then without the folding it used to be valid, those 5 bits were zeros,
> with the optimization it is not.  Similarly to VCE to struct or double
> and the like.  And similarly for unsigned _BitInt(37)/ signed _BitInt(42)
> on x86/aarch64.
> 
> 	Jakub
> 
>
  
Jakub Jelinek March 27, 2024, 2:54 p.m. UTC | #4
On Wed, Mar 27, 2024 at 01:42:03PM +0100, Richard Biener wrote:
> The comment says it was added for (char)_Bool, probably 
> gcc.dg/tree-ssa/vce-1.c will fail.  But I'm not sure this optimization
> is important, it seems early SRA introduces a V_C_E here and then
> phiopt the conversion to unsigned char.  But this is a V_C_E of
> _Bool -> unsigned char -> _Bool.  We seem to apply both this
> pattern and then drop the V_C_E.
> 
> Maybe we can reasonably optimize the case where type == TREE_TYPE (@1)
> and the intermediate precision is bigger or equal?

If type is INTEGRAL_TYPE_P and has TYPE_PRECISION <= that of TREE_TYPE (@1),
sure, it can be optimized whenever TREE_TYPE (@0) has precision >= that,
no matter what TYPE_SIGN TREE_TYPE (@1) has.
Anyway, will try to cook up something.

	Jakub
  

Patch

--- gcc/match.pd.jj	2024-03-15 11:04:24.672914747 +0100
+++ gcc/match.pd	2024-03-26 15:49:44.177864509 +0100
@@ -4699,13 +4699,38 @@  (define_operator_list SYNC_FETCH_AND_AND
    zero-extend while keeping the same size (for bool-to-char).  */
 (simplify
   (view_convert (convert@0 @1))
-  (if ((INTEGRAL_TYPE_P (TREE_TYPE (@0)) || POINTER_TYPE_P (TREE_TYPE (@0)))
-       && (INTEGRAL_TYPE_P (TREE_TYPE (@1)) || POINTER_TYPE_P (TREE_TYPE (@1)))
-       && TYPE_SIZE (TREE_TYPE (@0)) == TYPE_SIZE (TREE_TYPE (@1))
-       && (TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE (@1))
-	   || (TYPE_PRECISION (TREE_TYPE (@0)) > TYPE_PRECISION (TREE_TYPE (@1))
-	       && TYPE_UNSIGNED (TREE_TYPE (@1)))))
-   (view_convert @1)))
+  (with { tree type0 = TREE_TYPE (@0);
+	  tree type1 = TREE_TYPE (@1);
+	  bool ok = false;
+	  if ((INTEGRAL_TYPE_P (type0) || POINTER_TYPE_P (type0))
+	      && (INTEGRAL_TYPE_P (type1) || POINTER_TYPE_P (type1))
+	      && TYPE_SIZE (type0) == TYPE_SIZE (type1))
+	    {
+	      if (TYPE_PRECISION (type0) == TYPE_PRECISION (type1))
+		{
+		  if (TREE_CODE (type0) != BITINT_TYPE)
+		    ok = true;
+		  else
+		    {
+		      /* Avoid optimizing this if type0 is a _BitInt
+			 type with padding bits which are supposed to be
+			 extended.  */
+		      struct bitint_info info;
+		      targetm.c.bitint_type_info (TYPE_PRECISION (type0),
+						  &info);
+		      if (!info.extended)
+			ok = true;
+		      else
+			ok = (TYPE_PRECISION (type0)
+			      == tree_to_uhwi (TYPE_SIZE (type0)));
+		    }
+		}
+	      else if (TYPE_PRECISION (type0) > TYPE_PRECISION (type1)
+		       && TYPE_UNSIGNED (type1))
+		ok = true;
+	    } }
+   (if (ok)
+    (view_convert @1))))
 
 /* Simplify a view-converted empty or single-element constructor.  */
 (simplify