ssa: Fix up maybe_rewrite_mem_ref_base complex type handling [PR116034]

Message ID Zp9LSqpoZShk+eYT@tucnak
State New
Headers
Series ssa: Fix up maybe_rewrite_mem_ref_base complex type handling [PR116034] |

Checks

Context Check Description
linaro-tcwg-bot/tcwg_gcc_build--master-arm success Build passed
linaro-tcwg-bot/tcwg_gcc_build--master-aarch64 fail Patch failed to apply

Commit Message

Jakub Jelinek July 23, 2024, 6:18 a.m. UTC
  Hi!

The folding into REALPART_EXPR is correct, used only when the mem_offset
is zero, but for IMAGPART_EXPR it didn't check the exact offset value (just
that it is not 0).
The following patch fixes that by using IMAGPART_EXPR only if the offset
is right and using BITFIELD_REF or whatever else otherwise.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk, 14.2
and other release branches?

2024-07-23  Jakub Jelinek  <jakub@redhat.com>
	    Andrew Pinski  <quic_apinski@quicinc.com>

	PR tree-optimization/116034
	* tree-sra.cc (maybe_rewrite_mem_ref_base): Only use IMAGPART_EXPR
	if MEM_REF offset is equal to element type size.

	* gcc.dg/pr116034.c: New test.


	Jakub
  

Comments

Richard Biener July 23, 2024, 6:42 a.m. UTC | #1
On Tue, 23 Jul 2024, Jakub Jelinek wrote:

> Hi!
> 
> The folding into REALPART_EXPR is correct, used only when the mem_offset
> is zero, but for IMAGPART_EXPR it didn't check the exact offset value (just
> that it is not 0).
> The following patch fixes that by using IMAGPART_EXPR only if the offset
> is right and using BITFIELD_REF or whatever else otherwise.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk, 14.2
> and other release branches?

I think this is not enough - what kind of GIMPLE does this result in?
You should amend the checking in non_rewritable_mem_ref_base as well,
it should fail when the corresponding rewrite doesn't work.

I suspect it falls through to the BIT_FIELD_REF code?

That said, can you match up the offset check with that of
non_rewritable_mem_ref_base then, particularly

      if ((VECTOR_TYPE_P (TREE_TYPE (decl))
           || TREE_CODE (TREE_TYPE (decl)) == COMPLEX_TYPE)
          && useless_type_conversion_p (TREE_TYPE (base),
                                        TREE_TYPE (TREE_TYPE (decl)))
          && known_ge (mem_ref_offset (base), 0)
          && known_gt (wi::to_poly_offset (TYPE_SIZE_UNIT (TREE_TYPE 
(decl))),
                       mem_ref_offset (base))
          && multiple_p (mem_ref_offset (base),
                         wi::to_poly_offset (TYPE_SIZE_UNIT (TREE_TYPE 
(base)))))

I suppose this check should be adjusted to use the three arg multiple_p
and check the factor against 0/1 for COMPLEX_TYPE.

It would probably better maintainance-wise to merge the checking
and transform routines and make it have two modes, check only or
check-and-transform.  That makes it easier to keep them in sync.

Richard.

> 2024-07-23  Jakub Jelinek  <jakub@redhat.com>
> 	    Andrew Pinski  <quic_apinski@quicinc.com>
> 
> 	PR tree-optimization/116034
> 	* tree-sra.cc (maybe_rewrite_mem_ref_base): Only use IMAGPART_EXPR
> 	if MEM_REF offset is equal to element type size.
> 
> 	* gcc.dg/pr116034.c: New test.
> 
> --- gcc/tree-ssa.cc.jj	2024-03-11 11:00:46.768915988 +0100
> +++ gcc/tree-ssa.cc	2024-07-22 21:27:02.115530861 +0200
> @@ -1506,7 +1506,10 @@ maybe_rewrite_mem_ref_base (tree *tp, bi
>  	}
>        else if (TREE_CODE (TREE_TYPE (sym)) == COMPLEX_TYPE
>  	       && useless_type_conversion_p (TREE_TYPE (*tp),
> -					     TREE_TYPE (TREE_TYPE (sym))))
> +					     TREE_TYPE (TREE_TYPE (sym)))
> +	       && (integer_zerop (TREE_OPERAND (*tp, 1))
> +		   || tree_int_cst_equal (TREE_OPERAND (*tp, 1),
> +					  TYPE_SIZE_UNIT (TREE_TYPE (*tp)))))
>  	{
>  	  *tp = build1 (integer_zerop (TREE_OPERAND (*tp, 1))
>  			? REALPART_EXPR : IMAGPART_EXPR,
> --- gcc/testsuite/gcc.dg/pr116034.c.jj	2024-07-22 21:39:50.050994243 +0200
> +++ gcc/testsuite/gcc.dg/pr116034.c	2024-07-22 21:39:32.432213042 +0200
> @@ -0,0 +1,21 @@
> +/* PR tree-optimization/116034 */
> +/* { dg-do run } */
> +/* { dg-options "-O1 -fno-strict-aliasing" } */
> +
> +int g;
> +
> +static inline int
> +foo (_Complex unsigned short c)
> +{
> +  __builtin_memmove (&g, 1 + (char *) &c, 2);
> +  return g;
> +}
> +
> +int
> +main ()
> +{
> +  if (__SIZEOF_SHORT__ == 2
> +      && __CHAR_BIT__ == 8
> +      && foo (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__ ? 0x0100 : 1) != 1)
> +    __builtin_abort ();
> +}
> 
> 	Jakub
> 
>
  
Jakub Jelinek July 23, 2024, 8:07 a.m. UTC | #2
On Tue, Jul 23, 2024 at 08:42:24AM +0200, Richard Biener wrote:
> On Tue, 23 Jul 2024, Jakub Jelinek wrote:
> > The folding into REALPART_EXPR is correct, used only when the mem_offset
> > is zero, but for IMAGPART_EXPR it didn't check the exact offset value (just
> > that it is not 0).
> > The following patch fixes that by using IMAGPART_EXPR only if the offset
> > is right and using BITFIELD_REF or whatever else otherwise.
> > 
> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk, 14.2
> > and other release branches?
> 
> I think this is not enough - what kind of GIMPLE does this result in?

If it is
  __builtin_memmove (&g, 2 + (char *) &c, 2);
then
  _1 = &c + 2;
  _3 = MEM <unsigned short> [(char * {ref-all})_1];
is optimized to
  _3 = IMAGPART_EXPR <c_6(D)>;
as before.  If it is
  __builtin_memmove (&g, 1 + (char *) &c, 2);
from the testcase, then
  _1 = &c + 1;
  _3 = MEM <unsigned short> [(char * {ref-all})_1];
is optimized to
  _3 = BIT_FIELD_REF <c_6(D), 16, 8>;
and that is expanded correctly.

> You should amend the checking in non_rewritable_mem_ref_base as well,
> it should fail when the corresponding rewrite doesn't work.

That is already the case I believe.
non_rewritable_mem_ref_base rejects it in the VECTOR_TYPE/COMPLEX_TYPE
case (so doesn't return NULL), but then falls through the
      /* For integral typed extracts we can use a BIT_FIELD_REF.  */
check and returns NULL_TREE there.
But then maybe_rewrite_mem_ref_base triggers already on the COMPLEX_TYPE
case.

> I suspect it falls through to the BIT_FIELD_REF code?
> 
> That said, can you match up the offset check with that of
> non_rewritable_mem_ref_base then, particularly
> 
>       if ((VECTOR_TYPE_P (TREE_TYPE (decl))
>            || TREE_CODE (TREE_TYPE (decl)) == COMPLEX_TYPE)
>           && useless_type_conversion_p (TREE_TYPE (base),
>                                         TREE_TYPE (TREE_TYPE (decl)))
>           && known_ge (mem_ref_offset (base), 0)
>           && known_gt (wi::to_poly_offset (TYPE_SIZE_UNIT (TREE_TYPE 
> (decl))),
>                        mem_ref_offset (base))
>           && multiple_p (mem_ref_offset (base),
>                          wi::to_poly_offset (TYPE_SIZE_UNIT (TREE_TYPE 
> (base)))))
> 
> I suppose this check should be adjusted to use the three arg multiple_p
> and check the factor against 0/1 for COMPLEX_TYPE.

Isn't that just too complex/expensive for something as simple as
mem_ref_offset (base) is 0 or TYPE_SIZE_UNIT?
That
integer_zerop () || tree_int_cst_equal looked much cheaper.
Sure, it could be done on poly_int_cst instead if that looks better:
  && (known_eq (mem_ref_offset (base), 0)
      || known_eq (mem_ref_offset (base),
		   wi::to_poly_offset (TYPE_SIZE_UNIT (TREE_TYPE (decl))))
And shouldn't we cache those mem_ref_offset (base) and
wi::to_poly_offset (TYPE_SIZE_UNIT (TREE_TYPE (decl)))
which are used in multiple spots?

Anyway, yet another option because non_rewritable_mem_ref_base has
the VECTOR/COMPLEX types cases together would be to have them together
in maybe_rewrite_mem_ref_base too, so do:
      if ((VECTOR_TYPE_P (TREE_TYPE (sym))
	   || TREE_CODE (TREE_TYPE (sym)) == COMPLEX_TYPE)
          && useless_type_conversion_p (TREE_TYPE (*tp),
                                        TREE_TYPE (TREE_TYPE (sym)))
          && multiple_p (mem_ref_offset (*tp),
                         wi::to_poly_offset (TYPE_SIZE_UNIT (TREE_TYPE (*tp)))))
        {
	  if (VECTOR_TYPE_P (TREE_TYPE (sym)))
	    *tp = build3 (BIT_FIELD_REF, TREE_TYPE (*tp), sym, 
			  TYPE_SIZE (TREE_TYPE (*tp)),
			  int_const_binop (MULT_EXPR,
					   bitsize_int (BITS_PER_UNIT),
					   TREE_OPERAND (*tp, 1)));
	  else
	    *tp = build1 (integer_zerop (TREE_OPERAND (*tp, 1))
			  ? REALPART_EXPR : IMAGPART_EXPR,
			  TREE_TYPE (*tp), sym);
        }

	Jakub
  
Jakub Jelinek July 23, 2024, 8:33 a.m. UTC | #3
On Tue, Jul 23, 2024 at 10:07:08AM +0200, Jakub Jelinek wrote:
> Anyway, yet another option because non_rewritable_mem_ref_base has
> the VECTOR/COMPLEX types cases together would be to have them together
> in maybe_rewrite_mem_ref_base too, so do:

In patch form that would be:

2024-07-23  Jakub Jelinek  <jakub@redhat.com>
	    Andrew Pinski  <quic_apinski@quicinc.com>

	PR tree-optimization/116034
	* tree-ssa.cc (maybe_rewrite_mem_ref_base): Merge the VECTOR and COMPLEX
	type checks.

	* gcc.dg/pr116034.c: New test.

--- gcc/tree-ssa.cc.jj	2024-03-11 11:00:46.768915988 +0100
+++ gcc/tree-ssa.cc	2024-07-23 10:24:54.564568968 +0200
@@ -1492,25 +1492,23 @@ maybe_rewrite_mem_ref_base (tree *tp, bi
       && is_gimple_reg_type (TREE_TYPE (*tp))
       && ! VOID_TYPE_P (TREE_TYPE (*tp)))
     {
-      if (VECTOR_TYPE_P (TREE_TYPE (sym))
+      if ((VECTOR_TYPE_P (TREE_TYPE (sym))
+	   || TREE_CODE (TREE_TYPE (sym)) == COMPLEX_TYPE)
 	  && useless_type_conversion_p (TREE_TYPE (*tp),
 					TREE_TYPE (TREE_TYPE (sym)))
 	  && multiple_p (mem_ref_offset (*tp),
 			 wi::to_poly_offset (TYPE_SIZE_UNIT (TREE_TYPE (*tp)))))
 	{
-	  *tp = build3 (BIT_FIELD_REF, TREE_TYPE (*tp), sym, 
-			TYPE_SIZE (TREE_TYPE (*tp)),
-			int_const_binop (MULT_EXPR,
-					 bitsize_int (BITS_PER_UNIT),
-					 TREE_OPERAND (*tp, 1)));
-	}
-      else if (TREE_CODE (TREE_TYPE (sym)) == COMPLEX_TYPE
-	       && useless_type_conversion_p (TREE_TYPE (*tp),
-					     TREE_TYPE (TREE_TYPE (sym))))
-	{
-	  *tp = build1 (integer_zerop (TREE_OPERAND (*tp, 1))
-			? REALPART_EXPR : IMAGPART_EXPR,
-			TREE_TYPE (*tp), sym);
+	  if (VECTOR_TYPE_P (TREE_TYPE (sym)))
+	    *tp = build3 (BIT_FIELD_REF, TREE_TYPE (*tp), sym, 
+			  TYPE_SIZE (TREE_TYPE (*tp)),
+			  int_const_binop (MULT_EXPR,
+					   bitsize_int (BITS_PER_UNIT),
+					   TREE_OPERAND (*tp, 1)));
+	  else
+	    *tp = build1 (integer_zerop (TREE_OPERAND (*tp, 1))
+			  ? REALPART_EXPR : IMAGPART_EXPR,
+			  TREE_TYPE (*tp), sym);
 	}
       else if (integer_zerop (TREE_OPERAND (*tp, 1))
 	       && DECL_SIZE (sym) == TYPE_SIZE (TREE_TYPE (*tp)))
--- gcc/testsuite/gcc.dg/pr116034.c.jj	2024-07-22 21:39:50.050994243 +0200
+++ gcc/testsuite/gcc.dg/pr116034.c	2024-07-23 10:26:29.940340508 +0200
@@ -0,0 +1,22 @@
+/* PR tree-optimization/116034 */
+/* { dg-do run } */
+/* { dg-options "-O1 -fno-strict-aliasing" } */
+
+int g;
+
+static inline int
+foo (_Complex unsigned short c)
+{
+  __builtin_memmove (&g, 1 + (char *) &c, 2);
+  return g;
+}
+
+int
+main ()
+{
+  if (__SIZEOF_SHORT__ == 2
+      && __CHAR_BIT__ == 8
+      && (foo (__BYTE_ORDER__ != __ORDER_BIG_ENDIAN__ ? 0x100 : 1)
+	  != (__BYTE_ORDER__ != __ORDER_BIG_ENDIAN__ ? 1 : 0x100)))
+    __builtin_abort ();
+}


	Jakub
  
Richard Biener July 23, 2024, 8:44 a.m. UTC | #4
On Tue, 23 Jul 2024, Jakub Jelinek wrote:

> On Tue, Jul 23, 2024 at 08:42:24AM +0200, Richard Biener wrote:
> > On Tue, 23 Jul 2024, Jakub Jelinek wrote:
> > > The folding into REALPART_EXPR is correct, used only when the mem_offset
> > > is zero, but for IMAGPART_EXPR it didn't check the exact offset value (just
> > > that it is not 0).
> > > The following patch fixes that by using IMAGPART_EXPR only if the offset
> > > is right and using BITFIELD_REF or whatever else otherwise.
> > > 
> > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk, 14.2
> > > and other release branches?
> > 
> > I think this is not enough - what kind of GIMPLE does this result in?
> 
> If it is
>   __builtin_memmove (&g, 2 + (char *) &c, 2);
> then
>   _1 = &c + 2;
>   _3 = MEM <unsigned short> [(char * {ref-all})_1];
> is optimized to
>   _3 = IMAGPART_EXPR <c_6(D)>;
> as before.  If it is
>   __builtin_memmove (&g, 1 + (char *) &c, 2);
> from the testcase, then
>   _1 = &c + 1;
>   _3 = MEM <unsigned short> [(char * {ref-all})_1];
> is optimized to
>   _3 = BIT_FIELD_REF <c_6(D), 16, 8>;
> and that is expanded correctly.
> 
> > You should amend the checking in non_rewritable_mem_ref_base as well,
> > it should fail when the corresponding rewrite doesn't work.
> 
> That is already the case I believe.
> non_rewritable_mem_ref_base rejects it in the VECTOR_TYPE/COMPLEX_TYPE
> case (so doesn't return NULL), but then falls through the
>       /* For integral typed extracts we can use a BIT_FIELD_REF.  */
> check and returns NULL_TREE there.
> But then maybe_rewrite_mem_ref_base triggers already on the COMPLEX_TYPE
> case.
> 
> > I suspect it falls through to the BIT_FIELD_REF code?
> > 
> > That said, can you match up the offset check with that of
> > non_rewritable_mem_ref_base then, particularly
> > 
> >       if ((VECTOR_TYPE_P (TREE_TYPE (decl))
> >            || TREE_CODE (TREE_TYPE (decl)) == COMPLEX_TYPE)
> >           && useless_type_conversion_p (TREE_TYPE (base),
> >                                         TREE_TYPE (TREE_TYPE (decl)))
> >           && known_ge (mem_ref_offset (base), 0)
> >           && known_gt (wi::to_poly_offset (TYPE_SIZE_UNIT (TREE_TYPE 
> > (decl))),
> >                        mem_ref_offset (base))
> >           && multiple_p (mem_ref_offset (base),
> >                          wi::to_poly_offset (TYPE_SIZE_UNIT (TREE_TYPE 
> > (base)))))
> > 
> > I suppose this check should be adjusted to use the three arg multiple_p
> > and check the factor against 0/1 for COMPLEX_TYPE.

Ah, reading the above again it should alreeady ensure the offset is
for the real or imag part only - I was thinking it might allow
larger aligned offsets.

Thus your original patch is OK.

I think an improvement would really be to merge the two functions.

Thanks,
Richard.

> Isn't that just too complex/expensive for something as simple as
> mem_ref_offset (base) is 0 or TYPE_SIZE_UNIT?
> That
> integer_zerop () || tree_int_cst_equal looked much cheaper.
> Sure, it could be done on poly_int_cst instead if that looks better:
>   && (known_eq (mem_ref_offset (base), 0)
>       || known_eq (mem_ref_offset (base),
> 		   wi::to_poly_offset (TYPE_SIZE_UNIT (TREE_TYPE (decl))))
> And shouldn't we cache those mem_ref_offset (base) and
> wi::to_poly_offset (TYPE_SIZE_UNIT (TREE_TYPE (decl)))
> which are used in multiple spots?
> 
> Anyway, yet another option because non_rewritable_mem_ref_base has
> the VECTOR/COMPLEX types cases together would be to have them together
> in maybe_rewrite_mem_ref_base too, so do:
>       if ((VECTOR_TYPE_P (TREE_TYPE (sym))
> 	   || TREE_CODE (TREE_TYPE (sym)) == COMPLEX_TYPE)
>           && useless_type_conversion_p (TREE_TYPE (*tp),
>                                         TREE_TYPE (TREE_TYPE (sym)))
>           && multiple_p (mem_ref_offset (*tp),
>                          wi::to_poly_offset (TYPE_SIZE_UNIT (TREE_TYPE (*tp)))))
>         {
> 	  if (VECTOR_TYPE_P (TREE_TYPE (sym)))
> 	    *tp = build3 (BIT_FIELD_REF, TREE_TYPE (*tp), sym, 
> 			  TYPE_SIZE (TREE_TYPE (*tp)),
> 			  int_const_binop (MULT_EXPR,
> 					   bitsize_int (BITS_PER_UNIT),
> 					   TREE_OPERAND (*tp, 1)));
> 	  else
> 	    *tp = build1 (integer_zerop (TREE_OPERAND (*tp, 1))
> 			  ? REALPART_EXPR : IMAGPART_EXPR,
> 			  TREE_TYPE (*tp), sym);
>         }
> 
> 	Jakub
> 
>
  

Patch

--- gcc/tree-ssa.cc.jj	2024-03-11 11:00:46.768915988 +0100
+++ gcc/tree-ssa.cc	2024-07-22 21:27:02.115530861 +0200
@@ -1506,7 +1506,10 @@  maybe_rewrite_mem_ref_base (tree *tp, bi
 	}
       else if (TREE_CODE (TREE_TYPE (sym)) == COMPLEX_TYPE
 	       && useless_type_conversion_p (TREE_TYPE (*tp),
-					     TREE_TYPE (TREE_TYPE (sym))))
+					     TREE_TYPE (TREE_TYPE (sym)))
+	       && (integer_zerop (TREE_OPERAND (*tp, 1))
+		   || tree_int_cst_equal (TREE_OPERAND (*tp, 1),
+					  TYPE_SIZE_UNIT (TREE_TYPE (*tp)))))
 	{
 	  *tp = build1 (integer_zerop (TREE_OPERAND (*tp, 1))
 			? REALPART_EXPR : IMAGPART_EXPR,
--- gcc/testsuite/gcc.dg/pr116034.c.jj	2024-07-22 21:39:50.050994243 +0200
+++ gcc/testsuite/gcc.dg/pr116034.c	2024-07-22 21:39:32.432213042 +0200
@@ -0,0 +1,21 @@ 
+/* PR tree-optimization/116034 */
+/* { dg-do run } */
+/* { dg-options "-O1 -fno-strict-aliasing" } */
+
+int g;
+
+static inline int
+foo (_Complex unsigned short c)
+{
+  __builtin_memmove (&g, 1 + (char *) &c, 2);
+  return g;
+}
+
+int
+main ()
+{
+  if (__SIZEOF_SHORT__ == 2
+      && __CHAR_BIT__ == 8
+      && foo (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__ ? 0x0100 : 1) != 1)
+    __builtin_abort ();
+}