cfgexpand: Special case expansion of dead COMPLEX_EXPRs at -O0 [PR119120]

Message ID Z8/xaV0bfNxKtGJ1@tucnak
State New
Headers
Series cfgexpand: Special case expansion of dead COMPLEX_EXPRs at -O0 [PR119120] |

Commit Message

Jakub Jelinek March 11, 2025, 8:16 a.m. UTC
  Hi!

The PR119190 patch I've posted regresses the PR119120 testcase (not adding
to testsuite, as it is fairly hard to scan for that problem).
The issue is that for the partial setting of _Complex floating vars
through __real__ on it first and __imag__ later (or vice versa) and since
we forced all complex vars into SSA form we often have undefined (D)
arguments of those COMPLEX_EXPRs.  When we don't DCE them (for -O0 debug
info reasons), their expansion will copy both the real and imag parts
using the floating mode and on some targets like 387 that copying alone can
unfortunately trigger exceptions on sNaNs or other problematic bit patterns
and for uninitialized memory it can be triggered randomly based on whatever
is on the stack before.

The following patch deals just with the unused COMPLEX_EXPRs at -O0,
if it is used, either complex lowering should have replaced its parts or
it represents a user store somewhere, or passing to function argument etc.,
all that should be really avoided for partially uninitialized complex.
And the patch attempts to find the uninitialized arguments (unfortunately
at -O0 we don't forward propagate the (D) SSA_NAMEs) and only copy the
initialized parts.  Another option would be to arrange for the copying
to be done in integral mode rather than floating, kind of VCE and back.
Though, not sure how to handle then XFmode or on 32-bit targets TFmode
when there is no corresponding intergral mode or it isn't usable.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
Or shall I go the VCE route instead?

2025-03-11  Jakub Jelinek  <jakub@redhat.com>

	PR target/119120
	* cfgexpand.cc (expand_gimple_stmt_1): At -O0 expand a COMPLEX_EXPR
	with zero uses of lhs if it has complex floating point mode as
	stores of just the parts which aren't known to be uninitialized.


	Jakub
  

Comments

Richard Biener March 11, 2025, 9:18 a.m. UTC | #1
On Tue, 11 Mar 2025, Jakub Jelinek wrote:

> Hi!
> 
> The PR119190 patch I've posted regresses the PR119120 testcase (not adding
> to testsuite, as it is fairly hard to scan for that problem).
> The issue is that for the partial setting of _Complex floating vars
> through __real__ on it first and __imag__ later (or vice versa) and since
> we forced all complex vars into SSA form we often have undefined (D)
> arguments of those COMPLEX_EXPRs.  When we don't DCE them (for -O0 debug
> info reasons), their expansion will copy both the real and imag parts
> using the floating mode and on some targets like 387 that copying alone can
> unfortunately trigger exceptions on sNaNs or other problematic bit patterns
> and for uninitialized memory it can be triggered randomly based on whatever
> is on the stack before.
> 
> The following patch deals just with the unused COMPLEX_EXPRs at -O0,
> if it is used, either complex lowering should have replaced its parts or
> it represents a user store somewhere, or passing to function argument etc.,
> all that should be really avoided for partially uninitialized complex.
> And the patch attempts to find the uninitialized arguments (unfortunately
> at -O0 we don't forward propagate the (D) SSA_NAMEs) and only copy the
> initialized parts.  Another option would be to arrange for the copying
> to be done in integral mode rather than floating, kind of VCE and back.
> Though, not sure how to handle then XFmode or on 32-bit targets TFmode
> when there is no corresponding intergral mode or it isn't usable.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> Or shall I go the VCE route instead?

I think the patch as-is is more robust, but still - ugh ... I wonder
whether we can instead avoid introducing the COMPLEX_EXPR at all
at -O0?

> 2025-03-11  Jakub Jelinek  <jakub@redhat.com>
> 
> 	PR target/119120
> 	* cfgexpand.cc (expand_gimple_stmt_1): At -O0 expand a COMPLEX_EXPR
> 	with zero uses of lhs if it has complex floating point mode as
> 	stores of just the parts which aren't known to be uninitialized.
> 
> --- gcc/cfgexpand.cc.jj	2025-01-07 20:11:04.632662813 +0100
> +++ gcc/cfgexpand.cc	2025-03-10 21:28:01.071078448 +0100
> @@ -4294,6 +4294,70 @@ expand_gimple_stmt_1 (gimple *stmt)
>  	    if (GET_CODE (target) == SUBREG && SUBREG_PROMOTED_VAR_P (target))
>  	      promoted = true;
>  
> +	    /* At -O0 an unused COMPLEX_EXPR might be kept in the IL by
> +	       cplxlower0 pass to ensure correct debug info.  If one or both
> +	       arguments of COMPLEX_EXPR is unitialized and it is a complex
> +	       floating-point mode, don't actually copy the uninitialized
> +	       part(s) using floating-point mode, as that could cause extra
> +	       exceptions.  */
> +	    if (!optimize
> +		&& gimple_assign_rhs_code (assign_stmt) == COMPLEX_EXPR
> +		&& TREE_CODE (lhs) == SSA_NAME
> +		&& has_zero_uses (lhs)
> +		&& MEM_P (target)
> +		&& SCALAR_FLOAT_MODE_P (GET_MODE_INNER (GET_MODE (target))))
> +	      {
> +		op0 = gimple_assign_rhs1 (assign_stmt);
> +		tree op1 = gimple_assign_rhs2 (assign_stmt);
> +		gimple *g;
> +		bool ignore_op0 = (TREE_CODE (op0) == SSA_NAME
> +				   && SSA_NAME_IS_DEFAULT_DEF (op0)
> +				   && (!SSA_NAME_VAR (op0)
> +				       || VAR_P (SSA_NAME_VAR (op0))));
> +		bool ignore_op1 = (TREE_CODE (op1) == SSA_NAME
> +				   && SSA_NAME_IS_DEFAULT_DEF (op1)
> +				   && (!SSA_NAME_VAR (op1)
> +				       || VAR_P (SSA_NAME_VAR (op1))));
> +		tree tem = op0;
> +		while (!ignore_op0
> +		       && TREE_CODE (tem) == SSA_NAME
> +		       && (g = SSA_NAME_DEF_STMT (tem))
> +		       && is_gimple_assign (g)
> +		       && gimple_assign_rhs_code (g) == SSA_NAME)
> +		  {
> +		    tem = gimple_assign_rhs1 (g);
> +		    if (SSA_NAME_IS_DEFAULT_DEF (tem)
> +			&& (!SSA_NAME_VAR (tem) || VAR_P (SSA_NAME_VAR (tem))))
> +		      ignore_op0 = true;
> +		  }
> +		tem = op1;
> +		while (!ignore_op1
> +		       && TREE_CODE (tem) == SSA_NAME
> +		       && (g = SSA_NAME_DEF_STMT (tem))
> +		       && is_gimple_assign (g)
> +		       && gimple_assign_rhs_code (g) == SSA_NAME)
> +		  {
> +		    tem = gimple_assign_rhs1 (g);
> +		    if (SSA_NAME_IS_DEFAULT_DEF (tem)
> +			&& (!SSA_NAME_VAR (tem) || VAR_P (SSA_NAME_VAR (tem))))
> +		      ignore_op1 = true;
> +		  }
> +		if (ignore_op0 || ignore_op1)
> +		  {
> +		    if (!ignore_op0)
> +		      {
> +			rtx rop0 = expand_normal (op0);
> +			write_complex_part (target, rop0, 0, true);
> +		      }
> +		    else if (!ignore_op1)
> +		      {
> +			rtx rop1 = expand_normal (op1);
> +			write_complex_part (target, rop1, 1, true);
> +		      }
> +		    break;
> +		  }
> +	      }
> +
>  	   /* If we store into a promoted register, don't directly
>  	      expand to target.  */
>  	    temp = promoted ? NULL_RTX : target;
> 
> 	Jakub
> 
>
  
Jakub Jelinek March 11, 2025, 11:05 a.m. UTC | #2
On Tue, Mar 11, 2025 at 10:18:18AM +0100, Richard Biener wrote:
> I think the patch as-is is more robust, but still - ugh ... I wonder
> whether we can instead avoid introducing the COMPLEX_EXPR at all
> at -O0?

Can we set DECL_NOT_GIMPLE_REG_P at -O0 during gimplification (where
we've already handled some uses/setters of it), at least when
gimplify_modify_expr_complex_part sees {REAL,IMAG}PART_EXPR on
{VAR,PARM,RESULT}_DECL?
Or would we need to basically revert to the old way (for -O0 only) where
we assumed all complex vars aren't gimple regs unless proven otherwise
(not assume it for getting initialized internal vars of course)?

	Jakub
  
Richard Biener March 11, 2025, 11:13 a.m. UTC | #3
On Tue, 11 Mar 2025, Jakub Jelinek wrote:

> On Tue, Mar 11, 2025 at 10:18:18AM +0100, Richard Biener wrote:
> > I think the patch as-is is more robust, but still - ugh ... I wonder
> > whether we can instead avoid introducing the COMPLEX_EXPR at all
> > at -O0?
> 
> Can we set DECL_NOT_GIMPLE_REG_P at -O0 during gimplification (where
> we've already handled some uses/setters of it), at least when
> gimplify_modify_expr_complex_part sees {REAL,IMAG}PART_EXPR on
> {VAR,PARM,RESULT}_DECL?

Yes, that should work for LHS __real / __imag.

> Or would we need to basically revert to the old way (for -O0 only) where
> we assumed all complex vars aren't gimple regs unless proven otherwise
> (not assume it for getting initialized internal vars of course)?

No, I wouldn't go this far at this point.

Richard.
  
Jakub Jelinek March 12, 2025, 12:40 p.m. UTC | #4
On Tue, Mar 11, 2025 at 12:13:13PM +0100, Richard Biener wrote:
> On Tue, 11 Mar 2025, Jakub Jelinek wrote:
> 
> > On Tue, Mar 11, 2025 at 10:18:18AM +0100, Richard Biener wrote:
> > > I think the patch as-is is more robust, but still - ugh ... I wonder
> > > whether we can instead avoid introducing the COMPLEX_EXPR at all
> > > at -O0?
> > 
> > Can we set DECL_NOT_GIMPLE_REG_P at -O0 during gimplification (where
> > we've already handled some uses/setters of it), at least when
> > gimplify_modify_expr_complex_part sees {REAL,IMAG}PART_EXPR on
> > {VAR,PARM,RESULT}_DECL?
> 
> Yes, that should work for LHS __real / __imag.

Unfortunately it doesn't.

Although successfully bootstrapped on x86_64-linux and i686-linux,
it caused g++.dg/cpp1z/decomp2.C, g++.dg/torture/pr109262.C and
g++.dg/torture/pr88149.C regressions.

Minimal testcase is -O0:
void
foo (float x, float y)
{
  __complex__ float z = x + y * 1.0fi;
  __real__ z = 1.0f;
}
which ICEs with
pr88149.c: In function ‘foo’:
pr88149.c:2:1: error: non-register as LHS of binary operation
    2 | foo (float x, float y)
      | ^~~
z = COMPLEX_EXPR <_2, y.0>;
pr88149.c:2:1: internal compiler error: ‘verify_gimple’ failed
When the initialization is being gimplified, z is still
not DECL_NOT_GIMPLE_REG_P and so is_gimple_reg is true for it and
so it gimplifies it as
  z = COMPLEX_EXPR <_2, y.0>;
later, instead of building
  _3 = IMAGPART_EXPR <z>;
  z = COMPLEX_EXPR <1.0e+0, _3>;
like before, the patch forces z to be not a gimple reg and uses
  REALPART_EXPR <z> = 1.0e+0;
but it is too late, nothing fixes up the gimplification of the COMPLEX_EXPR
anymore.

So, I think we'd really need to do it the old way with adjusted naming
of the flag, so assume for all non-addressable
VAR_DECLs/PARM_DECLs/RESULT_DECLs with COMPLEX_TYPE if (!optimize) they
are DECL_NOT_GIMPLE_REG_P (perhaps with the exception of
get_internal_tmp_var), and at some point (what) if at all optimize that
away if the partial accesses aren't done.

Thoughts on this?

2025-03-11  Jakub Jelinek  <jakub@redhat.com>

	PR target/119120
	* gimplify.cc (gimplify_modify_expr): Don't call
	gimplify_modify_expr_complex_part for -O0, instead set
	DECL_NOT_GIMPLE_REG_P on the {REAL,IMAG}PART_EXPR operand.

--- gcc/gimplify.cc.jj	2025-03-10 09:31:20.579772627 +0100
+++ gcc/gimplify.cc	2025-03-11 15:03:27.633636148 +0100
@@ -7237,7 +7237,15 @@ gimplify_modify_expr (tree *expr_p, gimp
   if ((TREE_CODE (*to_p) == REALPART_EXPR
        || TREE_CODE (*to_p) == IMAGPART_EXPR)
       && is_gimple_reg (TREE_OPERAND (*to_p, 0)))
-    return gimplify_modify_expr_complex_part (expr_p, pre_p, want_value);
+    {
+      if (optimize)
+	return gimplify_modify_expr_complex_part (expr_p, pre_p, want_value);
+      /* When not optimizing, instead set DECL_NOT_GIMPLE_REG_P on it and
+	 keep using {REAL,IMAG}PART_EXPR on the partial initializations
+	 in order to avoid copying around uninitialized parts.  See
+	 PR119120.  */
+      DECL_NOT_GIMPLE_REG_P (TREE_OPERAND (*to_p, 0)) = 1;
+    }
 
   /* Try to alleviate the effects of the gimplification creating artificial
      temporaries (see for example is_gimple_reg_rhs) on the debug info, but


	Jakub
  
Richard Biener March 12, 2025, 1:01 p.m. UTC | #5
On Wed, 12 Mar 2025, Jakub Jelinek wrote:

> On Tue, Mar 11, 2025 at 12:13:13PM +0100, Richard Biener wrote:
> > On Tue, 11 Mar 2025, Jakub Jelinek wrote:
> > 
> > > On Tue, Mar 11, 2025 at 10:18:18AM +0100, Richard Biener wrote:
> > > > I think the patch as-is is more robust, but still - ugh ... I wonder
> > > > whether we can instead avoid introducing the COMPLEX_EXPR at all
> > > > at -O0?
> > > 
> > > Can we set DECL_NOT_GIMPLE_REG_P at -O0 during gimplification (where
> > > we've already handled some uses/setters of it), at least when
> > > gimplify_modify_expr_complex_part sees {REAL,IMAG}PART_EXPR on
> > > {VAR,PARM,RESULT}_DECL?
> > 
> > Yes, that should work for LHS __real / __imag.
> 
> Unfortunately it doesn't.
> 
> Although successfully bootstrapped on x86_64-linux and i686-linux,
> it caused g++.dg/cpp1z/decomp2.C, g++.dg/torture/pr109262.C and
> g++.dg/torture/pr88149.C regressions.
> 
> Minimal testcase is -O0:
> void
> foo (float x, float y)
> {
>   __complex__ float z = x + y * 1.0fi;
>   __real__ z = 1.0f;
> }
> which ICEs with
> pr88149.c: In function ‘foo’:
> pr88149.c:2:1: error: non-register as LHS of binary operation
>     2 | foo (float x, float y)
>       | ^~~
> z = COMPLEX_EXPR <_2, y.0>;
> pr88149.c:2:1: internal compiler error: ‘verify_gimple’ failed
> When the initialization is being gimplified, z is still
> not DECL_NOT_GIMPLE_REG_P and so is_gimple_reg is true for it and
> so it gimplifies it as
>   z = COMPLEX_EXPR <_2, y.0>;
> later, instead of building
>   _3 = IMAGPART_EXPR <z>;
>   z = COMPLEX_EXPR <1.0e+0, _3>;
> like before, the patch forces z to be not a gimple reg and uses
>   REALPART_EXPR <z> = 1.0e+0;
> but it is too late, nothing fixes up the gimplification of the COMPLEX_EXPR
> anymore.

Ah, yeah - setting DECL_NOT_GIMPLE_REG_P "after the fact" doesn't work.

> So, I think we'd really need to do it the old way with adjusted naming
> of the flag, so assume for all non-addressable
> VAR_DECLs/PARM_DECLs/RESULT_DECLs with COMPLEX_TYPE if (!optimize) they
> are DECL_NOT_GIMPLE_REG_P (perhaps with the exception of
> get_internal_tmp_var), and at some point (what) if at all optimize that
> away if the partial accesses aren't done.

We could of course do that in is_gimple_reg (), but I'm not sure if
all places that would need to check do so.  Alternatively gimplify

__real x = ..

into

tem[DECL_NOT_GIMPLE_REG_P] = x;
__real tem = ...;
x = tem;

when 'x' is a is_gimple_reg?  Of course for -O0 this would be quite bad.
Likewise for your idea - where would we do this optimization when not
optimizing?

So it would need to be the frontend(s) setting DECL_NOT_GIMPLE_REG_P
when producing lvalue __real/__imag accesses?

Richard.


> Thoughts on this?
> 
> 2025-03-11  Jakub Jelinek  <jakub@redhat.com>
> 
> 	PR target/119120
> 	* gimplify.cc (gimplify_modify_expr): Don't call
> 	gimplify_modify_expr_complex_part for -O0, instead set
> 	DECL_NOT_GIMPLE_REG_P on the {REAL,IMAG}PART_EXPR operand.
> 
> --- gcc/gimplify.cc.jj	2025-03-10 09:31:20.579772627 +0100
> +++ gcc/gimplify.cc	2025-03-11 15:03:27.633636148 +0100
> @@ -7237,7 +7237,15 @@ gimplify_modify_expr (tree *expr_p, gimp
>    if ((TREE_CODE (*to_p) == REALPART_EXPR
>         || TREE_CODE (*to_p) == IMAGPART_EXPR)
>        && is_gimple_reg (TREE_OPERAND (*to_p, 0)))
> -    return gimplify_modify_expr_complex_part (expr_p, pre_p, want_value);
> +    {
> +      if (optimize)
> +	return gimplify_modify_expr_complex_part (expr_p, pre_p, want_value);
> +      /* When not optimizing, instead set DECL_NOT_GIMPLE_REG_P on it and
> +	 keep using {REAL,IMAG}PART_EXPR on the partial initializations
> +	 in order to avoid copying around uninitialized parts.  See
> +	 PR119120.  */
> +      DECL_NOT_GIMPLE_REG_P (TREE_OPERAND (*to_p, 0)) = 1;
> +    }
>  
>    /* Try to alleviate the effects of the gimplification creating artificial
>       temporaries (see for example is_gimple_reg_rhs) on the debug info, but
> 
> 
> 	Jakub
> 
>
  

Patch

--- gcc/cfgexpand.cc.jj	2025-01-07 20:11:04.632662813 +0100
+++ gcc/cfgexpand.cc	2025-03-10 21:28:01.071078448 +0100
@@ -4294,6 +4294,70 @@  expand_gimple_stmt_1 (gimple *stmt)
 	    if (GET_CODE (target) == SUBREG && SUBREG_PROMOTED_VAR_P (target))
 	      promoted = true;
 
+	    /* At -O0 an unused COMPLEX_EXPR might be kept in the IL by
+	       cplxlower0 pass to ensure correct debug info.  If one or both
+	       arguments of COMPLEX_EXPR is unitialized and it is a complex
+	       floating-point mode, don't actually copy the uninitialized
+	       part(s) using floating-point mode, as that could cause extra
+	       exceptions.  */
+	    if (!optimize
+		&& gimple_assign_rhs_code (assign_stmt) == COMPLEX_EXPR
+		&& TREE_CODE (lhs) == SSA_NAME
+		&& has_zero_uses (lhs)
+		&& MEM_P (target)
+		&& SCALAR_FLOAT_MODE_P (GET_MODE_INNER (GET_MODE (target))))
+	      {
+		op0 = gimple_assign_rhs1 (assign_stmt);
+		tree op1 = gimple_assign_rhs2 (assign_stmt);
+		gimple *g;
+		bool ignore_op0 = (TREE_CODE (op0) == SSA_NAME
+				   && SSA_NAME_IS_DEFAULT_DEF (op0)
+				   && (!SSA_NAME_VAR (op0)
+				       || VAR_P (SSA_NAME_VAR (op0))));
+		bool ignore_op1 = (TREE_CODE (op1) == SSA_NAME
+				   && SSA_NAME_IS_DEFAULT_DEF (op1)
+				   && (!SSA_NAME_VAR (op1)
+				       || VAR_P (SSA_NAME_VAR (op1))));
+		tree tem = op0;
+		while (!ignore_op0
+		       && TREE_CODE (tem) == SSA_NAME
+		       && (g = SSA_NAME_DEF_STMT (tem))
+		       && is_gimple_assign (g)
+		       && gimple_assign_rhs_code (g) == SSA_NAME)
+		  {
+		    tem = gimple_assign_rhs1 (g);
+		    if (SSA_NAME_IS_DEFAULT_DEF (tem)
+			&& (!SSA_NAME_VAR (tem) || VAR_P (SSA_NAME_VAR (tem))))
+		      ignore_op0 = true;
+		  }
+		tem = op1;
+		while (!ignore_op1
+		       && TREE_CODE (tem) == SSA_NAME
+		       && (g = SSA_NAME_DEF_STMT (tem))
+		       && is_gimple_assign (g)
+		       && gimple_assign_rhs_code (g) == SSA_NAME)
+		  {
+		    tem = gimple_assign_rhs1 (g);
+		    if (SSA_NAME_IS_DEFAULT_DEF (tem)
+			&& (!SSA_NAME_VAR (tem) || VAR_P (SSA_NAME_VAR (tem))))
+		      ignore_op1 = true;
+		  }
+		if (ignore_op0 || ignore_op1)
+		  {
+		    if (!ignore_op0)
+		      {
+			rtx rop0 = expand_normal (op0);
+			write_complex_part (target, rop0, 0, true);
+		      }
+		    else if (!ignore_op1)
+		      {
+			rtx rop1 = expand_normal (op1);
+			write_complex_part (target, rop1, 1, true);
+		      }
+		    break;
+		  }
+	      }
+
 	   /* If we store into a promoted register, don't directly
 	      expand to target.  */
 	    temp = promoted ? NULL_RTX : target;