simplify-rtx: Punt on simplify_associative_operation with large operands [PR102356]

Message ID 20211130092613.GL2646553@tucnak
State Superseded
Headers
Series simplify-rtx: Punt on simplify_associative_operation with large operands [PR102356] |

Commit Message

Jakub Jelinek Nov. 30, 2021, 9:26 a.m. UTC
  Hi!

Seems simplify_associate_operation is quadratic, which isn't a big deal
for use during combine and other similar RTL passes, because those never
try to combine expressions from more than a few instructions and because
those instructions need to be recognized the machine description also bounds
how many expressions can appear in there.
var-tracking has depth limits only for some cases and unlimited depth
for the vt_expand_loc though:
/* This is the value used during expansion of locations.  We want it
   to be unbounded, so that variables expanded deep in a recursion
   nest are fully evaluated, so that their values are cached
   correctly.  We avoid recursion cycles through other means, and we
   don't unshare RTL, so excess complexity is not a problem.  */
#define EXPR_DEPTH (INT_MAX)
/* We use this to keep too-complex expressions from being emitted as
   location notes, and then to debug information.  Users can trade
   compile time for ridiculously complex expressions, although they're
   seldom useful, and they may often have to be discarded as not
   representable anyway.  */
#define EXPR_USE_DEPTH (param_max_vartrack_expr_depth)

IMO for very large expressions it isn't worth trying to reassociate though,
in fact e.g. for the new testcase below keeping it as is has bigger chance
of generating smaller debug info which the dwarf2out.c part of the change
tries to achieve - if a binary operation has the same operands, we can
use DW_OP_dup and not bother computing the possibly large operand again.

This patch punts if the associate operands contain together more than
64 same operations, which can happen only during var-tracking.
During bootstrap/regtest on x86_64-linux and i686-linux, this triggers
only on the new testcase and on gcc.dg/torture/pr88597.c.
I think given the 16 element static buffer in subrtx_iterator::array_type
it shouldn't slow down the common case of small expressions, but have
been wondering whether we shouldn't have some in_vartrack global flag
or guard it with
(current_pass && strcmp (current_pass->name, "vartrack") == 0).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Another possibility to deal with the power expressions in debug info
would be to introduce some new RTL operation for the pow{,i} (x, n)
case, allow that solely in debug insns and expand those into DWARF
using a loop.  But that seems like quite a lot of work for something rarely
used (especially when powi for larger n is only useful for 0 and 1 inputs
because anything else overflows).

2021-11-30  Jakub Jelinek  <jakub@redhat.com>

	PR rtl-optimization/102356
	* simplify-rtx.c: Include rtl-iter.h.
	(simplify_associative_operation): Don't reassociate very large
	expressions with 64 or more CODE subrtxes in the operands.
	* dwarf2out.c (mem_loc_descriptor): Optimize binary operation
	with both operands the same using DW_OP_dup.

	* gcc.dg/pr102356.c: New test.


	Jakub
  

Comments

Richard Biener Nov. 30, 2021, 9:43 a.m. UTC | #1
On Tue, 30 Nov 2021, Jakub Jelinek wrote:

> Hi!
> 
> Seems simplify_associate_operation is quadratic, which isn't a big deal
> for use during combine and other similar RTL passes, because those never
> try to combine expressions from more than a few instructions and because
> those instructions need to be recognized the machine description also bounds
> how many expressions can appear in there.

Well ...

> var-tracking has depth limits only for some cases and unlimited depth
> for the vt_expand_loc though:
> /* This is the value used during expansion of locations.  We want it
>    to be unbounded, so that variables expanded deep in a recursion
>    nest are fully evaluated, so that their values are cached
>    correctly.  We avoid recursion cycles through other means, and we
>    don't unshare RTL, so excess complexity is not a problem.  */
> #define EXPR_DEPTH (INT_MAX)
> /* We use this to keep too-complex expressions from being emitted as
>    location notes, and then to debug information.  Users can trade
>    compile time for ridiculously complex expressions, although they're
>    seldom useful, and they may often have to be discarded as not
>    representable anyway.  */
> #define EXPR_USE_DEPTH (param_max_vartrack_expr_depth)
> 
> IMO for very large expressions it isn't worth trying to reassociate though,
> in fact e.g. for the new testcase below keeping it as is has bigger chance
> of generating smaller debug info which the dwarf2out.c part of the change
> tries to achieve - if a binary operation has the same operands, we can
> use DW_OP_dup and not bother computing the possibly large operand again.
> 
> This patch punts if the associate operands contain together more than
> 64 same operations, which can happen only during var-tracking.
> During bootstrap/regtest on x86_64-linux and i686-linux, this triggers
> only on the new testcase and on gcc.dg/torture/pr88597.c.
> I think given the 16 element static buffer in subrtx_iterator::array_type
> it shouldn't slow down the common case of small expressions, but have
> been wondering whether we shouldn't have some in_vartrack global flag
> or guard it with
> (current_pass && strcmp (current_pass->name, "vartrack") == 0).
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> Another possibility to deal with the power expressions in debug info
> would be to introduce some new RTL operation for the pow{,i} (x, n)
> case, allow that solely in debug insns and expand those into DWARF
> using a loop.  But that seems like quite a lot of work for something rarely
> used (especially when powi for larger n is only useful for 0 and 1 inputs
> because anything else overflows).

I wonder given we now have 'simplify_context' whether we can
track a re-association budget we can eat from.  At least your
code to determine whether the expression is too large is
quadratic as well (but bound to 64, so just a very large constant
overhead for an outermost expression of size 63).  We already
have a mem_depth there, so just have reassoc_times and punt
if that reaches --param max-simplify-reassoc-times, incrementing
it each time simplify_associative_operation is entered?

Thanks,
Richard.

> 2021-11-30  Jakub Jelinek  <jakub@redhat.com>
> 
> 	PR rtl-optimization/102356
> 	* simplify-rtx.c: Include rtl-iter.h.
> 	(simplify_associative_operation): Don't reassociate very large
> 	expressions with 64 or more CODE subrtxes in the operands.
> 	* dwarf2out.c (mem_loc_descriptor): Optimize binary operation
> 	with both operands the same using DW_OP_dup.
> 
> 	* gcc.dg/pr102356.c: New test.
> 
> --- gcc/simplify-rtx.c.jj	2021-11-05 00:43:22.576624649 +0100
> +++ gcc/simplify-rtx.c	2021-11-29 19:46:29.674750656 +0100
> @@ -37,6 +37,7 @@ along with GCC; see the file COPYING3.
>  #include "selftest-rtl.h"
>  #include "rtx-vector-builder.h"
>  #include "rtlanal.h"
> +#include "rtl-iter.h"
>  
>  /* Simplification and canonicalization of RTL.  */
>  
> @@ -2263,9 +2264,40 @@ simplify_context::simplify_associative_o
>  {
>    rtx tem;
>  
> +  if (GET_CODE (op0) == code || GET_CODE (op1) == code)
> +    {
> +      /* During vartrack, the expressions can grow arbitrarily large.
> +	 Reassociation isn't really useful for larger expressions
> +	 and can be very compile time expensive.  */
> +      unsigned count = 0;
> +      subrtx_iterator::array_type array;
> +      FOR_EACH_SUBRTX (iter, array, op0, NONCONST)
> +	{
> +	  const_rtx x = *iter;
> +	  if (GET_CODE (x) == code)
> +	    {
> +	      if (count++ >= 64)
> +		return NULL_RTX;
> +	    }
> +	  else
> +	    iter.skip_subrtxes ();
> +	}
> +      FOR_EACH_SUBRTX (iter, array, op1, NONCONST)
> +	{
> +	  const_rtx x = *iter;
> +	  if (GET_CODE (x) == code)
> +	    {
> +	      if (count++ >= 64)
> +		return NULL_RTX;
> +	    }
> +	  else
> +	    iter.skip_subrtxes ();
> +	}
> +    }
> +
>    /* Linearize the operator to the left.  */
>    if (GET_CODE (op1) == code)
>      {
>        /* "(a op b) op (c op d)" becomes "((a op b) op c) op d)".  */
>        if (GET_CODE (op0) == code)
>  	{
> --- gcc/dwarf2out.c.jj	2021-11-29 14:24:14.053634713 +0100
> +++ gcc/dwarf2out.c	2021-11-29 19:44:54.192113401 +0100
> @@ -16363,6 +16363,15 @@ mem_loc_descriptor (rtx rtl, machine_mod
>      do_binop:
>        op0 = mem_loc_descriptor (XEXP (rtl, 0), mode, mem_mode,
>  				VAR_INIT_STATUS_INITIALIZED);
> +      if (XEXP (rtl, 0) == XEXP (rtl, 1))
> +	{
> +	  if (op0 == 0)
> +	    break;
> +	  mem_loc_result = op0;
> +	  add_loc_descr (&mem_loc_result, new_loc_descr (DW_OP_dup, 0, 0));
> +	  add_loc_descr (&mem_loc_result, new_loc_descr (op, 0, 0));
> +	  break;
> +	}
>        op1 = mem_loc_descriptor (XEXP (rtl, 1), mode, mem_mode,
>  				VAR_INIT_STATUS_INITIALIZED);
>  
> --- gcc/testsuite/gcc.dg/pr102356.c.jj	2021-11-29 19:57:47.061075856 +0100
> +++ gcc/testsuite/gcc.dg/pr102356.c	2021-11-29 19:57:26.852364489 +0100
> @@ -0,0 +1,33 @@
> +/* PR rtl-optimization/102356 */
> +/* { dg-do compile { target int32plus } } */
> +/* { dg-options "-O3 -g" } */
> +
> +signed char a = 0;
> +unsigned char b = 9;
> +unsigned long long c = 0xF1FBFC17225F7A57ULL;
> +int d = 0x3A6667C6;
> +
> +unsigned char
> +foo (unsigned int x)
> +{
> +  unsigned int *e = &x;
> +  if ((c /= ((0 * (*e *= b)) <= 0)))
> +    ;
> +  for (d = 9; d > 2; d -= 2)
> +    {
> +      c = -2;
> +      do
> +	if ((*e *= *e))
> +	  {
> +	    a = 4;
> +	    do
> +	      {
> +		a -= 3;
> +		if ((*e *= *e))
> +		  b = 9;
> +	      }
> +	    while (a > 2);
> +	  }
> +      while (c++);
> +    }
> +}
> 
> 	Jakub
> 
>
  

Patch

--- gcc/simplify-rtx.c.jj	2021-11-05 00:43:22.576624649 +0100
+++ gcc/simplify-rtx.c	2021-11-29 19:46:29.674750656 +0100
@@ -37,6 +37,7 @@  along with GCC; see the file COPYING3.
 #include "selftest-rtl.h"
 #include "rtx-vector-builder.h"
 #include "rtlanal.h"
+#include "rtl-iter.h"
 
 /* Simplification and canonicalization of RTL.  */
 
@@ -2263,9 +2264,40 @@  simplify_context::simplify_associative_o
 {
   rtx tem;
 
+  if (GET_CODE (op0) == code || GET_CODE (op1) == code)
+    {
+      /* During vartrack, the expressions can grow arbitrarily large.
+	 Reassociation isn't really useful for larger expressions
+	 and can be very compile time expensive.  */
+      unsigned count = 0;
+      subrtx_iterator::array_type array;
+      FOR_EACH_SUBRTX (iter, array, op0, NONCONST)
+	{
+	  const_rtx x = *iter;
+	  if (GET_CODE (x) == code)
+	    {
+	      if (count++ >= 64)
+		return NULL_RTX;
+	    }
+	  else
+	    iter.skip_subrtxes ();
+	}
+      FOR_EACH_SUBRTX (iter, array, op1, NONCONST)
+	{
+	  const_rtx x = *iter;
+	  if (GET_CODE (x) == code)
+	    {
+	      if (count++ >= 64)
+		return NULL_RTX;
+	    }
+	  else
+	    iter.skip_subrtxes ();
+	}
+    }
+
   /* Linearize the operator to the left.  */
   if (GET_CODE (op1) == code)
     {
       /* "(a op b) op (c op d)" becomes "((a op b) op c) op d)".  */
       if (GET_CODE (op0) == code)
 	{
--- gcc/dwarf2out.c.jj	2021-11-29 14:24:14.053634713 +0100
+++ gcc/dwarf2out.c	2021-11-29 19:44:54.192113401 +0100
@@ -16363,6 +16363,15 @@  mem_loc_descriptor (rtx rtl, machine_mod
     do_binop:
       op0 = mem_loc_descriptor (XEXP (rtl, 0), mode, mem_mode,
 				VAR_INIT_STATUS_INITIALIZED);
+      if (XEXP (rtl, 0) == XEXP (rtl, 1))
+	{
+	  if (op0 == 0)
+	    break;
+	  mem_loc_result = op0;
+	  add_loc_descr (&mem_loc_result, new_loc_descr (DW_OP_dup, 0, 0));
+	  add_loc_descr (&mem_loc_result, new_loc_descr (op, 0, 0));
+	  break;
+	}
       op1 = mem_loc_descriptor (XEXP (rtl, 1), mode, mem_mode,
 				VAR_INIT_STATUS_INITIALIZED);
 
--- gcc/testsuite/gcc.dg/pr102356.c.jj	2021-11-29 19:57:47.061075856 +0100
+++ gcc/testsuite/gcc.dg/pr102356.c	2021-11-29 19:57:26.852364489 +0100
@@ -0,0 +1,33 @@ 
+/* PR rtl-optimization/102356 */
+/* { dg-do compile { target int32plus } } */
+/* { dg-options "-O3 -g" } */
+
+signed char a = 0;
+unsigned char b = 9;
+unsigned long long c = 0xF1FBFC17225F7A57ULL;
+int d = 0x3A6667C6;
+
+unsigned char
+foo (unsigned int x)
+{
+  unsigned int *e = &x;
+  if ((c /= ((0 * (*e *= b)) <= 0)))
+    ;
+  for (d = 9; d > 2; d -= 2)
+    {
+      c = -2;
+      do
+	if ((*e *= *e))
+	  {
+	    a = 4;
+	    do
+	      {
+		a -= 3;
+		if ((*e *= *e))
+		  b = 9;
+	      }
+	    while (a > 2);
+	  }
+      while (c++);
+    }
+}