Implement constant-folding simplifications of reductions.
Commit Message
This patch addresses a code quality regression in GCC 12 by implementing
some constant folding/simplification transformations for REDUC_PLUS_EXPR
in match.pd. The motivating example is gcc.dg/vect/pr89440.c which with
-O2 -ffast-math (with vectorization now enabled) gets optimized to:
float f (float x)
{
vector(4) float vect_x_14.11;
vector(4) float _2;
float _32;
_2 = {x_9(D), 0.0, 0.0, 0.0};
vect_x_14.11_29 = _2 + { 1.0e+1, 2.6e+1, 4.2e+1, 5.8e+1 };
_32 = .REDUC_PLUS (vect_x_14.11_29); [tail call]
return _32;
}
With these proposed new transformations, we can simplify the
above code even further.
float f (float x)
{
float _32;
_32 = x_9(D) + 1.36e+2;
return _32;
}
[which happens to match what we'd produce with -fno-tree-vectorize,
and with GCC 11].
This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check with no new failures. Ok for mainline?
2022-02-21 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* fold-const.cc (ctor_single_nonzero_element): New function to
return the single non-zero element of a (vector) constructor.
* fold-const.h (ctor_single_nonzero_element): Prototype here.
* match.pd (reduc (constructor@0)): Simplify reductions of a
constructor containing a single non-zero element.
(reduc (@0 op VECTOR_CST) -> (reduc @0) op CONST): Simplify
reductions of vector operations of the same operator with
constant vector operands.
gcc/testsuite/ChangeLog
* gcc.dg/fold-reduc-1.c: New test case.
Thanks in advance,
Roger
--
Comments
On Mon, 21 Feb 2022, Roger Sayle wrote:
> +/* Fold REDUC (@0 op VECTOR_CST) as REDUC (@0) op REDUC (VECTOR_CST). */
> +(for reduc (IFN_REDUC_PLUS IFN_REDUC_MAX IFN_REDUC_MIN IFN_REDUC_FMAX
> + IFN_REDUC_FMIN IFN_REDUC_AND IFN_REDUC_IOR IFN_REDUC_XOR)
> + op (plus max min IFN_FMAX IFN_FMIN bit_and bit_ior bit_xor)
> + (simplify (reduc (op @0 VECTOR_CST@1))
> + (op (reduc:type @0) (reduc:type @1))))
I wonder if we need to test flag_associative_math for the 'plus' case,
or if the presence of IFN_REDUC_PLUS is enough to justify the possible
loss of precision.
Hi Marc,
I'm assuming that the use (semantics) of a REDUC_PLUS expr allow the
reduction to be done in any order, for example the testcase requires
-ffast-math to allow the REDUC_PLUS to be introduced in the first place.
This also applies explains why the patch doesn't need to distinguish
negative zeros from positive zeros in ctor_single_nonzero_element
(but it's perhaps something to beware of in uses of VECTOR_CST's
single_nonzero_element).
Cheers,
Roger
--
> -----Original Message-----
> From: Marc Glisse <marc.glisse@inria.fr>
> Sent: 21 February 2022 08:21
> To: Roger Sayle <roger@nextmovesoftware.com>
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] Implement constant-folding simplifications of
reductions.
>
> On Mon, 21 Feb 2022, Roger Sayle wrote:
>
> > +/* Fold REDUC (@0 op VECTOR_CST) as REDUC (@0) op REDUC
> (VECTOR_CST).
> > +*/ (for reduc (IFN_REDUC_PLUS IFN_REDUC_MAX IFN_REDUC_MIN
> IFN_REDUC_FMAX
> > + IFN_REDUC_FMIN IFN_REDUC_AND IFN_REDUC_IOR
> IFN_REDUC_XOR)
> > + op (plus max min IFN_FMAX IFN_FMIN bit_and bit_ior bit_xor)
> > + (simplify (reduc (op @0 VECTOR_CST@1))
> > + (op (reduc:type @0) (reduc:type @1))))
>
> I wonder if we need to test flag_associative_math for the 'plus' case, or
if the
> presence of IFN_REDUC_PLUS is enough to justify the possible loss of
precision.
>
> --
> Marc Glisse
On Mon, Feb 21, 2022 at 9:31 AM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
>
> Hi Marc,
> I'm assuming that the use (semantics) of a REDUC_PLUS expr allow the
> reduction to be done in any order, for example the testcase requires
> -ffast-math to allow the REDUC_PLUS to be introduced in the first place.
> This also applies explains why the patch doesn't need to distinguish
> negative zeros from positive zeros in ctor_single_nonzero_element
> (but it's perhaps something to beware of in uses of VECTOR_CST's
> single_nonzero_element).
.IFN_REDUC_PLUS directly maps to the reduc_plus_scal optab
which does not specify an operation order. There's also
fold_left_plus which does (left-to-right).
An argument could be made that constant folding should do what
the CPU will end up doing but we're already not doing that for
double arithmetic and flush-to-zero enabled with -ffast-math and
denormals I think.
+/* Return the single non-zero element of a CONSTRUCTOR or NULL_TREE. */
+tree
+ctor_single_nonzero_element (const_tree t)
+{
+ unsigned HOST_WIDE_INT idx;
+ constructor_elt *ce;
+ tree elt = NULL_TREE;
+
+ if (TREE_CODE (t) == SSA_NAME)
+ {
+ gassign *def = dyn_cast <gassign *> (SSA_NAME_DEF_STMT (t));
I think this function should expect a CONSTRUCTOR 'T' and the use in
match.pd be adjusted similar to other CONSTRUCTOR users like
(simplify
(BIT_FIELD_REF CONSTRUCTOR@0 @1 @2)
...
(with
{
tree ctor = (TREE_CODE (@0) == SSA_NAME
? gimple_assign_rhs1 (SSA_NAME_DEF_STMT (@0)) : @0);
+/* Fold reduction of a single nonzero element constructor. */
+(for reduc (IFN_REDUC_PLUS IFN_REDUC_IOR IFN_REDUC_XOR)
+ (simplify (reduc (CONSTRUCTOR@0))
+ (with { tree elt = ctor_single_nonzero_element (@0); }
+ (if (elt)
+ (non_lvalue { elt; })))))
is (non_value... ) really necessary? I highly doubt so.
I wonder if there's a case like { _1, 0., 0., -0. } where independent on
the order of the operations this transform relies on !HONOR_SIGNED_ZEROS?
It likely also should demote _1 from sNaN to qNaN and thus relies on
-fno-signalling-nans?
Otherwise OK.
Thanks,
Richard.
> Cheers,
> Roger
> --
>
> > -----Original Message-----
> > From: Marc Glisse <marc.glisse@inria.fr>
> > Sent: 21 February 2022 08:21
> > To: Roger Sayle <roger@nextmovesoftware.com>
> > Cc: gcc-patches@gcc.gnu.org
> > Subject: Re: [PATCH] Implement constant-folding simplifications of
> reductions.
> >
> > On Mon, 21 Feb 2022, Roger Sayle wrote:
> >
> > > +/* Fold REDUC (@0 op VECTOR_CST) as REDUC (@0) op REDUC
> > (VECTOR_CST).
> > > +*/ (for reduc (IFN_REDUC_PLUS IFN_REDUC_MAX IFN_REDUC_MIN
> > IFN_REDUC_FMAX
> > > + IFN_REDUC_FMIN IFN_REDUC_AND IFN_REDUC_IOR
> > IFN_REDUC_XOR)
> > > + op (plus max min IFN_FMAX IFN_FMIN bit_and bit_ior bit_xor)
> > > + (simplify (reduc (op @0 VECTOR_CST@1))
> > > + (op (reduc:type @0) (reduc:type @1))))
> >
> > I wonder if we need to test flag_associative_math for the 'plus' case, or
> if the
> > presence of IFN_REDUC_PLUS is enough to justify the possible loss of
> precision.
> >
> > --
> > Marc Glisse
>
On 2/21/2022 3:55 AM, Richard Biener via Gcc-patches wrote:
> On Mon, Feb 21, 2022 at 9:31 AM Roger Sayle <roger@nextmovesoftware.com> wrote:
>>
>> Hi Marc,
>> I'm assuming that the use (semantics) of a REDUC_PLUS expr allow the
>> reduction to be done in any order, for example the testcase requires
>> -ffast-math to allow the REDUC_PLUS to be introduced in the first place.
>> This also applies explains why the patch doesn't need to distinguish
>> negative zeros from positive zeros in ctor_single_nonzero_element
>> (but it's perhaps something to beware of in uses of VECTOR_CST's
>> single_nonzero_element).
> .IFN_REDUC_PLUS directly maps to the reduc_plus_scal optab
> which does not specify an operation order. There's also
> fold_left_plus which does (left-to-right).
fold-left-plus is meant to be a strictly in-order reduction, which
implies (at least to me) that REDUC_PLUS allows reassociation.
> An argument could be made that constant folding should do what
> the CPU will end up doing but we're already not doing that for
> double arithmetic and flush-to-zero enabled with -ffast-math and
> denormals I think.
Right. We can also lose inexact state when folding, but I think that's
largely considered OK as well.
jeff
@@ -16792,6 +16792,33 @@ address_compare (tree_code code, tree type, tree op0, tree op1,
return equal;
}
+/* Return the single non-zero element of a CONSTRUCTOR or NULL_TREE. */
+tree
+ctor_single_nonzero_element (const_tree t)
+{
+ unsigned HOST_WIDE_INT idx;
+ constructor_elt *ce;
+ tree elt = NULL_TREE;
+
+ if (TREE_CODE (t) == SSA_NAME)
+ {
+ gassign *def = dyn_cast <gassign *> (SSA_NAME_DEF_STMT (t));
+ if (gimple_assign_rhs_code (def) == CONSTRUCTOR)
+ t = gimple_assign_rhs1 (def);
+ }
+
+ if (TREE_CODE (t) != CONSTRUCTOR)
+ return NULL_TREE;
+ for (idx = 0; vec_safe_iterate (CONSTRUCTOR_ELTS (t), idx, &ce); idx++)
+ if (!integer_zerop (ce->value) && !real_zerop (ce->value))
+ {
+ if (elt)
+ return NULL_TREE;
+ elt = ce->value;
+ }
+ return elt;
+}
+
#if CHECKING_P
namespace selftest {
@@ -224,6 +224,7 @@ extern const char *c_getstr (tree);
extern wide_int tree_nonzero_bits (const_tree);
extern int address_compare (tree_code, tree, tree, tree, tree &, tree &,
poly_int64 &, poly_int64 &, bool);
+extern tree ctor_single_nonzero_element (const_tree);
/* Return OFF converted to a pointer offset type suitable as offset for
POINTER_PLUS_EXPR. Use location LOC for this conversion. */
@@ -7528,6 +7528,20 @@ and,
(BIT_FIELD_REF:elt_type @0 { size; } { pos; })
{ elt; })))))))
+/* Fold reduction of a single nonzero element constructor. */
+(for reduc (IFN_REDUC_PLUS IFN_REDUC_IOR IFN_REDUC_XOR)
+ (simplify (reduc (CONSTRUCTOR@0))
+ (with { tree elt = ctor_single_nonzero_element (@0); }
+ (if (elt)
+ (non_lvalue { elt; })))))
+
+/* Fold REDUC (@0 op VECTOR_CST) as REDUC (@0) op REDUC (VECTOR_CST). */
+(for reduc (IFN_REDUC_PLUS IFN_REDUC_MAX IFN_REDUC_MIN IFN_REDUC_FMAX
+ IFN_REDUC_FMIN IFN_REDUC_AND IFN_REDUC_IOR IFN_REDUC_XOR)
+ op (plus max min IFN_FMAX IFN_FMIN bit_and bit_ior bit_xor)
+ (simplify (reduc (op @0 VECTOR_CST@1))
+ (op (reduc:type @0) (reduc:type @1))))
+
(simplify
(vec_perm @0 @1 VECTOR_CST@2)
(with
new file mode 100644
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ffast-math -fdump-tree-optimized" } */
+float foo (float x)
+{
+ int i;
+ float j;
+ float a = 0;
+ for (i = 0; i < 4; ++i)
+ {
+ for (j = 0; j < 4; ++j)
+ {
+ a += 1;
+ x += a;
+ }
+ }
+ return x;
+}
+
+/* { dg-final { scan-tree-dump-not "REDUC_PLUS" "optimized"} } */