match.pd: Fix up the recent bitmask_inv_cst_vector_p simplification [PR103417]

Message ID 20211125081834.GX2646553@tucnak
State Committed
Series match.pd: Fix up the recent bitmask_inv_cst_vector_p simplification [PR103417]

Commit Message

Jakub Jelinek Nov. 25, 2021, 8:18 a.m. UTC
  Hi!

The following testcase is miscompiled since the r12-5489-g0888d6bbe97e10
changes.
The simplification triggers on
(x & 4294967040U) >= 0U
and turns it into:
x <= 255U
which is incorrect; it should fold to 1 because unsigned >= 0U is always
true, and normally the
/* Non-equality compare simplifications from fold_binary  */
     (if (wi::to_wide (cst) == min)
       (if (cmp == GE_EXPR)
        { constant_boolean_node (true, type); })
simplification folds that, but this simplification was done earlier.

The simplification correctly doesn't include lt, which shouldn't be handled
for the same reason; we'll fold it to 0 elsewhere.
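
For illustration, here is a minimal scalar sketch (not the PR reproducer --
in the PR the comparison comes from bitfield accesses -- and the helper name
is made up), just to show the shapes involved:

unsigned int
mask_cmp (unsigned int x)
{
  /* Valid rewrite: bits 8..31 all clear is equivalent to x <= 255U.  */
  unsigned int ok = (x & 4294967040U) == 0;
  /* Invalid rewrite: an unsigned >= 0U compare is always true, so this
     must fold to 1, not to x <= 255U.  */
  unsigned int bad = (x & 4294967040U) >= 0U;
  return ok & bad;
}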

But, IMNSHO, while it isn't incorrect to handle le and gt there, it is
unnecessary, because (x & cst) <= 0U and (x & cst) > 0U should
never appear; again in
/* Non-equality compare simplifications from fold_binary  */
we have a simplification for them:
       (if (cmp == LE_EXPR)
        (eq @2 @1))
       (if (cmp == GT_EXPR)
        (ne @2 @1))))
This is done for
  (cmp (convert?@2 @0) uniform_integer_cst_p@1)
and so should be done for both integers and vectors.
As the bitmask_inv_cst_vector_p simplification only handles
eq and ne for signed types, I think it can be simplified to just the
following patch.
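
For the eq/ne cases that remain, a minimal sketch of what the simplification
still does (illustrative only; the function name is made up; for a signed
operand the pattern compares the value converted to the corresponding
unsigned type):

int
low_bits_only (int x)
{
  /* bitmask_inv_cst_vector_p turns the mask -256 (~255) into 255; with eq
     this becomes (unsigned int) x <= 255, with ne it would become
     (unsigned int) x > 255.  */
  return (x & -256) == 0;
}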

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

I wonder whether (for cst (VECTOR_CST INTEGER_CST) is best for the
size of *-match.c; wouldn't it be better to accept just CONSTANT_CLASS_P@1
and have bitmask_inv_cst_vector_p return NULL_TREE if it isn't
INTEGER_CST or VECTOR_CST?

Also, both without and with this patch I see the following on i686-linux
(they can also be reproduced with
RUNTESTFLAGS="--target_board=unix/-m32/-mno-sse dg.exp='bic-bitmask* signbit-2*'"):
FAIL: gcc.dg/bic-bitmask-10.c scan-tree-dump dce7 "<=\\\\s*.+{ 255,.+}"
FAIL: gcc.dg/bic-bitmask-11.c scan-tree-dump dce7 ">\\\\s*.+{ 255,.+}"
FAIL: gcc.dg/bic-bitmask-12.c scan-tree-dump dce7 "<=\\\\s*.+{ 255,.+}"
FAIL: gcc.dg/bic-bitmask-2.c scan-tree-dump-times dce7 "<=\\\\s*.+{ 255,.+}" 1
FAIL: gcc.dg/bic-bitmask-23.c (test for excess errors)
FAIL: gcc.dg/bic-bitmask-23.c scan-tree-dump dce7 "<=\\\\s*.+{ 255, 15, 1, 65535 }"
FAIL: gcc.dg/bic-bitmask-3.c scan-tree-dump-times dce7 "<=\\\\s*.+{ 255,.+}" 1
FAIL: gcc.dg/bic-bitmask-4.c scan-tree-dump-times dce7 "=\\\\s*.+{ 1,.+}" 1
FAIL: gcc.dg/bic-bitmask-5.c scan-tree-dump-times dce7 ">\\\\s*.+{ 255,.+}" 1
FAIL: gcc.dg/bic-bitmask-6.c scan-tree-dump-times dce7 "<=\\\\s*.+{ 255,.+}" 1
FAIL: gcc.dg/bic-bitmask-8.c scan-tree-dump-times dce7 ">\\\\s*.+{ 1,.+}" 1
FAIL: gcc.dg/bic-bitmask-9.c scan-tree-dump dce7 "&\\\\s*.+{ 4294967290,.+}"
FAIL: gcc.dg/signbit-2.c scan-tree-dump optimized "\\\\s+>\\\\s+{ 0(, 0)+ }"
Those tests use the vect_int effective target, but AFAIK that can only be
used in *.dg/vect/ because it relies on vect.exp enabling the options needed
to support vectorization on the particular target (e.g. -msse2 for
i686-linux).  I don't think there is any other way to get the
DEFAULT_VECTCFLAGS into dg-options than having the test driven by vect.exp.
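
For reference, a hypothetical sketch of the directives such a test would use
if it lived under gcc.dg/vect/, where vect.exp adds DEFAULT_VECTCFLAGS
(e.g. -msse2 for i686-linux) on top of them, something plain dg.exp doesn't do:

/* { dg-do compile } */
/* { dg-require-effective-target vect_int } */
/* { dg-additional-options "-fdump-tree-dce" } */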

And, finally, I've noticed incorrect formatting in the new
bitmask_inv_cst_vector_p routine:
  do {
    if (idx > 0)
      cst = vector_cst_elt (t, idx);
...
    builder.quick_push (newcst);
  } while (++idx < nelts);
It should be
  do
    {
      if (idx > 0)
	cst = vector_cst_elt (t, idx);
...
      builder.quick_push (newcst);
    }
  while (++idx < nelts);

2021-11-25  Jakub Jelinek  <jakub@redhat.com>

	PR tree-optimization/103417
	* match.pd ((X & Y) CMP 0): Only handle eq and ne.  Commonalize
	common tests.

	* gcc.c-torture/execute/pr103417.c: New test.


	Jakub
  

Comments

Tamar Christina Nov. 25, 2021, 8:23 a.m. UTC | #1
Hi Jakub,

> -----Original Message-----
> From: Jakub Jelinek <jakub@redhat.com>
> Sent: Thursday, November 25, 2021 8:19 AM
> To: Richard Biener <rguenther@suse.de>
> Cc: Tamar Christina <Tamar.Christina@arm.com>; gcc-patches@gcc.gnu.org
> Subject: [PATCH] match.pd: Fix up the recent bitmask_inv_cst_vector_p
> simplification [PR103417]
> 
> Hi!
> 
> The following testcase is miscompiled since the r12-5489-g0888d6bbe97e10
> changes.
> The simplification triggers on
> (x & 4294967040U) >= 0U
> and turns it into:
> x <= 255U
> which is incorrect, it should fold to 1 because unsigned >= 0U is always true
> and normally the
> /* Non-equality compare simplifications from fold_binary  */
>      (if (wi::to_wide (cst) == min)
>        (if (cmp == GE_EXPR)
>         { constant_boolean_node (true, type); }) simplification folds that, but
> this simplification was done earlier.
> 
> The simplification correctly doesn't include lt which has the same reason why
> it shouldn't be handled, we'll fold it to 0 elsewhere.


Yes this was a bug, sorry I'm not sure why I didn't catch it...

> 
> But, IMNSHO while it isn't incorrect to handle le and gt there, it is
> unnecessary.  Because (x & cst) <= 0U and (x & cst) > 0U should never appear,
> again in
> /* Non-equality compare simplifications from fold_binary  */ we have a
> simplification for it:
>        (if (cmp == LE_EXPR)
>         (eq @2 @1))
>        (if (cmp == GT_EXPR)
>         (ne @2 @1))))
> This is done for
>   (cmp (convert?@2 @0) uniform_integer_cst_p@1) and so should be done
> for both integers and vectors.
> As the bitmask_inv_cst_vector_p simplification only handles eq and ne for
> signed types, I think it can be simplified to just following patch.

As I mentioned on the PR, I don't think LE and GT should be removed; the patch
is attempting to simplify the bitmask used because most vector ISAs can create
the simpler mask much more easily than the complex mask.

I.e. 0xFFFFFF00 is harder to create than 0xFF.  So while for scalar code it
doesn't matter as much, it does for vector code.
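
For example (a minimal sketch using GCC's generic vector extension; the
function name is made up), the rewritten form only needs the small constant
0xFF, which vector ISAs can typically materialize as an immediate, unlike
0xFFFFFF00:

typedef unsigned int v4ui __attribute__((vector_size (16)));
typedef int v4si __attribute__((vector_size (16)));

v4si
low_byte_lanes (v4ui x)
{
  v4ui mask = { 0xffffff00U, 0xffffff00U, 0xffffff00U, 0xffffff00U };
  v4ui zero = { 0, 0, 0, 0 };
  /* (x & mask) == zero is simplified elementwise to x <= { 0xff, ... },
     trading the "complex" mask for the simple one.  */
  return (x & mask) == zero;
}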

Regards,
Tamar

> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> I wonder if (for cst (VECTOR_CST INTEGER_CST) is good for the best size of *-
> match.c, wouldn't accepting just CONSTANT_CLASS_P@1 and then just say in
> bitmask_inv_cst_vector_p return NULL_TREE if it isn't INTEGER_CST or
> VECTOR_CST?
> 
> Also, without/with this patch I see on i686-linux (can be reproduced with
> RUNTESTFLAGS="--target_board=unix/-m32/-mno-sse dg.exp='bic-bitmask*
> signbit-2*'"
> too):
> FAIL: gcc.dg/bic-bitmask-10.c scan-tree-dump dce7 "<=\\\\s*.+{ 255,.+}"
> FAIL: gcc.dg/bic-bitmask-11.c scan-tree-dump dce7 ">\\\\s*.+{ 255,.+}"
> FAIL: gcc.dg/bic-bitmask-12.c scan-tree-dump dce7 "<=\\\\s*.+{ 255,.+}"
> FAIL: gcc.dg/bic-bitmask-2.c scan-tree-dump-times dce7
> "<=\\\\s*.+{ 255,.+}" 1
> FAIL: gcc.dg/bic-bitmask-23.c (test for excess errors)
> FAIL: gcc.dg/bic-bitmask-23.c scan-tree-dump dce7 "<=\\\\s*.+{ 255, 15, 1,
> 65535 }"
> FAIL: gcc.dg/bic-bitmask-3.c scan-tree-dump-times dce7
> "<=\\\\s*.+{ 255,.+}" 1
> FAIL: gcc.dg/bic-bitmask-4.c scan-tree-dump-times dce7 "=\\\\s*.+{ 1,.+}" 1
> FAIL: gcc.dg/bic-bitmask-5.c scan-tree-dump-times dce7 ">\\\\s*.+{ 255,.+}"
> 1
> FAIL: gcc.dg/bic-bitmask-6.c scan-tree-dump-times dce7
> "<=\\\\s*.+{ 255,.+}" 1
> FAIL: gcc.dg/bic-bitmask-8.c scan-tree-dump-times dce7 ">\\\\s*.+{ 1,.+}" 1
> FAIL: gcc.dg/bic-bitmask-9.c scan-tree-dump dce7
> "&\\\\s*.+{ 4294967290,.+}"
> FAIL: gcc.dg/signbit-2.c scan-tree-dump optimized "\\\\s+>\\\\s+{ 0(, 0)+ }"
> Those tests use vect_int effective target, but AFAIK that can be used only in
> *.dg/vect/ because it relies on vect.exp enabling options to support
> vectorization on the particular target (e.g. for i686-linux that -msse2).
> I think there isn't other way to get the DEFAULT_VECTCFLAGS into dg-
> options other than having the test driven by vect.exp.
> 
> And, finally, I've noticed incorrect formatting in the new
> bitmask_inv_cst_vector_p routine:
>   do {
>     if (idx > 0)
>       cst = vector_cst_elt (t, idx);
> ...
>     builder.quick_push (newcst);
>   } while (++idx < nelts);
> It should be
>   do
>     {
>       if (idx > 0)
> 	cst = vector_cst_elt (t, idx);
> ...
>       builder.quick_push (newcst);
>     }
>   while (++idx < nelts);
> 
> 2021-11-25  Jakub Jelinek  <jakub@redhat.com>
> 
> 	PR tree-optimization/103417
> 	* match.pd ((X & Y) CMP 0): Only handle eq and ne.  Commonalize
> 	common tests.
> 
> 	* gcc.c-torture/execute/pr103417.c: New test.
> 
> --- gcc/match.pd.jj	2021-11-24 11:46:03.191918052 +0100
> +++ gcc/match.pd	2021-11-24 22:33:43.852575772 +0100
> @@ -5214,20 +5214,16 @@ (define_operator_list SYNC_FETCH_AND_AND
>  /* Transform comparisons of the form (X & Y) CMP 0 to X CMP2 Z
>     where ~Y + 1 == pow2 and Z = ~Y.  */  (for cst (VECTOR_CST INTEGER_CST)
> - (for cmp (le eq ne ge gt)
> -      icmp (le le gt le gt)
> - (simplify
> -  (cmp (bit_and:c@2 @0 cst@1) integer_zerop)
> -   (with { tree csts = bitmask_inv_cst_vector_p (@1); }
> -     (switch
> -      (if (csts && TYPE_UNSIGNED (TREE_TYPE (@1))
> -	   && (VECTOR_TYPE_P (TREE_TYPE (@1)) || single_use (@2)))
> -       (icmp @0 { csts; }))
> -      (if (csts && !TYPE_UNSIGNED (TREE_TYPE (@1))
> -	   && (cmp == EQ_EXPR || cmp == NE_EXPR)
> -	   && (VECTOR_TYPE_P (TREE_TYPE (@1)) || single_use (@2)))
> + (for cmp (eq ne)
> +      icmp (le gt)
> +  (simplify
> +   (cmp (bit_and:c@2 @0 cst@1) integer_zerop)
> +    (with { tree csts = bitmask_inv_cst_vector_p (@1); }
> +     (if (csts && (VECTOR_TYPE_P (TREE_TYPE (@1)) || single_use (@2)))
> +      (if (TYPE_UNSIGNED (TREE_TYPE (@1)))
> +       (icmp @0 { csts; })
>         (with { tree utype = unsigned_type_for (TREE_TYPE (@1)); }
> -	(icmp (convert:utype @0) { csts; }))))))))
> +	 (icmp (convert:utype @0) { csts; }))))))))
> 
>  /* -A CMP -B -> B CMP A.  */
>  (for cmp (tcc_comparison)
> --- gcc/testsuite/gcc.c-torture/execute/pr103417.c.jj	2021-11-24
> 22:36:00.732626424 +0100
> +++ gcc/testsuite/gcc.c-torture/execute/pr103417.c	2021-11-24
> 22:35:43.964865218 +0100
> @@ -0,0 +1,11 @@
> +/* PR tree-optimization/103417 */
> +
> +struct { int a : 8; int b : 24; } c = { 0, 1 };
> +
> +int
> +main ()
> +{
> +  if (c.b && !c.b)
> +    __builtin_abort ();
> +  return 0;
> +}
> 
> 	Jakub
  
Richard Biener Nov. 25, 2021, 8:31 a.m. UTC | #2
On Thu, 25 Nov 2021, Tamar Christina wrote:

> Hi Jakub,
> 
> > -----Original Message-----
> > From: Jakub Jelinek <jakub@redhat.com>
> > Sent: Thursday, November 25, 2021 8:19 AM
> > To: Richard Biener <rguenther@suse.de>
> > Cc: Tamar Christina <Tamar.Christina@arm.com>; gcc-patches@gcc.gnu.org
> > Subject: [PATCH] match.pd: Fix up the recent bitmask_inv_cst_vector_p
> > simplification [PR103417]
> > 
> > Hi!
> > 
> > The following testcase is miscompiled since the r12-5489-g0888d6bbe97e10
> > changes.
> > The simplification triggers on
> > (x & 4294967040U) >= 0U
> > and turns it into:
> > x <= 255U
> > which is incorrect, it should fold to 1 because unsigned >= 0U is always true
> > and normally the
> > /* Non-equality compare simplifications from fold_binary  */
> >      (if (wi::to_wide (cst) == min)
> >        (if (cmp == GE_EXPR)
> >         { constant_boolean_node (true, type); }) simplification folds that, but
> > this simplification was done earlier.
> > 
> > The simplification correctly doesn't include lt which has the same reason why
> > it shouldn't be handled, we'll fold it to 0 elsewhere.
> 
> 
> Yes this was a bug, sorry I'm not sure why I didn't catch it...
> 
> > 
> > But, IMNSHO while it isn't incorrect to handle le and gt there, it is
> > unnecessary.  Because (x & cst) <= 0U and (x & cst) > 0U should never appear,
> > again in
> > /* Non-equality compare simplifications from fold_binary  */ we have a
> > simplification for it:
> >        (if (cmp == LE_EXPR)
> >         (eq @2 @1))
> >        (if (cmp == GT_EXPR)
> >         (ne @2 @1))))
> > This is done for
> >   (cmp (convert?@2 @0) uniform_integer_cst_p@1) and so should be done
> > for both integers and vectors.
> > As the bitmask_inv_cst_vector_p simplification only handles eq and ne for
> > signed types, I think it can be simplified to just following patch.

Note that would mean the transform should be ordered _after_ the above,
even if we retain it for vector le/gt.

> As I mentioned on the PR I don't think LE and GT should be removed, the patch
> Is attempting to simplify the bitmask used because most vector ISAs can create
> the simpler mask much easier than the complex mask.
> 
> It. 0xFFFFFF00 is harder to create than 0xFF.   So while for scalar it doesn't matter
> as much, it does for vector code.
>
> Regards,
> Tamar
> 
> > 
> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> > 
> > I wonder if (for cst (VECTOR_CST INTEGER_CST) is good for the best size of *-
> > match.c, wouldn't accepting just CONSTANT_CLASS_P@1 and then just say in
> > bitmask_inv_cst_vector_p return NULL_TREE if it isn't INTEGER_CST or
> > VECTOR_CST?

In the end that should be recoverable by genmatch.  I do have some ideas
to improve it for size in this area, maybe during stage4.  Originally
genmatch was trying to optimize for matching speed, but now that the
ordering of patterns is honored that has very much become secondary (note
that re-ordering patterns in match.pd can also improve *-match.c size
greatly!  Maybe some script could try to brute-force the "optimal" order -
but note that some pattern order matters ;))

> > Also, without/with this patch I see on i686-linux (can be reproduced with
> > RUNTESTFLAGS="--target_board=unix/-m32/-mno-sse dg.exp='bic-bitmask*
> > signbit-2*'"
> > too):
> > FAIL: gcc.dg/bic-bitmask-10.c scan-tree-dump dce7 "<=\\\\s*.+{ 255,.+}"
> > FAIL: gcc.dg/bic-bitmask-11.c scan-tree-dump dce7 ">\\\\s*.+{ 255,.+}"
> > FAIL: gcc.dg/bic-bitmask-12.c scan-tree-dump dce7 "<=\\\\s*.+{ 255,.+}"
> > FAIL: gcc.dg/bic-bitmask-2.c scan-tree-dump-times dce7
> > "<=\\\\s*.+{ 255,.+}" 1
> > FAIL: gcc.dg/bic-bitmask-23.c (test for excess errors)
> > FAIL: gcc.dg/bic-bitmask-23.c scan-tree-dump dce7 "<=\\\\s*.+{ 255, 15, 1,
> > 65535 }"
> > FAIL: gcc.dg/bic-bitmask-3.c scan-tree-dump-times dce7
> > "<=\\\\s*.+{ 255,.+}" 1
> > FAIL: gcc.dg/bic-bitmask-4.c scan-tree-dump-times dce7 "=\\\\s*.+{ 1,.+}" 1
> > FAIL: gcc.dg/bic-bitmask-5.c scan-tree-dump-times dce7 ">\\\\s*.+{ 255,.+}"
> > 1
> > FAIL: gcc.dg/bic-bitmask-6.c scan-tree-dump-times dce7
> > "<=\\\\s*.+{ 255,.+}" 1
> > FAIL: gcc.dg/bic-bitmask-8.c scan-tree-dump-times dce7 ">\\\\s*.+{ 1,.+}" 1
> > FAIL: gcc.dg/bic-bitmask-9.c scan-tree-dump dce7
> > "&\\\\s*.+{ 4294967290,.+}"
> > FAIL: gcc.dg/signbit-2.c scan-tree-dump optimized "\\\\s+>\\\\s+{ 0(, 0)+ }"
> > Those tests use vect_int effective target, but AFAIK that can be used only in
> > *.dg/vect/ because it relies on vect.exp enabling options to support
> > vectorization on the particular target (e.g. for i686-linux that -msse2).
> > I think there isn't other way to get the DEFAULT_VECTCFLAGS into dg-
> > options other than having the test driven by vect.exp.
> > 
> > And, finally, I've noticed incorrect formatting in the new
> > bitmask_inv_cst_vector_p routine:
> >   do {
> >     if (idx > 0)
> >       cst = vector_cst_elt (t, idx);
> > ...
> >     builder.quick_push (newcst);
> >   } while (++idx < nelts);
> > It should be
> >   do
> >     {
> >       if (idx > 0)
> > 	cst = vector_cst_elt (t, idx);
> > ...
> >       builder.quick_push (newcst);
> >     }
> >   while (++idx < nelts);
> > 
> > 2021-11-25  Jakub Jelinek  <jakub@redhat.com>
> > 
> > 	PR tree-optimization/103417
> > 	* match.pd ((X & Y) CMP 0): Only handle eq and ne.  Commonalize
> > 	common tests.
> > 
> > 	* gcc.c-torture/execute/pr103417.c: New test.
> > 
> > --- gcc/match.pd.jj	2021-11-24 11:46:03.191918052 +0100
> > +++ gcc/match.pd	2021-11-24 22:33:43.852575772 +0100
> > @@ -5214,20 +5214,16 @@ (define_operator_list SYNC_FETCH_AND_AND
> >  /* Transform comparisons of the form (X & Y) CMP 0 to X CMP2 Z
> >     where ~Y + 1 == pow2 and Z = ~Y.  */  (for cst (VECTOR_CST INTEGER_CST)
> > - (for cmp (le eq ne ge gt)
> > -      icmp (le le gt le gt)
> > - (simplify
> > -  (cmp (bit_and:c@2 @0 cst@1) integer_zerop)
> > -   (with { tree csts = bitmask_inv_cst_vector_p (@1); }
> > -     (switch
> > -      (if (csts && TYPE_UNSIGNED (TREE_TYPE (@1))
> > -	   && (VECTOR_TYPE_P (TREE_TYPE (@1)) || single_use (@2)))
> > -       (icmp @0 { csts; }))
> > -      (if (csts && !TYPE_UNSIGNED (TREE_TYPE (@1))
> > -	   && (cmp == EQ_EXPR || cmp == NE_EXPR)
> > -	   && (VECTOR_TYPE_P (TREE_TYPE (@1)) || single_use (@2)))
> > + (for cmp (eq ne)
> > +      icmp (le gt)
> > +  (simplify
> > +   (cmp (bit_and:c@2 @0 cst@1) integer_zerop)
> > +    (with { tree csts = bitmask_inv_cst_vector_p (@1); }
> > +     (if (csts && (VECTOR_TYPE_P (TREE_TYPE (@1)) || single_use (@2)))
> > +      (if (TYPE_UNSIGNED (TREE_TYPE (@1)))
> > +       (icmp @0 { csts; })
> >         (with { tree utype = unsigned_type_for (TREE_TYPE (@1)); }
> > -	(icmp (convert:utype @0) { csts; }))))))))
> > +	 (icmp (convert:utype @0) { csts; }))))))))
> > 
> >  /* -A CMP -B -> B CMP A.  */
> >  (for cmp (tcc_comparison)
> > --- gcc/testsuite/gcc.c-torture/execute/pr103417.c.jj	2021-11-24
> > 22:36:00.732626424 +0100
> > +++ gcc/testsuite/gcc.c-torture/execute/pr103417.c	2021-11-24
> > 22:35:43.964865218 +0100
> > @@ -0,0 +1,11 @@
> > +/* PR tree-optimization/103417 */
> > +
> > +struct { int a : 8; int b : 24; } c = { 0, 1 };
> > +
> > +int
> > +main ()
> > +{
> > +  if (c.b && !c.b)
> > +    __builtin_abort ();
> > +  return 0;
> > +}
> > 
> > 	Jakub
> 
>
  
Jakub Jelinek Nov. 25, 2021, 8:39 a.m. UTC | #3
On Thu, Nov 25, 2021 at 08:23:50AM +0000, Tamar Christina wrote:
> > But, IMNSHO while it isn't incorrect to handle le and gt there, it is
> > unnecessary.  Because (x & cst) <= 0U and (x & cst) > 0U should never appear,
> > again in
> > /* Non-equality compare simplifications from fold_binary  */ we have a
> > simplification for it:
> >        (if (cmp == LE_EXPR)
> >         (eq @2 @1))
> >        (if (cmp == GT_EXPR)
> >         (ne @2 @1))))
> > This is done for
> >   (cmp (convert?@2 @0) uniform_integer_cst_p@1) and so should be done
> > for both integers and vectors.
> > As the bitmask_inv_cst_vector_p simplification only handles eq and ne for
> > signed types, I think it can be simplified to just following patch.
> 
> As I mentioned on the PR I don't think LE and GT should be removed, the patch
> Is attempting to simplify the bitmask used because most vector ISAs can create
> the simpler mask much easier than the complex mask.
> 
> It. 0xFFFFFF00 is harder to create than 0xFF.   So while for scalar it doesn't matter
> as much, it does for vector code.

What I'm trying to explain is that you should never see those le or gt cases
with TYPE_UNSIGNED (especially when the simplification is moved after those
/* Non-equality compare simplifications from fold_binary  */
I've mentioned), because if you try:
typedef unsigned V __attribute__((vector_size (4)));

unsigned f1 (unsigned x) { unsigned z = 0; return x > z; }
unsigned f2 (unsigned x) { unsigned z = 0; return x <= z; }
V f3 (V x) { V z = (V) {}; return x > z; }
V f4 (V x) { V z = (V) {}; return x <= z; }
you'll see that already at ccp1, when the constants propagate, this is
simplified using the rules I mentioned into x != 0U, x == 0U, x != (V) {}
and x == (V) {}.

The important rule of match.pd is composability: the simplifications should
rely on other simplifications rather than repeating all their decisions,
because that makes *-match.c larger and more expensive (and a source of
extra possible bugs).

	Jakub
  
Tamar Christina Nov. 25, 2021, 8:54 a.m. UTC | #4
> -----Original Message-----
> From: Jakub Jelinek <jakub@redhat.com>
> Sent: Thursday, November 25, 2021 8:39 AM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: Richard Biener <rguenther@suse.de>; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] match.pd: Fix up the recent bitmask_inv_cst_vector_p
> simplification [PR103417]
> 
> On Thu, Nov 25, 2021 at 08:23:50AM +0000, Tamar Christina wrote:
> > > But, IMNSHO while it isn't incorrect to handle le and gt there, it
> > > is unnecessary.  Because (x & cst) <= 0U and (x & cst) > 0U should
> > > never appear, again in
> > > /* Non-equality compare simplifications from fold_binary  */ we have
> > > a simplification for it:
> > >        (if (cmp == LE_EXPR)
> > >         (eq @2 @1))
> > >        (if (cmp == GT_EXPR)
> > >         (ne @2 @1))))
> > > This is done for
> > >   (cmp (convert?@2 @0) uniform_integer_cst_p@1) and so should be
> > > done for both integers and vectors.
> > > As the bitmask_inv_cst_vector_p simplification only handles eq and
> > > ne for signed types, I think it can be simplified to just following patch.
> >
> > As I mentioned on the PR I don't think LE and GT should be removed,
> > the patch Is attempting to simplify the bitmask used because most
> > vector ISAs can create the simpler mask much easier than the complex
> > mask.
> >
> > It. 0xFFFFFF00 is harder to create than 0xFF.   So while for scalar it doesn't
> > matter
> > as much, it does for vector code.
> 
> What I'm trying to explain is that you should never see those le or gt cases
> with TYPE_UNSIGNED (especially when the simplification is moved after
> those
> /* Non-equality compare simplifications from fold_binary  */ I've mentioned),
> because if you try:
> typedef unsigned V __attribute__((vector_size (4)));
> 
> unsigned f1 (unsigned x) { unsigned z = 0; return x > z; }
> unsigned f2 (unsigned x) { unsigned z = 0; return x <= z; }
> V f3 (V x) { V z = (V) {}; return x > z; }
> V f4 (V x) { V z = (V) {}; return x <= z; }
> you'll see that this is at ccp1 when the constants propagate simplified
> using the rules I mentioned into x != 0U, x == 0U, x != (V) {} and x == (V) {}.

Ah, I see; sorry, I didn't see that rule before.  You're right that if this is
ordered after it then they can be dropped.

Thanks,
Tamar

> 
> The important rule of match.pd is composability, the simplifications should
> rely on other simplifications and not repeating all their decisions because that
> makes the *match.c larger and more expensive (and a source of extra
> possible bugs).
> 
> 	Jakub
  
Tamar Christina Nov. 25, 2021, 9:07 a.m. UTC | #5
> -----Original Message-----
> From: Jakub Jelinek <jakub@redhat.com>
> Sent: Thursday, November 25, 2021 8:19 AM
> To: Richard Biener <rguenther@suse.de>
> Cc: Tamar Christina <Tamar.Christina@arm.com>; gcc-patches@gcc.gnu.org
> Subject: [PATCH] match.pd: Fix up the recent bitmask_inv_cst_vector_p
> simplification [PR103417]
> 
> Hi!
> 
> The following testcase is miscompiled since the r12-5489-g0888d6bbe97e10
> changes.
> The simplification triggers on
> (x & 4294967040U) >= 0U
> and turns it into:
> x <= 255U
> which is incorrect, it should fold to 1 because unsigned >= 0U is always true
> and normally the
> /* Non-equality compare simplifications from fold_binary  */
>      (if (wi::to_wide (cst) == min)
>        (if (cmp == GE_EXPR)
>         { constant_boolean_node (true, type); }) simplification folds that, but
> this simplification was done earlier.
> 
> The simplification correctly doesn't include lt which has the same reason why
> it shouldn't be handled, we'll fold it to 0 elsewhere.
> 
> But, IMNSHO while it isn't incorrect to handle le and gt there, it is
> unnecessary.  Because (x & cst) <= 0U and (x & cst) > 0U should never appear,
> again in
> /* Non-equality compare simplifications from fold_binary  */ we have a
> simplification for it:
>        (if (cmp == LE_EXPR)
>         (eq @2 @1))
>        (if (cmp == GT_EXPR)
>         (ne @2 @1))))
> This is done for
>   (cmp (convert?@2 @0) uniform_integer_cst_p@1) and so should be done
> for both integers and vectors.
> As the bitmask_inv_cst_vector_p simplification only handles eq and ne for
> signed types, I think it can be simplified to just following patch.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> I wonder if (for cst (VECTOR_CST INTEGER_CST) is good for the best size of *-
> match.c, wouldn't accepting just CONSTANT_CLASS_P@1 and then just say in
> bitmask_inv_cst_vector_p return NULL_TREE if it isn't INTEGER_CST or
> VECTOR_CST?
> 
> Also, without/with this patch I see on i686-linux (can be reproduced with
> RUNTESTFLAGS="--target_board=unix/-m32/-mno-sse dg.exp='bic-bitmask*
> signbit-2*'"
> too):
> FAIL: gcc.dg/bic-bitmask-10.c scan-tree-dump dce7 "<=\\\\s*.+{ 255,.+}"
> FAIL: gcc.dg/bic-bitmask-11.c scan-tree-dump dce7 ">\\\\s*.+{ 255,.+}"
> FAIL: gcc.dg/bic-bitmask-12.c scan-tree-dump dce7 "<=\\\\s*.+{ 255,.+}"
> FAIL: gcc.dg/bic-bitmask-2.c scan-tree-dump-times dce7
> "<=\\\\s*.+{ 255,.+}" 1
> FAIL: gcc.dg/bic-bitmask-23.c (test for excess errors)
> FAIL: gcc.dg/bic-bitmask-23.c scan-tree-dump dce7 "<=\\\\s*.+{ 255, 15, 1,
> 65535 }"
> FAIL: gcc.dg/bic-bitmask-3.c scan-tree-dump-times dce7
> "<=\\\\s*.+{ 255,.+}" 1
> FAIL: gcc.dg/bic-bitmask-4.c scan-tree-dump-times dce7 "=\\\\s*.+{ 1,.+}" 1
> FAIL: gcc.dg/bic-bitmask-5.c scan-tree-dump-times dce7 ">\\\\s*.+{ 255,.+}"
> 1
> FAIL: gcc.dg/bic-bitmask-6.c scan-tree-dump-times dce7
> "<=\\\\s*.+{ 255,.+}" 1
> FAIL: gcc.dg/bic-bitmask-8.c scan-tree-dump-times dce7 ">\\\\s*.+{ 1,.+}" 1
> FAIL: gcc.dg/bic-bitmask-9.c scan-tree-dump dce7
> "&\\\\s*.+{ 4294967290,.+}"
> FAIL: gcc.dg/signbit-2.c scan-tree-dump optimized "\\\\s+>\\\\s+{ 0(, 0)+ }"
> Those tests use vect_int effective target, but AFAIK that can be used only in
> *.dg/vect/ because it relies on vect.exp enabling options to support
> vectorization on the particular target (e.g. for i686-linux that -msse2).
> I think there isn't other way to get the DEFAULT_VECTCFLAGS into dg-
> options other than having the test driven by vect.exp.

Yeah, I now see that vect_int is different from some of the other effective target checks like the SVE one.
I'll move the ones testing the vector code to vect and leave the scalars where they are.

Thanks,
Tamar

> 
> And, finally, I've noticed incorrect formatting in the new
> bitmask_inv_cst_vector_p routine:
>   do {
>     if (idx > 0)
>       cst = vector_cst_elt (t, idx);
> ...
>     builder.quick_push (newcst);
>   } while (++idx < nelts);
> It should be
>   do
>     {
>       if (idx > 0)
> 	cst = vector_cst_elt (t, idx);
> ...
>       builder.quick_push (newcst);
>     }
>   while (++idx < nelts);
> 
> 2021-11-25  Jakub Jelinek  <jakub@redhat.com>
> 
> 	PR tree-optimization/103417
> 	* match.pd ((X & Y) CMP 0): Only handle eq and ne.  Commonalize
> 	common tests.
> 
> 	* gcc.c-torture/execute/pr103417.c: New test.
> 
> --- gcc/match.pd.jj	2021-11-24 11:46:03.191918052 +0100
> +++ gcc/match.pd	2021-11-24 22:33:43.852575772 +0100
> @@ -5214,20 +5214,16 @@ (define_operator_list SYNC_FETCH_AND_AND
>  /* Transform comparisons of the form (X & Y) CMP 0 to X CMP2 Z
>     where ~Y + 1 == pow2 and Z = ~Y.  */  (for cst (VECTOR_CST INTEGER_CST)
> - (for cmp (le eq ne ge gt)
> -      icmp (le le gt le gt)
> - (simplify
> -  (cmp (bit_and:c@2 @0 cst@1) integer_zerop)
> -   (with { tree csts = bitmask_inv_cst_vector_p (@1); }
> -     (switch
> -      (if (csts && TYPE_UNSIGNED (TREE_TYPE (@1))
> -	   && (VECTOR_TYPE_P (TREE_TYPE (@1)) || single_use (@2)))
> -       (icmp @0 { csts; }))
> -      (if (csts && !TYPE_UNSIGNED (TREE_TYPE (@1))
> -	   && (cmp == EQ_EXPR || cmp == NE_EXPR)
> -	   && (VECTOR_TYPE_P (TREE_TYPE (@1)) || single_use (@2)))
> + (for cmp (eq ne)
> +      icmp (le gt)
> +  (simplify
> +   (cmp (bit_and:c@2 @0 cst@1) integer_zerop)
> +    (with { tree csts = bitmask_inv_cst_vector_p (@1); }
> +     (if (csts && (VECTOR_TYPE_P (TREE_TYPE (@1)) || single_use (@2)))
> +      (if (TYPE_UNSIGNED (TREE_TYPE (@1)))
> +       (icmp @0 { csts; })
>         (with { tree utype = unsigned_type_for (TREE_TYPE (@1)); }
> -	(icmp (convert:utype @0) { csts; }))))))))
> +	 (icmp (convert:utype @0) { csts; }))))))))
> 
>  /* -A CMP -B -> B CMP A.  */
>  (for cmp (tcc_comparison)
> --- gcc/testsuite/gcc.c-torture/execute/pr103417.c.jj	2021-11-24
> 22:36:00.732626424 +0100
> +++ gcc/testsuite/gcc.c-torture/execute/pr103417.c	2021-11-24
> 22:35:43.964865218 +0100
> @@ -0,0 +1,11 @@
> +/* PR tree-optimization/103417 */
> +
> +struct { int a : 8; int b : 24; } c = { 0, 1 };
> +
> +int
> +main ()
> +{
> +  if (c.b && !c.b)
> +    __builtin_abort ();
> +  return 0;
> +}
> 
> 	Jakub
  
Richard Biener Nov. 25, 2021, 9:17 a.m. UTC | #6
On Thu, 25 Nov 2021, Tamar Christina wrote:

> 
> 
> > -----Original Message-----
> > From: Jakub Jelinek <jakub@redhat.com>
> > Sent: Thursday, November 25, 2021 8:39 AM
> > To: Tamar Christina <Tamar.Christina@arm.com>
> > Cc: Richard Biener <rguenther@suse.de>; gcc-patches@gcc.gnu.org
> > Subject: Re: [PATCH] match.pd: Fix up the recent bitmask_inv_cst_vector_p
> > simplification [PR103417]
> > 
> > On Thu, Nov 25, 2021 at 08:23:50AM +0000, Tamar Christina wrote:
> > > > But, IMNSHO while it isn't incorrect to handle le and gt there, it
> > > > is unnecessary.  Because (x & cst) <= 0U and (x & cst) > 0U should
> > > > never appear, again in
> > > > /* Non-equality compare simplifications from fold_binary  */ we have
> > > > a simplification for it:
> > > >        (if (cmp == LE_EXPR)
> > > >         (eq @2 @1))
> > > >        (if (cmp == GT_EXPR)
> > > >         (ne @2 @1))))
> > > > This is done for
> > > >   (cmp (convert?@2 @0) uniform_integer_cst_p@1) and so should be
> > > > done for both integers and vectors.
> > > > As the bitmask_inv_cst_vector_p simplification only handles eq and
> > > > ne for signed types, I think it can be simplified to just following patch.
> > >
> > > As I mentioned on the PR I don't think LE and GT should be removed,
> > > the patch Is attempting to simplify the bitmask used because most
> > > vector ISAs can create the simpler mask much easier than the complex
> > > mask.
> > >
> > > It. 0xFFFFFF00 is harder to create than 0xFF.   So while for scalar it doesn't
> > > matter
> > > as much, it does for vector code.
> > 
> > What I'm trying to explain is that you should never see those le or gt cases
> > with TYPE_UNSIGNED (especially when the simplification is moved after
> > those
> > /* Non-equality compare simplifications from fold_binary  */ I've mentioned),
> > because if you try:
> > typedef unsigned V __attribute__((vector_size (4)));
> > 
> > unsigned f1 (unsigned x) { unsigned z = 0; return x > z; }
> > unsigned f2 (unsigned x) { unsigned z = 0; return x <= z; }
> > V f3 (V x) { V z = (V) {}; return x > z; }
> > V f4 (V x) { V z = (V) {}; return x <= z; }
> > you'll see that this is at ccp1 when the constants propagate simplified
> > using the rules I mentioned into x != 0U, x == 0U, x != (V) {} and x == (V) {}.
> 
> Ah I see, sorry I didn't see that rule before, you're right that if this is ordered
> after it then they can be dropped.

So the patch is OK, possibly with re-ordering the matches.

Thanks,
Richard.
  
Jakub Jelinek Nov. 25, 2021, 9:52 a.m. UTC | #7
On Thu, Nov 25, 2021 at 10:17:52AM +0100, Richard Biener wrote:
> > Ah I see, sorry I didn't see that rule before, you're right that if this is ordered
> > after it then they can be dropped.
> 
> So the patch is OK, possibly with re-ordering the matches.

I've committed the patch as is because it has been tested that way and I'd
like to avoid dups of that PR flowing in.  Even when not reordered, the new,
earlier match.pd simplification will not trigger for the lt le gt ge cases
anymore, the later old simplifications will trigger instead, and I'd expect
that after that latter simplification the earlier one should trigger again
because the IL changed, no?
Tamar, can you handle the reordering together with the testsuite changes
(and perhaps formatting fixes in the tree.c routine)?

	Jakub
  
Richard Biener Nov. 25, 2021, 10:58 a.m. UTC | #8
On Thu, 25 Nov 2021, Jakub Jelinek wrote:

> On Thu, Nov 25, 2021 at 10:17:52AM +0100, Richard Biener wrote:
> > > Ah I see, sorry I didn't see that rule before, you're right that if this is ordered
> > > after it then they can be dropped.
> > 
> > So the patch is OK, possibly with re-ordering the matches.
> 
> I've committed the patch as is because it has been tested that way and I'd
> like to avoid dups of that PR flowing in.  Even when not reordered, the new
> earlier match.pd simplification will not trigger for the lt le gt ge cases
> anymore and the later old simplifications will trigger and I'd expect after
> that latter simplification the earlier should trigger again because the IL
> changed, no?

Yes, the result always is re-folded.

> Tamar, can you handle the reordering together with the testsuite changes
> (and perhaps formatting fixes in the tree.c routine)?
  
Tamar Christina Nov. 25, 2021, 3:28 p.m. UTC | #9
> -----Original Message-----
> From: Jakub Jelinek <jakub@redhat.com>
> Sent: Thursday, November 25, 2021 9:53 AM
> To: Richard Biener <rguenther@suse.de>
> Cc: Tamar Christina <Tamar.Christina@arm.com>; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] match.pd: Fix up the recent bitmask_inv_cst_vector_p
> simplification [PR103417]
> 
> On Thu, Nov 25, 2021 at 10:17:52AM +0100, Richard Biener wrote:
> > > Ah I see, sorry I didn't see that rule before, you're right that if
> > > this is ordered after it then they can be dropped.
> >
> > So the patch is OK, possibly with re-ordering the matches.
> 
> I've committed the patch as is because it has been tested that way and I'd
> like to avoid dups of that PR flowing in.  Even when not reordered, the new
> earlier match.pd simplification will not trigger for the lt le gt ge cases anymore
> and the later old simplifications will trigger and I'd expect after that latter
> simplification the earlier should trigger again because the IL changed, no?
> Tamar, can you handle the reordering together with the testsuite changes
> (and perhaps formatting fixes in the tree.c routine)?

Yes I will, I'll send a patch tomorrow morning. Thanks!

Regards,
Tamar
> 
> 	Jakub
  

Patch

--- gcc/match.pd.jj	2021-11-24 11:46:03.191918052 +0100
+++ gcc/match.pd	2021-11-24 22:33:43.852575772 +0100
@@ -5214,20 +5214,16 @@  (define_operator_list SYNC_FETCH_AND_AND
 /* Transform comparisons of the form (X & Y) CMP 0 to X CMP2 Z
    where ~Y + 1 == pow2 and Z = ~Y.  */
 (for cst (VECTOR_CST INTEGER_CST)
- (for cmp (le eq ne ge gt)
-      icmp (le le gt le gt)
- (simplify
-  (cmp (bit_and:c@2 @0 cst@1) integer_zerop)
-   (with { tree csts = bitmask_inv_cst_vector_p (@1); }
-     (switch
-      (if (csts && TYPE_UNSIGNED (TREE_TYPE (@1))
-	   && (VECTOR_TYPE_P (TREE_TYPE (@1)) || single_use (@2)))
-       (icmp @0 { csts; }))
-      (if (csts && !TYPE_UNSIGNED (TREE_TYPE (@1))
-	   && (cmp == EQ_EXPR || cmp == NE_EXPR)
-	   && (VECTOR_TYPE_P (TREE_TYPE (@1)) || single_use (@2)))
+ (for cmp (eq ne)
+      icmp (le gt)
+  (simplify
+   (cmp (bit_and:c@2 @0 cst@1) integer_zerop)
+    (with { tree csts = bitmask_inv_cst_vector_p (@1); }
+     (if (csts && (VECTOR_TYPE_P (TREE_TYPE (@1)) || single_use (@2)))
+      (if (TYPE_UNSIGNED (TREE_TYPE (@1)))
+       (icmp @0 { csts; })
        (with { tree utype = unsigned_type_for (TREE_TYPE (@1)); }
-	(icmp (convert:utype @0) { csts; }))))))))
+	 (icmp (convert:utype @0) { csts; }))))))))
 
 /* -A CMP -B -> B CMP A.  */
 (for cmp (tcc_comparison)
--- gcc/testsuite/gcc.c-torture/execute/pr103417.c.jj	2021-11-24 22:36:00.732626424 +0100
+++ gcc/testsuite/gcc.c-torture/execute/pr103417.c	2021-11-24 22:35:43.964865218 +0100
@@ -0,0 +1,11 @@ 
+/* PR tree-optimization/103417 */
+
+struct { int a : 8; int b : 24; } c = { 0, 1 };
+
+int
+main ()
+{
+  if (c.b && !c.b)
+    __builtin_abort ();
+  return 0;
+}