fold-const: Handle AND, IOR, XOR with stepped vectors [PR112971].

Message ID a45c5452-1b17-43fe-a858-7bce23ee88f1@gmail.com
State New
Series fold-const: Handle AND, IOR, XOR with stepped vectors [PR112971]

Checks

Context Check Description
linaro-tcwg-bot/tcwg_gcc_build--master-arm success Testing passed
linaro-tcwg-bot/tcwg_gcc_build--master-aarch64 success Testing passed
linaro-tcwg-bot/tcwg_gcc_check--master-aarch64 success Testing passed

Commit Message

Robin Dapp Dec. 18, 2023, 7:50 p.m. UTC
  Hi,

Found in PR112971, this patch adds folding support for bitwise operations
on a constant all-zero duplicate vector and a stepped vector.
On riscv we had the situation that folding would continue indefinitely
without simplifying because e.g. {0, 0, 0, ...} & {7, 6, 5, ...} would
not fold to {0, 0, 0, ...}.

Bootstrapped and regtested on x86 and aarch64, regtested on riscv.

I won't be available to respond quickly until next year.  Pan or Juzhe,
as discussed, feel free to continue with possible revisions.

Regards
 Robin


gcc/ChangeLog:

	PR middle-end/112971

	* fold-const.cc (const_binop): Handle
	zerop@1 AND/IOR/XOR VECTOR_CST_STEPPED_P@2.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/pr112971.c: New test.
---
 gcc/fold-const.cc                              | 14 +++++++++++++-
 .../gcc.target/riscv/rvv/autovec/pr112971.c    | 18 ++++++++++++++++++
 2 files changed, 31 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112971.c
  

Comments

juzhe.zhong@rivai.ai Dec. 18, 2023, 10:49 p.m. UTC | #1
Thanks, Robin, for sending the initial patch to fix this ICE bug.

CC to Richard S, Richard B, and Andrew.

Thanks.



juzhe.zhong@rivai.ai
 
From: Robin Dapp
Date: 2023-12-19 03:50
To: gcc-patches
CC: rdapp.gcc; Li, Pan2; juzhe.zhong@rivai.ai
Subject: [PATCH] fold-const: Handle AND, IOR, XOR with stepped vectors [PR112971].
diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index f5d68ac323a..43ed097bf5c 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -1653,8 +1653,20 @@ const_binop (enum tree_code code, tree arg1, tree arg2)
     {
       tree type = TREE_TYPE (arg1);
       bool step_ok_p;
+
+      /* AND, IOR as well as XOR with a zerop can be handled directly.  */
       if (VECTOR_CST_STEPPED_P (arg1)
-	  && VECTOR_CST_STEPPED_P (arg2))
+	  && VECTOR_CST_DUPLICATE_P (arg2)
+	  && integer_zerop (VECTOR_CST_ELT (arg2, 0)))
+	step_ok_p = code == BIT_AND_EXPR || code == BIT_IOR_EXPR
+	  || code == BIT_XOR_EXPR;
+      else if (VECTOR_CST_STEPPED_P (arg2)
+	       && VECTOR_CST_DUPLICATE_P (arg1)
+	       && integer_zerop (VECTOR_CST_ELT (arg1, 0)))
+	step_ok_p = code == BIT_AND_EXPR || code == BIT_IOR_EXPR
+	  || code == BIT_XOR_EXPR;
+      else if (VECTOR_CST_STEPPED_P (arg1)
+	       && VECTOR_CST_STEPPED_P (arg2))
 	/* We can operate directly on the encoding if:
 	     a3 - a2 == a2 - a1 && b3 - b2 == b2 - b1
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112971.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112971.c
new file mode 100644
index 00000000000..816ebd3c493
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112971.c
@@ -0,0 +1,18 @@
+/* { dg-do compile }  */
+/* { dg-options "-march=rv64gcv_zvl256b -mabi=lp64d -O3 -fno-vect-cost-model" }  */
+
+int a;
+short b[9];
+char c, d;
+void e() {
+  d = 0;
+  for (;; d++) {
+    if (b[d])
+      break;
+    a = 8;
+    for (; a >= 0; a--) {
+      char *f = &c;
+      *f &= d == (a & d);
+    }
+  }
+}
-- 
2.43.0
  
Richard Biener Dec. 19, 2023, 8:15 a.m. UTC | #2
On Tue, 19 Dec 2023, 钟居哲 wrote:

> Thanks Robin send initial patch to fix this ICE bug.
> 
> CC to Richard S, Richard B, and Andrew.

Just one comment, it seems that VECTOR_CST_STEPPED_P should
implicitly include VECTOR_CST_DUPLICATE_P since it would be
a step of zero (but as implemented it doesn't catch this).
Looking at the implementation it's odd that we can handle
VECTOR_CST_NELTS_PER_PATTERN == 1 (duplicate) and
== 3 (stepped) but not == 2 (not sure what that would be).

Maybe the tests can be re-formulated in terms of
VECTOR_CST_NELTS_PER_PATTERN?

Richard.

  
juzhe.zhong@rivai.ai Dec. 19, 2023, 8:54 a.m. UTC | #3
Hi, Richard.  Do you mean adding the check as follows?

      if (VECTOR_CST_NELTS_PER_PATTERN (arg1) == 1
          && VECTOR_CST_NELTS_PER_PATTERN (arg2) == 3
          && integer_zerop (VECTOR_CST_ELT (arg1, 0)))
        step_ok_p = (code == BIT_AND_EXPR || code == BIT_IOR_EXPR
          || code == BIT_XOR_EXPR);
      else if (VECTOR_CST_NELTS_PER_PATTERN (arg2) == 1
          && VECTOR_CST_NELTS_PER_PATTERN (arg1) == 3
          && integer_zerop (VECTOR_CST_ELT (arg2, 0)))
        step_ok_p = (code == BIT_AND_EXPR || code == BIT_IOR_EXPR
          || code == BIT_XOR_EXPR);



juzhe.zhong@rivai.ai
 
  
Richard Biener Dec. 19, 2023, 9:12 a.m. UTC | #4
On Tue, 19 Dec 2023, juzhe.zhong@rivai.ai wrote:

> Hi, Richard.  Do you mean adding the check as follows?
> 
>       if (VECTOR_CST_NELTS_PER_PATTERN (arg1) == 1
>           && VECTOR_CST_NELTS_PER_PATTERN (arg2) == 3

Or <= 3 which would allow combining.  As said, not sure what
== 2 would be and whether that would work.

Btw, integer_all_onesp should also allow optimization for
and/ior at least.  Possibly also IOR/AND with the sign bit for
signed elements.

I wonder if there's a programmatic way to identify OK cases
rather than enumerating them.

>           && integer_zerop (VECTOR_CST_ELT (arg1, 0)))
>         step_ok_p = (code == BIT_AND_EXPR || code == BIT_IOR_EXPR
>           || code == BIT_XOR_EXPR);
>       else if (VECTOR_CST_NELTS_PER_PATTERN (arg2) == 1
>           && VECTOR_CST_NELTS_PER_PATTERN (arg1) == 3
>           && integer_zerop (VECTOR_CST_ELT (arg2, 0)))
>         step_ok_p = (code == BIT_AND_EXPR || code == BIT_IOR_EXPR
>           || code == BIT_XOR_EXPR);
  
juzhe.zhong@rivai.ai Dec. 19, 2023, 9:35 a.m. UTC | #5
Hi, Richard.

After investigating the codes:
/* Return true if EXPR is the integer constant zero or a complex constant
   of zero, or a location wrapper for such a constant.  */

bool
integer_zerop (const_tree expr)
{
  STRIP_ANY_LOCATION_WRAPPER (expr);

  switch (TREE_CODE (expr))
    {
    case INTEGER_CST:
      return wi::to_wide (expr) == 0;
    case COMPLEX_CST:
      return (integer_zerop (TREE_REALPART (expr))
              && integer_zerop (TREE_IMAGPART (expr)));
    case VECTOR_CST:
      return (VECTOR_CST_NPATTERNS (expr) == 1
              && VECTOR_CST_DUPLICATE_P (expr)
              && integer_zerop (VECTOR_CST_ENCODED_ELT (expr, 0)));
    default:
      return false;
    }
}

I wonder whether we can simplify the code as follows:
      if (integer_zerop (arg1) || integer_zerop (arg2))
        step_ok_p = (code == BIT_AND_EXPR || code == BIT_IOR_EXPR
                     || code == BIT_XOR_EXPR);




juzhe.zhong@rivai.ai
 
  
Jakub Jelinek Dec. 19, 2023, 9:45 a.m. UTC | #6
On Tue, Dec 19, 2023 at 05:35:14PM +0800, juzhe.zhong@rivai.ai wrote:
> I wonder whether we can simplify the codes as follows :?
>       if (integer_zerop (arg1) || integer_zerop (arg2))
>         step_ok_p = (code == BIT_AND_EXPR || code == BIT_IOR_EXPR
>                      || code == BIT_XOR_EXPR);

What about integer_all_onesp (arg1) || integer_all_onesp (arg2) ?
(x & -1) == x
(x | -1) == -1
(x ^ -1) == ~x
even for the VL vectors, no?

	Jakub
  
juzhe.zhong@rivai.ai Dec. 19, 2023, 9:49 a.m. UTC | #7
>> (x & -1) == x
>>(x | -1) == -1
>>(x ^ -1) == ~x

Looks reasonable to me. 

Do you mean modifying the code as follows?

       if (integer_zerop (arg1) || integer_zerop (arg2) || integer_onep (arg1) || integer_onep (arg2))
         step_ok_p = (code == BIT_AND_EXPR || code == BIT_IOR_EXPR
                      || code == BIT_XOR_EXPR);



juzhe.zhong@rivai.ai
 
  
Jakub Jelinek Dec. 19, 2023, 9:55 a.m. UTC | #8
On Tue, Dec 19, 2023 at 05:49:48PM +0800, juzhe.zhong@rivai.ai wrote:
> >> (x & -1) == x
> >>(x | -1) == -1
> >>(x ^ -1) == ~x
> 
> Looks reasonable to me. 
> 
> Do you mean modify the code as follows ?
> 
>        if (integer_zerop (arg1) || integer_zerop (arg2)) || integer_onep (arg1) || integer_onep(arg2))

That line is too long, and integer_onep is not integer_all_onesp (and for
integer_onep something like that certainly doesn't apply).
Plus it needs to be tested that it does the right thing.

>          step_ok_p = (code == BIT_AND_EXPR || code == BIT_IOR_EXPR
>                       || code == BIT_XOR_EXPR);

	Jakub
  
Richard Biener Dec. 19, 2023, 10:11 a.m. UTC | #9
On Tue, 19 Dec 2023, juzhe.zhong@rivai.ai wrote:

> Hi, Richard.
> 
> I wonder whether we can simplify the codes as follows :?
>       if (integer_zerop (arg1) || integer_zerop (arg2))
>         step_ok_p = (code == BIT_AND_EXPR || code == BIT_IOR_EXPR
>                      || code == BIT_XOR_EXPR);

Possibly.  I'll let Richard S. comment on the whole structure.

Richard.

  
Richard Sandiford Dec. 19, 2023, 10:40 a.m. UTC | #10
Richard Biener <rguenther@suse.de> writes:
> On Tue, 19 Dec 2023, juzhe.zhong@rivai.ai wrote:
>
>> Hi, Richard.
>> 
>> After investigating the codes:
>> /* Return true if EXPR is the integer constant zero or a complex constant
>>    of zero, or a location wrapper for such a constant.  */
>> 
>> bool
>> integer_zerop (const_tree expr)
>> {
>>   STRIP_ANY_LOCATION_WRAPPER (expr);
>> 
>>   switch (TREE_CODE (expr))
>>     {
>>     case INTEGER_CST:
>>       return wi::to_wide (expr) == 0;
>>     case COMPLEX_CST:
>>       return (integer_zerop (TREE_REALPART (expr))
>>               && integer_zerop (TREE_IMAGPART (expr)));
>>     case VECTOR_CST:
>>       return (VECTOR_CST_NPATTERNS (expr) == 1
>>               && VECTOR_CST_DUPLICATE_P (expr)
>>               && integer_zerop (VECTOR_CST_ENCODED_ELT (expr, 0)));
>>     default:
>>       return false;
>>     }
>> }
>> 
>> I wonder whether we can simplify the codes as follows :?
>>       if (integer_zerop (arg1) || integer_zerop (arg2))
>>         step_ok_p = (code == BIT_AND_EXPR || code == BIT_IOR_EXPR
>>                      || code == BIT_XOR_EXPR);
>
> Possibly.  I'll let Richard S. comment on the whole structure.

The current code is handling cases that require elementwise arithmetic.
ISTM that what we're really doing here is identifying cases where
whole-vector arithmetic is possible instead.  I think that should be
a separate pre-step, rather than integrated into the current code.

Largely this would consist of writing out match.pd-style folds in
C++ code, so Andrew's fix in comment 7 seems neater to me.

But if this must happen in const_binop instead, then we could have
a function like:

/* OP is the INDEXth operand to CODE (counting from zero) and OTHER_OP
   is the other operand.  Try to use the value of OP to simplify the
   operation in one step, without having to process individual elements.  */
tree
simplify_const_binop (tree_code code, tree op, tree other_op, int index)
{
  ...
}

Thanks,
Richard

>
> Richard.
>
>> 
>> 
>> 
>> juzhe.zhong@rivai.ai
>>  
>> From: Richard Biener
>> Date: 2023-12-19 17:12
>> To: juzhe.zhong@rivai.ai
>> CC: Robin Dapp; gcc-patches; pan2.li; richard.sandiford; Richard Biener; pinskia
>> Subject: Re: Re: [PATCH] fold-const: Handle AND, IOR, XOR with stepped vectors [PR112971].
>> On Tue, 19 Dec 2023, juzhe.zhong@rivai.ai wrote:
>>  
>> > Hi, Richard. Do you mean add the check as follows?
>> > 
>> >       if (VECTOR_CST_NELTS_PER_PATTERN (arg1) == 1
>> >           && VECTOR_CST_NELTS_PER_PATTERN (arg2) == 3
>>  
>> Or <= 3 which would allow combining.  As said, not sure what
>> == 2 would be and whether that would work.
>>  
>> Btw, integer_allonesp should also allow to be optimized for
>> and/ior at least.  Possibly IOR/AND with the sign bit for
>> signed elements as well.
>>  
>> I wonder if there's a programmatic way to identify OK cases
>> rather than enumerating them.
>>  
>> >           && integer_zerop (VECTOR_CST_ELT (arg1, 0)))
>> >         step_ok_p = (code == BIT_AND_EXPR || code == BIT_IOR_EXPR
>> >           || code == BIT_XOR_EXPR);
>> >       else if (VECTOR_CST_NELTS_PER_PATTERN (arg2) == 1
>> >           && VECTOR_CST_NELTS_PER_PATTERN (arg1) == 3
>> >           && integer_zerop (VECTOR_CST_ELT (arg2, 0)))
>> >         step_ok_p = (code == BIT_AND_EXPR || code == BIT_IOR_EXPR
>> >           || code == BIT_XOR_EXPR);
>> > 
>> > 
>> > 
>> > juzhe.zhong@rivai.ai
>> >  
>> > From: Richard Biener
>> > Date: 2023-12-19 16:15
>> > To: juzhe.zhong@rivai.ai
>> > CC: rdapp.gcc; gcc-patches; pan2.li; richard.sandiford; richard.guenther; Andrew Pinski
>> > Subject: Re: [PATCH] fold-const: Handle AND, IOR, XOR with stepped vectors [PR112971].
>> > On Tue, 19 Dec 2023, juzhe.zhong@rivai.ai wrote:
>> >  
>> > > Thanks Robin for sending the initial patch to fix this ICE bug.
>> > > 
>> > > CC to Richard S, Richard B, and Andrew.
>> >  
>> > Just one comment, it seems that VECTOR_CST_STEPPED_P should
>> > implicitly include VECTOR_CST_DUPLICATE_P since it would be
>> > a step of zero (but as implemented it doesn't catch this).
>> > Looking at the implementation it's odd that we can handle
>> > VECTOR_CST_NELTS_PER_PATTERN == 1 (duplicate) and
>> > == 3 (stepped) but not == 2 (not sure what that would be).
>> >  
>> > Maybe the tests can be re-formulated in terms of
>> > VECTOR_CST_NELTS_PER_PATTERN?
>> >  
>> > Richard.
  
juzhe.zhong@rivai.ai Dec. 19, 2023, 10:58 a.m. UTC | #11
OK.
I just took a look at match.pd.
If we change bit_and, do we need to adapt these two patterns as well?
/* x | ~0 -> ~0  */
(simplify
 (bit_ior @0 integer_all_onesp@1)
 @1)

/* x | 0 -> x  */
(simplify
 (bit_ior @0 integer_zerop)
 @0)

I am not sure since we currently only face the ICE on BIT_AND for RVV.



juzhe.zhong@rivai.ai
 
From: Richard Sandiford
Date: 2023-12-19 18:40
To: Richard Biener
CC: juzhe.zhong\@rivai.ai; Robin Dapp; gcc-patches; pan2.li; Richard Biener; pinskia
Subject: Re: [PATCH] fold-const: Handle AND, IOR, XOR with stepped vectors [PR112971].
  
Andrew Pinski Dec. 20, 2023, 2:04 a.m. UTC | #12
On Tue, Dec 19, 2023 at 2:40 AM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Richard Biener <rguenther@suse.de> writes:
> > On Tue, 19 Dec 2023, juzhe.zhong@rivai.ai wrote:
> >
> >> Hi, Richard.
> >>
> >>
> >> I wonder whether we can simplify the codes as follows :?
> >>       if (integer_zerop (arg1) || integer_zerop (arg2))
> >>         step_ok_p = (code == BIT_AND_EXPR || code == BIT_IOR_EXPR
> >>                      || code == BIT_XOR_EXPR);
> >
> > Possibly.  I'll let Richard S. comment on the whole structure.
>
> The current code is handling cases that require elementwise arithmetic.
> ISTM that what we're really doing here is identifying cases where
> whole-vector arithmetic is possible instead.  I think that should be
> a separate pre-step, rather than integrated into the current code.
>
> Largely this would consist of writing out match.pd-style folds in
> C++ code, so Andrew's fix in comment 7 seems neater to me.

I didn't like the change to match.pd (even with a comment on why)
because it violates the idea behind canonicalization: constants are
the 2nd operand of commutative and comparison expressions.
Maybe only a few match/simplify patterns need the :c for constants
that are not the 2nd operand, but there needs to be a comment on why
:c is needed in those cases.

>
> But if this must happen in const_binop instead, then we could have
> a function like:

The reason it should be in const_binop rather than in match is that
both operands are constants.
For commutative expressions, we only need to check the first operand
for zero/all_ones and, if that fails, try again with the operands
swapped. That avoids writing the checks twice.
We could even use lambdas to simplify things too.

Thanks,
Andrew Pinski

>
> /* OP is the INDEXth operand to CODE (counting from zero) and OTHER_OP
>    is the other operand.  Try to use the value of OP to simplify the
>    operation in one step, without having to process individual elements.  */
> tree
> simplify_const_binop (tree_code code, tree op, tree other_op, int index)
> {
>   ...
> }
>
> Thanks,
> Richard
>
  
juzhe.zhong@rivai.ai Dec. 20, 2023, 2:07 a.m. UTC | #13
Thanks, Andrew.

OK. I think I can apply the suggestions from you and both Richards.

That is, fix it in const_binop, but by creating a new helper function
simplify_const_binop (Richard's suggestion).

But I don't know how to write the code in simplify_const_binop...
Could you give me some hints?




juzhe.zhong@rivai.ai
 
From: Andrew Pinski
Date: 2023-12-20 10:04
To: Richard Biener; juzhe.zhong@rivai.ai; Robin Dapp; gcc-patches; pan2.li; Richard Biener; pinskia; richard.sandiford
Subject: Re: [PATCH] fold-const: Handle AND, IOR, XOR with stepped vectors [PR112971].
On Tue, Dec 19, 2023 at 2:40 AM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Richard Biener <rguenther@suse.de> writes:
> > On Tue, 19 Dec 2023, juzhe.zhong@rivai.ai wrote:
> >
> >> Hi, Richard.
> >>
> >> After investigating the codes:
> >> /* Return true if EXPR is the integer constant zero or a complex constant
> >>    of zero, or a location wrapper for such a constant.  */
> >>
> >> bool
> >> integer_zerop (const_tree expr)
> >> {
> >>   STRIP_ANY_LOCATION_WRAPPER (expr);
> >>
> >>   switch (TREE_CODE (expr))
> >>     {
> >>     case INTEGER_CST:
> >>       return wi::to_wide (expr) == 0;
> >>     case COMPLEX_CST:
> >>       return (integer_zerop (TREE_REALPART (expr))
> >>               && integer_zerop (TREE_IMAGPART (expr)));
> >>     case VECTOR_CST:
> >>       return (VECTOR_CST_NPATTERNS (expr) == 1
> >>               && VECTOR_CST_DUPLICATE_P (expr)
> >>               && integer_zerop (VECTOR_CST_ENCODED_ELT (expr, 0)));
> >>     default:
> >>       return false;
> >>     }
> >> }
> >>
> >> I wonder whether we can simplify the codes as follows :?
> >>       if (integer_zerop (arg1) || integer_zerop (arg2))
> >>         step_ok_p = (code == BIT_AND_EXPR || code == BIT_IOR_EXPR
> >>                      || code == BIT_XOR_EXPR);
> >
> > Possibly.  I'll let Richard S. comment on the whole structure.
>
> The current code is handling cases that require elementwise arithmetic.
> ISTM that what we're really doing here is identifying cases where
> whole-vector arithmetic is possible instead.  I think that should be
> a separate pre-step, rather than integrated into the current code.
>
> Largely this would consist of writing out match.pd-style folds in
> C++ code, so Andrew's fix in comment 7 seems neater to me.
 
I didn't like the change to match.pd (even with a comment on why)
because it violates the whole idea behind canonicalization of
constants being 2nd operand of commutative and comparison expressions.
Maybe there are only a few limited match/simplify patterns which need
to add the :c for constants not being the 2nd operand but there needs
to be a comment on why :c is needed for this.
 
>
> But if this must happen in const_binop instead, then we could have
> a function like:
 
The reasoning of why it should be in const_binop rather than in match
is because both operands are constants.
Now for commutative expressions, we only need to check the first
operand for zero/all_ones and try again swapping the operands. This
will most likely solve the problem of writing so much code.
We could even use lambdas to simplify things too.
 
Thanks,
Andrew Pinski
 
>
> /* OP is the INDEXth operand to CODE (counting from zero) and OTHER_OP
>    is the other operand.  Try to use the value of OP to simplify the
>    operation in one step, without having to process individual elements.  */
> tree
> simplify_const_binop (tree_code code, rtx op, rtx other_op, int index)
> {
>   ...
> }
>
> Thanks,
> Richard
>
> >
> > Richard.
> >
> >>
> >>
> >>
> >> juzhe.zhong@rivai.ai
> >>
> >> From: Richard Biener
> >> Date: 2023-12-19 17:12
> >> To: juzhe.zhong@rivai.ai
> >> CC: Robin Dapp; gcc-patches; pan2.li; richard.sandiford; Richard Biener; pinskia
> >> Subject: Re: Re: [PATCH] fold-const: Handle AND, IOR, XOR with stepped vectors [PR112971].
> >> On Tue, 19 Dec 2023, juzhe.zhong@rivai.ai wrote:
> >>
> >> > Hi?Richard. Do you mean add the check as follows ?
> >> >
> >> >       if (VECTOR_CST_NELTS_PER_PATTERN (arg1) == 1
> >> >           && VECTOR_CST_NELTS_PER_PATTERN (arg2) == 3
> >>
> >> Or <= 3 which would allow combining.  As said, not sure what
> >> == 2 would be and whether that would work.
> >>
> >> Btw, integer_allonesp should also allow to be optimized for
> >> and/ior at least.  Possibly IOR/AND with the sign bit for
> >> signed elements as well.
> >>
> >> I wonder if there's a programmatic way to identify OK cases
> >> rather than enumerating them.
> >>
> >> >           && integer_zerop (VECTOR_CST_ELT (arg1, 0)))
> >> >         step_ok_p = (code == BIT_AND_EXPR || code == BIT_IOR_EXPR
> >> >           || code == BIT_XOR_EXPR);
> >> >       else if (VECTOR_CST_NELTS_PER_PATTERN (arg2) == 1
> >> >           && VECTOR_CST_NELTS_PER_PATTERN (arg1) == 3
> >> >           && integer_zerop (VECTOR_CST_ELT (arg2, 0)))
> >> >         step_ok_p = (code == BIT_AND_EXPR || code == BIT_IOR_EXPR
> >> >           || code == BIT_XOR_EXPR);
> >> >
> >> >
> >> >
> >> > juzhe.zhong@rivai.ai
> >> >
> >> > From: Richard Biener
> >> > Date: 2023-12-19 16:15
> >> > To: juzhe.zhong@rivai.ai
> >> > CC: rdapp.gcc; gcc-patches; pan2.li; richard.sandiford; richard.guenther; Andrew Pinski
> >> > Subject: Re: [PATCH] fold-const: Handle AND, IOR, XOR with stepped vectors [PR112971].
> >> > On Tue, 19 Dec 2023, juzhe.zhong@rivai.ai wrote:
> >> >
> >> > > Thanks Robin send initial patch to fix this ICE bug.
> >> > >
> >> > > CC to Richard S, Richard B, and Andrew.
> >> >
> >> > Just one comment, it seems that VECTOR_CST_STEPPED_P should
> >> > implicitly include VECTOR_CST_DUPLICATE_P since it would be
> >> > a step of zero (but as implemented it doesn't catch this).
> >> > Looking at the implementation it's odd that we can handle
> >> > VECTOR_CST_NELTS_PER_PATTERN == 1 (duplicate) and
> >> > == 3 (stepped) but not == 2 (not sure what that would be).
> >> >
> >> > Maybe the tests can be re-formulated in terms of
> >> > VECTOR_CST_NELTS_PER_PATTERN?
> >> >
> >> > Richard.
> >> >
> >> > > Thanks.
> >> > >
> >> > >
> >> > >
> >> > > juzhe.zhong@rivai.ai
> >> > >
> >> > > From: Robin Dapp
> >> > > Date: 2023-12-19 03:50
> >> > > To: gcc-patches
> >> > > CC: rdapp.gcc; Li, Pan2; juzhe.zhong@rivai.ai
> >> > > Subject: [PATCH] fold-const: Handle AND, IOR, XOR with stepped vectors [PR112971].
> >> > > Hi,
> >> > >
> >> > > found in PR112971, this patch adds folding support for bitwise operations
> >> > > of const duplicate zero vectors and stepped vectors.
> >> > > On riscv we have the situation that a folding would perpetually continue
> >> > > without simplifying because e.g. {0, 0, 0, ...} & {7, 6, 5, ...} would
> >> > > not fold to {0, 0, 0, ...}.
> >> > >
> >> > > Bootstrapped and regtested on x86 and aarch64, regtested on riscv.
> >> > >
> >> > > I won't be available to respond quickly until next year.  Pan or Juzhe,
> >> > > as discussed, feel free to continue with possible revisions.
> >> > >
> >> > > Regards
> >> > > Robin
> >> > >
> >> > >
> >> > > gcc/ChangeLog:
> >> > >
> >> > > PR middle-end/112971
> >> > >
> >> > > * fold-const.cc (const_binop): Handle
> >> > > zerop@1  AND/IOR/XOR  VECT_CST_STEPPED_P@2
> >> > >
> >> > > gcc/testsuite/ChangeLog:
> >> > >
> >> > > * gcc.target/riscv/rvv/autovec/pr112971.c: New test.
> >> > > ---
> >> > > gcc/fold-const.cc                              | 14 +++++++++++++-
> >> > > .../gcc.target/riscv/rvv/autovec/pr112971.c    | 18 ++++++++++++++++++
> >> > > 2 files changed, 31 insertions(+), 1 deletion(-)
> >> > > create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112971.c
> >> > >
> >> > > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> >> > > index f5d68ac323a..43ed097bf5c 100644
> >> > > --- a/gcc/fold-const.cc
> >> > > +++ b/gcc/fold-const.cc
> >> > > @@ -1653,8 +1653,20 @@ const_binop (enum tree_code code, tree arg1, tree arg2)
> >> > >      {
> >> > >        tree type = TREE_TYPE (arg1);
> >> > >        bool step_ok_p;
> >> > > +
> >> > > +      /* AND, IOR as well as XOR with a zerop can be handled directly.  */
> >> > >        if (VECTOR_CST_STEPPED_P (arg1)
> >> > > -   && VECTOR_CST_STEPPED_P (arg2))
> >> > > +   && VECTOR_CST_DUPLICATE_P (arg2)
> >> > > +   && integer_zerop (VECTOR_CST_ELT (arg2, 0)))
> >> > > + step_ok_p = code == BIT_AND_EXPR || code == BIT_IOR_EXPR
> >> > > +   || code == BIT_XOR_EXPR;
> >> > > +      else if (VECTOR_CST_STEPPED_P (arg2)
> >> > > +        && VECTOR_CST_DUPLICATE_P (arg1)
> >> > > +        && integer_zerop (VECTOR_CST_ELT (arg1, 0)))
> >> > > + step_ok_p = code == BIT_AND_EXPR || code == BIT_IOR_EXPR
> >> > > +   || code == BIT_XOR_EXPR;
> >> > > +      else if (VECTOR_CST_STEPPED_P (arg1)
> >> > > +        && VECTOR_CST_STEPPED_P (arg2))
> >> > > /* We can operate directly on the encoding if:
> >> > >       a3 - a2 == a2 - a1 && b3 - b2 == b2 - b1
> >> > > diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112971.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112971.c
> >> > > new file mode 100644
> >> > > index 00000000000..816ebd3c493
> >> > > --- /dev/null
> >> > > +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112971.c
> >> > > @@ -0,0 +1,18 @@
> >> > > +/* { dg-do compile }  */
> >> > > +/* { dg-options "-march=rv64gcv_zvl256b -mabi=lp64d -O3 -fno-vect-cost-model" }  */
> >> > > +
> >> > > +int a;
> >> > > +short b[9];
> >> > > +char c, d;
> >> > > +void e() {
> >> > > +  d = 0;
> >> > > +  for (;; d++) {
> >> > > +    if (b[d])
> >> > > +      break;
> >> > > +    a = 8;
> >> > > +    for (; a >= 0; a--) {
> >> > > +      char *f = &c;
> >> > > +      *f &= d == (a & d);
> >> > > +    }
> >> > > +  }
> >> > > +}
> >> > >
> >> >
> >> >
> >>
> >>
  
Richard Biener Dec. 20, 2023, 7:25 a.m. UTC | #14
On Tue, 19 Dec 2023, Andrew Pinski wrote:

> On Tue, Dec 19, 2023 at 2:40 AM Richard Sandiford
> <richard.sandiford@arm.com> wrote:
> >
> > Richard Biener <rguenther@suse.de> writes:
> > > On Tue, 19 Dec 2023, juzhe.zhong@rivai.ai wrote:
> > >
> > >> Hi, Richard.
> > >>
> > >> After investigating the codes:
> > >> /* Return true if EXPR is the integer constant zero or a complex constant
> > >>    of zero, or a location wrapper for such a constant.  */
> > >>
> > >> bool
> > >> integer_zerop (const_tree expr)
> > >> {
> > >>   STRIP_ANY_LOCATION_WRAPPER (expr);
> > >>
> > >>   switch (TREE_CODE (expr))
> > >>     {
> > >>     case INTEGER_CST:
> > >>       return wi::to_wide (expr) == 0;
> > >>     case COMPLEX_CST:
> > >>       return (integer_zerop (TREE_REALPART (expr))
> > >>               && integer_zerop (TREE_IMAGPART (expr)));
> > >>     case VECTOR_CST:
> > >>       return (VECTOR_CST_NPATTERNS (expr) == 1
> > >>               && VECTOR_CST_DUPLICATE_P (expr)
> > >>               && integer_zerop (VECTOR_CST_ENCODED_ELT (expr, 0)));
> > >>     default:
> > >>       return false;
> > >>     }
> > >> }
> > >>
> > >> I wonder whether we can simplify the code as follows?
> > >>       if (integer_zerop (arg1) || integer_zerop (arg2))
> > >>         step_ok_p = (code == BIT_AND_EXPR || code == BIT_IOR_EXPR
> > >>                      || code == BIT_XOR_EXPR);
> > >
> > > Possibly.  I'll let Richard S. comment on the whole structure.
> >
> > The current code is handling cases that require elementwise arithmetic.
> > ISTM that what we're really doing here is identifying cases where
> > whole-vector arithmetic is possible instead.  I think that should be
> > a separate pre-step, rather than integrated into the current code.
> >
> > Largely this would consist of writing out match.pd-style folds in
> > C++ code, so Andrew's fix in comment 7 seems neater to me.
> 
> I didn't like the change to match.pd (even with a comment on why)
> because it violates the whole idea behind canonicalization of
> constants being 2nd operand of commutative and comparison expressions.
> Maybe there are only a few limited match/simplify patterns which need
> to add the :c for constants not being the 2nd operand but there needs
> to be a comment on why :c is needed for this.

Agreed.  Note that in theory we could of course define extra
canonicalization rules for CST op CST in tree_swap_operands_p;
there are some cases where those survive, mostly around FP and
dynamic rounding state.  I'd rather go that route if we decide
that match.pd should catch this.

> >
> > But if this must happen in const_binop instead, then we could have
> > a function like:
> 
> The reasoning of why it should be in const_binop rather than in match
> is because both operands are constants.

+1

Richard.

> Now for commutative expressions, we only need to check the first
> operand for zero/all_ones and try again swapping the operands. This
> will most likely solve the problem of writing so much code.
> We could even use lambdas to simplify things too.
> 
> Thanks,
> Andrew Pinski
> 
> >
> > /* OP is the INDEXth operand to CODE (counting from zero) and OTHER_OP
> >    is the other operand.  Try to use the value of OP to simplify the
> >    operation in one step, without having to process individual elements.  */
> > tree
> > simplify_const_binop (tree_code code, tree op, tree other_op, int index)
> > {
> >   ...
> > }
> >
> > Thanks,
> > Richard
> >
> > >
> > > Richard.
> > >
> > >>
> > >>
> > >>
> > >> juzhe.zhong@rivai.ai
> > >>
> > >> From: Richard Biener
> > >> Date: 2023-12-19 17:12
> > >> To: juzhe.zhong@rivai.ai
> > >> CC: Robin Dapp; gcc-patches; pan2.li; richard.sandiford; Richard Biener; pinskia
> > >> Subject: Re: Re: [PATCH] fold-const: Handle AND, IOR, XOR with stepped vectors [PR112971].
> > >> On Tue, 19 Dec 2023, juzhe.zhong@rivai.ai wrote:
> > >>
> > >> > Hi, Richard.  Do you mean to add the check as follows?
> > >> >
> > >> >       if (VECTOR_CST_NELTS_PER_PATTERN (arg1) == 1
> > >> >           && VECTOR_CST_NELTS_PER_PATTERN (arg2) == 3
> > >>
> > >> Or <= 3 which would allow combining.  As said, not sure what
> > >> == 2 would be and whether that would work.
> > >>
> > >> Btw, integer_all_onesp should also allow to be optimized for
> > >> and/ior at least.  Possibly IOR/AND with the sign bit for
> > >> signed elements as well.
> > >>
> > >> I wonder if there's a programmatic way to identify OK cases
> > >> rather than enumerating them.
> > >>
> > >> >           && integer_zerop (VECTOR_CST_ELT (arg1, 0)))
> > >> >         step_ok_p = (code == BIT_AND_EXPR || code == BIT_IOR_EXPR
> > >> >           || code == BIT_XOR_EXPR);
> > >> >       else if (VECTOR_CST_NELTS_PER_PATTERN (arg2) == 1
> > >> >           && VECTOR_CST_NELTS_PER_PATTERN (arg1) == 3
> > >> >           && integer_zerop (VECTOR_CST_ELT (arg2, 0)))
> > >> >         step_ok_p = (code == BIT_AND_EXPR || code == BIT_IOR_EXPR
> > >> >           || code == BIT_XOR_EXPR);
> > >> >
> > >> >
> > >> >
> > >> > juzhe.zhong@rivai.ai
> > >> >
> > >> > From: Richard Biener
> > >> > Date: 2023-12-19 16:15
> > >> > To: juzhe.zhong@rivai.ai
> > >> > CC: rdapp.gcc; gcc-patches; pan2.li; richard.sandiford; richard.guenther; Andrew Pinski
> > >> > Subject: Re: [PATCH] fold-const: Handle AND, IOR, XOR with stepped vectors [PR112971].
> > >> > On Tue, 19 Dec 2023, juzhe.zhong@rivai.ai wrote:
> > >> >
> > >> > > Thanks Robin send initial patch to fix this ICE bug.
> > >> > >
> > >> > > CC to Richard S, Richard B, and Andrew.
> > >> >
> > >> > Just one comment, it seems that VECTOR_CST_STEPPED_P should
> > >> > implicitly include VECTOR_CST_DUPLICATE_P since it would be
> > >> > a step of zero (but as implemented it doesn't catch this).
> > >> > Looking at the implementation it's odd that we can handle
> > >> > VECTOR_CST_NELTS_PER_PATTERN == 1 (duplicate) and
> > >> > == 3 (stepped) but not == 2 (not sure what that would be).
> > >> >
> > >> > Maybe the tests can be re-formulated in terms of
> > >> > VECTOR_CST_NELTS_PER_PATTERN?
> > >> >
> > >> > Richard.
> > >> >
> > >> > > Thanks.
> > >> > >
> > >> > >
> > >> > >
> > >> > > juzhe.zhong@rivai.ai
> > >> > >
> > >> > > From: Robin Dapp
> > >> > > Date: 2023-12-19 03:50
> > >> > > To: gcc-patches
> > >> > > CC: rdapp.gcc; Li, Pan2; juzhe.zhong@rivai.ai
> > >> > > Subject: [PATCH] fold-const: Handle AND, IOR, XOR with stepped vectors [PR112971].
> > >> > > Hi,
> > >> > >
> > >> > > found in PR112971, this patch adds folding support for bitwise operations
> > >> > > of const duplicate zero vectors and stepped vectors.
> > >> > > On riscv we have the situation that a folding would perpetually continue
> > >> > > without simplifying because e.g. {0, 0, 0, ...} & {7, 6, 5, ...} would
> > >> > > not fold to {0, 0, 0, ...}.
> > >> > >
> > >> > > Bootstrapped and regtested on x86 and aarch64, regtested on riscv.
> > >> > >
> > >> > > I won't be available to respond quickly until next year.  Pan or Juzhe,
> > >> > > as discussed, feel free to continue with possible revisions.
> > >> > >
> > >> > > Regards
> > >> > > Robin
> > >> > >
> > >> > >
> > >> > > gcc/ChangeLog:
> > >> > >
> > >> > > PR middle-end/112971
> > >> > >
> > >> > > * fold-const.cc (const_binop): Handle
> > >> > > zerop@1  AND/IOR/XOR  VECT_CST_STEPPED_P@2
> > >> > >
> > >> > > gcc/testsuite/ChangeLog:
> > >> > >
> > >> > > * gcc.target/riscv/rvv/autovec/pr112971.c: New test.
> > >> > > ---
> > >> > > gcc/fold-const.cc                              | 14 +++++++++++++-
> > >> > > .../gcc.target/riscv/rvv/autovec/pr112971.c    | 18 ++++++++++++++++++
> > >> > > 2 files changed, 31 insertions(+), 1 deletion(-)
> > >> > > create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112971.c
> > >> > >
> > >> > > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> > >> > > index f5d68ac323a..43ed097bf5c 100644
> > >> > > --- a/gcc/fold-const.cc
> > >> > > +++ b/gcc/fold-const.cc
> > >> > > @@ -1653,8 +1653,20 @@ const_binop (enum tree_code code, tree arg1, tree arg2)
> > >> > >      {
> > >> > >        tree type = TREE_TYPE (arg1);
> > >> > >        bool step_ok_p;
> > >> > > +
> > >> > > +      /* AND, IOR as well as XOR with a zerop can be handled directly.  */
> > >> > >        if (VECTOR_CST_STEPPED_P (arg1)
> > >> > > -   && VECTOR_CST_STEPPED_P (arg2))
> > >> > > +   && VECTOR_CST_DUPLICATE_P (arg2)
> > >> > > +   && integer_zerop (VECTOR_CST_ELT (arg2, 0)))
> > >> > > + step_ok_p = code == BIT_AND_EXPR || code == BIT_IOR_EXPR
> > >> > > +   || code == BIT_XOR_EXPR;
> > >> > > +      else if (VECTOR_CST_STEPPED_P (arg2)
> > >> > > +        && VECTOR_CST_DUPLICATE_P (arg1)
> > >> > > +        && integer_zerop (VECTOR_CST_ELT (arg1, 0)))
> > >> > > + step_ok_p = code == BIT_AND_EXPR || code == BIT_IOR_EXPR
> > >> > > +   || code == BIT_XOR_EXPR;
> > >> > > +      else if (VECTOR_CST_STEPPED_P (arg1)
> > >> > > +        && VECTOR_CST_STEPPED_P (arg2))
> > >> > > /* We can operate directly on the encoding if:
> > >> > >       a3 - a2 == a2 - a1 && b3 - b2 == b2 - b1
> > >> > > diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112971.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112971.c
> > >> > > new file mode 100644
> > >> > > index 00000000000..816ebd3c493
> > >> > > --- /dev/null
> > >> > > +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112971.c
> > >> > > @@ -0,0 +1,18 @@
> > >> > > +/* { dg-do compile }  */
> > >> > > +/* { dg-options "-march=rv64gcv_zvl256b -mabi=lp64d -O3 -fno-vect-cost-model" }  */
> > >> > > +
> > >> > > +int a;
> > >> > > +short b[9];
> > >> > > +char c, d;
> > >> > > +void e() {
> > >> > > +  d = 0;
> > >> > > +  for (;; d++) {
> > >> > > +    if (b[d])
> > >> > > +      break;
> > >> > > +    a = 8;
> > >> > > +    for (; a >= 0; a--) {
> > >> > > +      char *f = &c;
> > >> > > +      *f &= d == (a & d);
> > >> > > +    }
> > >> > > +  }
> > >> > > +}
> > >> > >
> > >> >
> > >> >
> > >>
> > >>
>
  
Richard Sandiford Dec. 20, 2023, 9:33 a.m. UTC | #15
Richard Biener <rguenther@suse.de> writes:
> On Tue, 19 Dec 2023, Andrew Pinski wrote:
>
>> On Tue, Dec 19, 2023 at 2:40 AM Richard Sandiford
>> <richard.sandiford@arm.com> wrote:
>> >
>> > Richard Biener <rguenther@suse.de> writes:
>> > > On Tue, 19 Dec 2023, juzhe.zhong@rivai.ai wrote:
>> > >
>> > >> Hi, Richard.
>> > >>
>> > >> After investigating the codes:
>> > >> /* Return true if EXPR is the integer constant zero or a complex constant
>> > >>    of zero, or a location wrapper for such a constant.  */
>> > >>
>> > >> bool
>> > >> integer_zerop (const_tree expr)
>> > >> {
>> > >>   STRIP_ANY_LOCATION_WRAPPER (expr);
>> > >>
>> > >>   switch (TREE_CODE (expr))
>> > >>     {
>> > >>     case INTEGER_CST:
>> > >>       return wi::to_wide (expr) == 0;
>> > >>     case COMPLEX_CST:
>> > >>       return (integer_zerop (TREE_REALPART (expr))
>> > >>               && integer_zerop (TREE_IMAGPART (expr)));
>> > >>     case VECTOR_CST:
>> > >>       return (VECTOR_CST_NPATTERNS (expr) == 1
>> > >>               && VECTOR_CST_DUPLICATE_P (expr)
>> > >>               && integer_zerop (VECTOR_CST_ENCODED_ELT (expr, 0)));
>> > >>     default:
>> > >>       return false;
>> > >>     }
>> > >> }
>> > >>
>> > >> I wonder whether we can simplify the code as follows?
>> > >>       if (integer_zerop (arg1) || integer_zerop (arg2))
>> > >>         step_ok_p = (code == BIT_AND_EXPR || code == BIT_IOR_EXPR
>> > >>                      || code == BIT_XOR_EXPR);
>> > >
>> > > Possibly.  I'll let Richard S. comment on the whole structure.
>> >
>> > The current code is handling cases that require elementwise arithmetic.
>> > ISTM that what we're really doing here is identifying cases where
>> > whole-vector arithmetic is possible instead.  I think that should be
>> > a separate pre-step, rather than integrated into the current code.
>> >
>> > Largely this would consist of writing out match.pd-style folds in
>> > C++ code, so Andrew's fix in comment 7 seems neater to me.
>> 
>> I didn't like the change to match.pd (even with a comment on why)
>> because it violates the whole idea behind canonicalization of
>> constants being 2nd operand of commutative and comparison expressions.
>> Maybe there are only a few limited match/simplify patterns which need
>> to add the :c for constants not being the 2nd operand but there needs
>> to be a comment on why :c is needed for this.
>
> Agreed.  Note that in theory we of course could define extra
> canonicalization rules for CST op CST in tree_swap_operands_p,
> there are some cases those survive, mostly around FP and
> dynamic rounding state.  I'd rather go that route if we decide
> that match.pd should catch this.

I don't think that'll help in all cases though.  E.g. consider an
unfoldable:

  (or 1 CST)

where CST is "complicated".  We'd probably canonicalise that to:

  (or CST 1)

And that's good if we have:

  (or (or CST 1) 2)

since it could be folded to:

  (or CST 3)

But there are other constants CST2 for which (or CST CST2) is foldable
and (op 1 CST2) isn't.  So:

  (or (or 1 CST) CST2)

would sometimes be foldable in cases where:

  (or (or CST 1) CST2)

isn't.

>> >
>> > But if this must happen in const_binop instead, then we could have
>> > a function like:
>> 
>> The reasoning of why it should be in const_binop rather than in match
>> is because both operands are constants.
>
> +1

I can see that argument for the traditional case where all constants
can be folded at compile time.  But that isn't the case for VLA constants.
(op CST1 CST2) is sometimes not knowable at compile time.  And I think
match.pd should apply to those cases just as much as to (op VAR CST).

VLA vector "constants" are really in an intermediate category between
variable and constant.

The approach that the patch takes is to add a "rule of thumb" that
applies to all (op X CST), regardless of whether X is constant.
It doesn't work by doing constant arithmetic per se.  If we add
those rules of thumb here, I think it'll keep growing and growing.

Thanks,
Richard
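
[Editorial note: the "intermediate category" point can be made concrete with a
small standalone model.  This is a hedged sketch in Python, not GCC code; the
`expand` helper is made up for illustration.  It shows why folding only the
encoded elements of a stepped VLA vector can extrapolate to the wrong result,
while whole-vector identities like x & 0 == 0 stay safe for any encoding.]

```python
def expand(base, step, n):
    """First n elements of the VLA stepped vector {base, base+step, ...}."""
    return [base + i * step for i in range(n)]

# True elementwise result of {0, 1, 2, ...} & {3, 3, 3, ...}: it cycles
# 0,1,2,3 and is not a linear series, so it has no single-pattern
# stepped encoding at all.
truth = [a & b for a, b in zip(expand(0, 1, 8), expand(3, 0, 8))]
assert truth == [0, 1, 2, 3, 0, 1, 2, 3]

# Folding just the three encoded elements sees [0&3, 1&3, 2&3] == [0, 1, 2],
# would infer a step of 1 and wrongly predict element 4 == 4.
assert expand(0, 1, 8) != truth
```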
  
Richard Biener Dec. 20, 2023, 10:06 a.m. UTC | #16
On Wed, 20 Dec 2023, Richard Sandiford wrote:

> Richard Biener <rguenther@suse.de> writes:
> > On Tue, 19 Dec 2023, Andrew Pinski wrote:
> >
> >> On Tue, Dec 19, 2023 at 2:40 AM Richard Sandiford
> >> <richard.sandiford@arm.com> wrote:
> >> >
> >> > Richard Biener <rguenther@suse.de> writes:
> >> > > On Tue, 19 Dec 2023, juzhe.zhong@rivai.ai wrote:
> >> > >
> >> > >> Hi, Richard.
> >> > >>
> >> > >> After investigating the codes:
> >> > >> /* Return true if EXPR is the integer constant zero or a complex constant
> >> > >>    of zero, or a location wrapper for such a constant.  */
> >> > >>
> >> > >> bool
> >> > >> integer_zerop (const_tree expr)
> >> > >> {
> >> > >>   STRIP_ANY_LOCATION_WRAPPER (expr);
> >> > >>
> >> > >>   switch (TREE_CODE (expr))
> >> > >>     {
> >> > >>     case INTEGER_CST:
> >> > >>       return wi::to_wide (expr) == 0;
> >> > >>     case COMPLEX_CST:
> >> > >>       return (integer_zerop (TREE_REALPART (expr))
> >> > >>               && integer_zerop (TREE_IMAGPART (expr)));
> >> > >>     case VECTOR_CST:
> >> > >>       return (VECTOR_CST_NPATTERNS (expr) == 1
> >> > >>               && VECTOR_CST_DUPLICATE_P (expr)
> >> > >>               && integer_zerop (VECTOR_CST_ENCODED_ELT (expr, 0)));
> >> > >>     default:
> >> > >>       return false;
> >> > >>     }
> >> > >> }
> >> > >>
> >> > >> I wonder whether we can simplify the code as follows?
> >> > >>       if (integer_zerop (arg1) || integer_zerop (arg2))
> >> > >>         step_ok_p = (code == BIT_AND_EXPR || code == BIT_IOR_EXPR
> >> > >>                      || code == BIT_XOR_EXPR);
> >> > >
> >> > > Possibly.  I'll let Richard S. comment on the whole structure.
> >> >
> >> > The current code is handling cases that require elementwise arithmetic.
> >> > ISTM that what we're really doing here is identifying cases where
> >> > whole-vector arithmetic is possible instead.  I think that should be
> >> > a separate pre-step, rather than integrated into the current code.
> >> >
> >> > Largely this would consist of writing out match.pd-style folds in
> >> > C++ code, so Andrew's fix in comment 7 seems neater to me.
> >> 
> >> I didn't like the change to match.pd (even with a comment on why)
> >> because it violates the whole idea behind canonicalization of
> >> constants being 2nd operand of commutative and comparison expressions.
> >> Maybe there are only a few limited match/simplify patterns which need
> >> to add the :c for constants not being the 2nd operand but there needs
> >> to be a comment on why :c is needed for this.
> >
> > Agreed.  Note that in theory we of course could define extra
> > canonicalization rules for CST op CST in tree_swap_operands_p,
> > there are some cases those survive, mostly around FP and
> > dynamic rounding state.  I'd rather go that route if we decide
> > that match.pd should catch this.
> 
> I don't think that'll help in all cases though.  E.g. consider an
> unfoldable:
> 
>   (or 1 CST)
> 
> where CST is "complicated".  We'd probably canonicalise that to:
> 
>   (or CST 1)
> 
> And that's good if we have:
> 
>   (or (or CST 1) 2)
> 
> since it could be folded to:
> 
>   (or CST 3)
> 
> But there are other constants CST2 for which (or CST CST2) is foldable
> and (op 1 CST2) isn't.  So:
> 
>   (or (or 1 CST) CST2)
> 
> would sometimes be foldable in cases where:
> 
>   (or (or CST 1) CST2)
> 
> isn't.

OK, I was thinking of only handling VECTOR_CST_NELTS_PER_PATTERN,
for example as an additional (cheap) heuristic so that we put
VECTOR_CST_DUPLICATE_P second (this would cover the particular
cases in this thread).
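
[Editorial note: a crude standalone model of that heuristic, as a hedged
sketch; `nelts_per_pattern` only mimics the GCC macro for the duplicate and
stepped cases that matter here, and `should_swap` is a made-up stand-in for
the proposed tree_swap_operands_p rule.]

```python
def nelts_per_pattern(v):
    # 1 = duplicate, 3 = stepped series; 0 stands in for anything else
    if len(set(v)) == 1:
        return 1
    steps = {b - a for a, b in zip(v, v[1:])}
    return 3 if len(steps) == 1 else 0

def should_swap(arg0, arg1):
    # swap when the first operand has the simpler (smaller) encoding, so
    # the VECTOR_CST_DUPLICATE_P operand ends up second (a modeling
    # shortcut: "other" encodings, mapped to 0, are never swapped)
    return 0 < nelts_per_pattern(arg0) < nelts_per_pattern(arg1)

assert should_swap([0, 0, 0, 0], [7, 6, 5, 4])       # duplicate moves second
assert not should_swap([7, 6, 5, 4], [0, 0, 0, 0])   # already canonical
```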

> >> >
> >> > But if this must happen in const_binop instead, then we could have
> >> > a function like:
> >> 
> >> The reasoning of why it should be in const_binop rather than in match
> >> is because both operands are constants.
> >
> > +1
> 
> I can see that argument for the traditional case where all constants
> can be folded at compile time.  But that isn't the case for VLA constants.
> (op CST1 CST2) is sometimes not knowable at compile time.  And I think
> match.pd should apply to those cases just as much as to (op VAR CST).
> 
> VLA vector "constants" are really in an intermediate category between
> variable and constant.
> 
> The approach that the patch takes is to add a "rule of thumb" that
> applies to all (op X CST), regardless of whether X is constant.
> It doesn't work by doing constant arithmetic per se.  If we add
> those rules of thumb here, I think it'll keep growing and growing.

But doesn't this mean that folding (match.pd) can no longer rely on
never seeing un-constant-folded operations, and thus on proper
canonicalization?  Which would mean we'd possibly have to alter _all_
(op X CST) cases to use :c?

> Thanks,
> Richard
>
  
Robin Dapp Jan. 15, 2024, 3:23 p.m. UTC | #17
I gave it another shot now by introducing a separate function as
Richard suggested.  It's probably not at the location he intended.

The way I read the discussion, there hasn't been any consensus
on how (or rather where) to properly tackle the problem.  Any
other ideas still?

Regards
 Robin


Found in PR112971, this patch adds folding support for bitwise operations
of const duplicate zero/one vectors with stepped vectors.
On riscv we have the situation that folding would perpetually continue
without simplifying because e.g. {0, 0, 0, ...} & {7, 6, 5, ...} would
not be folded to {0, 0, 0, ...}.

gcc/ChangeLog:

	PR middle-end/112971

	* fold-const.cc (simplify_const_binop): New function for binop
	simplification of two constant vectors when element-wise
	handling is not necessary.
	(const_binop): Call new function.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/pr112971.c: New test.
---
 gcc/fold-const.cc                             | 31 +++++++++++++++++++
 .../gcc.target/riscv/rvv/autovec/pr112971.c   | 18 +++++++++++
 2 files changed, 49 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112971.c

diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index 385e4a69ab3..2ef425aec0f 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -1343,6 +1343,29 @@ distributes_over_addition_p (tree_code op, int opno)
     }
 }
 
+/* OP is the INDEXth operand to CODE (counting from zero) and OTHER_OP
+   is the other operand.  Try to use the value of OP to simplify the
+   operation in one step, without having to process individual elements.  */
+static tree
+simplify_const_binop (tree_code code, tree op, tree other_op,
+		      int index ATTRIBUTE_UNUSED)
+{
+  /* AND, IOR as well as XOR with a zerop can be simplified directly.  */
+  if (TREE_CODE (op) == VECTOR_CST && TREE_CODE (other_op) == VECTOR_CST)
+    {
+      if (integer_zerop (other_op))
+	{
+	  if (code == BIT_IOR_EXPR || code == BIT_XOR_EXPR)
+	    return op;
+	  else if (code == BIT_AND_EXPR)
+	    return other_op;
+	}
+    }
+
+  return NULL_TREE;
+}
+
+
 /* Combine two constants ARG1 and ARG2 under operation CODE to produce a new
    constant.  We assume ARG1 and ARG2 have the same data type, or at least
    are the same kind of constant and the same machine mode.  Return zero if
@@ -1646,6 +1669,14 @@ const_binop (enum tree_code code, tree arg1, tree arg2)
 	return build_complex (type, real, imag);
     }
 
+  tree simplified;
+  if ((simplified = simplify_const_binop (code, arg1, arg2, 0)))
+    return simplified;
+
+  if (commutative_tree_code (code)
+      && (simplified = simplify_const_binop (code, arg2, arg1, 1)))
+    return simplified;
+
   if (TREE_CODE (arg1) == VECTOR_CST
       && TREE_CODE (arg2) == VECTOR_CST
       && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1)),
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112971.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112971.c
new file mode 100644
index 00000000000..816ebd3c493
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112971.c
@@ -0,0 +1,18 @@
+/* { dg-do compile }  */
+/* { dg-options "-march=rv64gcv_zvl256b -mabi=lp64d -O3 -fno-vect-cost-model" }  */
+
+int a;
+short b[9];
+char c, d;
+void e() {
+  d = 0;
+  for (;; d++) {
+    if (b[d])
+      break;
+    a = 8;
+    for (; a >= 0; a--) {
+      char *f = &c;
+      *f &= d == (a & d);
+    }
+  }
+}
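
[Editorial note: the whole-vector rules the new function applies can be
modeled in a few lines.  This is a hedged sketch; the real code operates on
GCC trees, not Python lists, and the string codes here stand in for the
tree codes.]

```python
def simplify_const_binop(code, op, other_op):
    # if OTHER_OP is a zero vector, AND/IOR/XOR simplify in one step
    if all(e == 0 for e in other_op):
        if code in ("ior", "xor"):
            return op          # x | 0 == x,  x ^ 0 == x
        if code == "and":
            return other_op    # x & 0 == 0
    return None                # fall back to elementwise handling

stepped, zero = [7, 6, 5, 4], [0, 0, 0, 0]
assert simplify_const_binop("and", stepped, zero) == zero
assert simplify_const_binop("ior", stepped, zero) == stepped
assert simplify_const_binop("xor", stepped, zero) == stepped
assert simplify_const_binop("plus", stepped, zero) is None
```

As in the diff above, const_binop would try this once and, for commutative
codes, again with the operands swapped, mirroring Andrew's earlier suggestion.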
  
Richard Biener Jan. 16, 2024, 7:17 a.m. UTC | #18
On Mon, 15 Jan 2024, Robin Dapp wrote:

> I gave it another shot now by introducing a separate function as
> Richard suggested.  It's probably not at the location he intended.
> 
> The way I read the discussion there hasn't been any consensus
> on how (or rather where) to properly tackle the problem.  Any
> other ideas still?

I'm happy enough with the patch, esp. at this stage.  OK if
Richard S. doesn't disagree.

Thanks,
Richard.

> Regards
>  Robin
> 
> 
> Found in PR112971 this patch adds folding support for bitwise operations
> of const duplicate zero/one vectors with stepped vectors.
> On riscv we have the situation that a folding would perpetually continue
> without simplifying because e.g. {0, 0, 0, ...} & {7, 6, 5, ...} would
> not be folded to {0, 0, 0, ...}.
> 
> gcc/ChangeLog:
> 
> 	PR middle-end/112971
> 
> 	* fold-const.cc (simplify_const_binop): New function for binop
> 	simplification of two constant vectors when element-wise
> 	handling is not necessary.
> 	(const_binop): Call new function.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.target/riscv/rvv/autovec/pr112971.c: New test.
> ---
>  gcc/fold-const.cc                             | 31 +++++++++++++++++++
>  .../gcc.target/riscv/rvv/autovec/pr112971.c   | 18 +++++++++++
>  2 files changed, 49 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112971.c
> 
> diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> index 385e4a69ab3..2ef425aec0f 100644
> --- a/gcc/fold-const.cc
> +++ b/gcc/fold-const.cc
> @@ -1343,6 +1343,29 @@ distributes_over_addition_p (tree_code op, int opno)
>      }
>  }
>  
> +/* OP is the INDEXth operand to CODE (counting from zero) and OTHER_OP
> +   is the other operand.  Try to use the value of OP to simplify the
> +   operation in one step, without having to process individual elements.  */
> +static tree
> +simplify_const_binop (tree_code code, tree op, tree other_op,
> +		      int index ATTRIBUTE_UNUSED)
> +{
> +  /* AND, IOR as well as XOR with a zerop can be simplified directly.  */
> +  if (TREE_CODE (op) == VECTOR_CST && TREE_CODE (other_op) == VECTOR_CST)
> +    {
> +      if (integer_zerop (other_op))
> +	{
> +	  if (code == BIT_IOR_EXPR || code == BIT_XOR_EXPR)
> +	    return op;
> +	  else if (code == BIT_AND_EXPR)
> +	    return other_op;
> +	}
> +    }
> +
> +  return NULL_TREE;
> +}
> +
> +
>  /* Combine two constants ARG1 and ARG2 under operation CODE to produce a new
>     constant.  We assume ARG1 and ARG2 have the same data type, or at least
>     are the same kind of constant and the same machine mode.  Return zero if
> @@ -1646,6 +1669,14 @@ const_binop (enum tree_code code, tree arg1, tree arg2)
>  	return build_complex (type, real, imag);
>      }
>  
> +  tree simplified;
> +  if ((simplified = simplify_const_binop (code, arg1, arg2, 0)))
> +    return simplified;
> +
> +  if (commutative_tree_code (code)
> +      && (simplified = simplify_const_binop (code, arg2, arg1, 1)))
> +    return simplified;
> +
>    if (TREE_CODE (arg1) == VECTOR_CST
>        && TREE_CODE (arg2) == VECTOR_CST
>        && known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1)),
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112971.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112971.c
> new file mode 100644
> index 00000000000..816ebd3c493
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112971.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile }  */
> +/* { dg-options "-march=rv64gcv_zvl256b -mabi=lp64d -O3 -fno-vect-cost-model" }  */
> +
> +int a;
> +short b[9];
> +char c, d;
> +void e() {
> +  d = 0;
> +  for (;; d++) {
> +    if (b[d])
> +      break;
> +    a = 8;
> +    for (; a >= 0; a--) {
> +      char *f = &c;
> +      *f &= d == (a & d);
> +    }
> +  }
> +}
>
  
Richard Sandiford Jan. 24, 2024, 11:29 a.m. UTC | #19
Richard Biener <rguenther@suse.de> writes:
> On Mon, 15 Jan 2024, Robin Dapp wrote:
>
>> I gave it another shot now by introducing a separate function as
>> Richard suggested.  It's probably not at the location he intended.
>> 
>> The way I read the discussion there hasn't been any consensus
>> on how (or rather where) to properly tackle the problem.  Any
>> other ideas still?
>
> I'm happy enough with the patch, esp. at this stage.  OK if
> Richard S. doesn't disagree.

Yeah, no objection from me.

Richard
  

Patch

diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index f5d68ac323a..43ed097bf5c 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -1653,8 +1653,20 @@  const_binop (enum tree_code code, tree arg1, tree arg2)
     {
       tree type = TREE_TYPE (arg1);
       bool step_ok_p;
+
+      /* AND, IOR as well as XOR with a zerop can be handled directly.  */
       if (VECTOR_CST_STEPPED_P (arg1)
-	  && VECTOR_CST_STEPPED_P (arg2))
+	  && VECTOR_CST_DUPLICATE_P (arg2)
+	  && integer_zerop (VECTOR_CST_ELT (arg2, 0)))
+	step_ok_p = code == BIT_AND_EXPR || code == BIT_IOR_EXPR
+	  || code == BIT_XOR_EXPR;
+      else if (VECTOR_CST_STEPPED_P (arg2)
+	       && VECTOR_CST_DUPLICATE_P (arg1)
+	       && integer_zerop (VECTOR_CST_ELT (arg1, 0)))
+	step_ok_p = code == BIT_AND_EXPR || code == BIT_IOR_EXPR
+	  || code == BIT_XOR_EXPR;
+      else if (VECTOR_CST_STEPPED_P (arg1)
+	       && VECTOR_CST_STEPPED_P (arg2))
 	/* We can operate directly on the encoding if:
 
 	      a3 - a2 == a2 - a1 && b3 - b2 == b2 - b1
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112971.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112971.c
new file mode 100644
index 00000000000..816ebd3c493
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112971.c
@@ -0,0 +1,18 @@ 
+/* { dg-do compile }  */
+/* { dg-options "-march=rv64gcv_zvl256b -mabi=lp64d -O3 -fno-vect-cost-model" }  */
+
+int a;
+short b[9];
+char c, d;
+void e() {
+  d = 0;
+  for (;; d++) {
+    if (b[d])
+      break;
+    a = 8;
+    for (; a >= 0; a--) {
+      char *f = &c;
+      *f &= d == (a & d);
+    }
+  }
+}