vect: Fold LEN_{LOAD,STORE} if it's for the whole vector [PR107412]

Message ID 94ac390b-a770-c868-051b-75319eb7f81d@linux.ibm.com

Commit Message

Kewen.Lin Nov. 2, 2022, 7:59 a.m. UTC
  Hi,

As the test case in PR107412 shows, we can fold IFN .LEN_{LOAD,
STORE} into a normal vector load/store if the given length is known
to be equal to the length of the whole vector.  This helps to improve
overall cycle counts, as the latency of a vector access with a length
in bytes is normally higher than that of a normal vector access, and
it also saves the preparation of the length when a constant length
cannot be encoded into the instruction (such as on power).
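
As a rough illustration (not taken from the patch itself; the vector
type, SSA names and exact dump syntax below are assumed for the
example), the fold turns GIMPLE like

  vect__1 = .LEN_LOAD (&src, 32B, 16, 0);
  .LEN_STORE (&dest, 32B, 16, vect__1, 0);

into plain vector accesses once length minus bias equals the 16-byte
vector size:

  vect__1 = MEM <vector(4) int> [(int *)&src];
  MEM <vector(4) int> [(int *)&dest] = vect__1;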

Bootstrapped and regtested on x86_64-redhat-linux,
aarch64-linux-gnu and powerpc64{,le}-linux-gnu.

Is it ok for trunk?

BR,
Kewen
-----
	PR tree-optimization/107412

gcc/ChangeLog:

	* gimple-fold.cc (gimple_fold_mask_load_store_mem_ref): Rename to ...
	(gimple_fold_partial_load_store_mem_ref): ... this, add one parameter
	mask_p indicating whether it's for mask or length, and add handling
	for IFN LEN_{LOAD,STORE}.
	(gimple_fold_mask_load): Rename to ...
	(gimple_fold_partial_load): ... this, add one parameter mask_p.
	(gimple_fold_mask_store): Rename to ...
	(gimple_fold_partial_store): ... this, add one parameter mask_p.
	(gimple_fold_call): Add handling for IFN LEN_{LOAD,STORE}, and
	adjust the calls of gimple_fold_mask_load_store_mem_ref to
	gimple_fold_partial_load_store_mem_ref.

gcc/testsuite/ChangeLog:

	* gcc.target/powerpc/pr107412.c: New test.
	* gcc.target/powerpc/p9-vec-length-epil-8.c: Adjust scan times for
	folded LEN_LOAD.
---
 gcc/gimple-fold.cc                            | 57 ++++++++++++++-----
 .../gcc.target/powerpc/p9-vec-length-epil-8.c |  2 +-
 gcc/testsuite/gcc.target/powerpc/pr107412.c   | 19 +++++++
 3 files changed, 64 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr107412.c

--
2.27.0
  

Comments

Richard Biener Nov. 5, 2022, 11:40 a.m. UTC | #1
On Wed, Nov 2, 2022 at 8:59 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
>
> Hi,
>
> As the test case in PR107412 shows, we can fold IFN .LEN_{LOAD,
> STORE} into normal vector load/store if the given length is known
> to be equal to the length of the whole vector.  It would help to
> improve overall cycles as normally the latency of vector access
> with length in bytes is bigger than normal vector access, and it
> also saves the preparation for length if constant length can not
> be encoded into instruction (such as on power).
>
> Bootstrapped and regtested on x86_64-redhat-linux,
> aarch64-linux-gnu and powerpc64{,le}-linux-gnu.
>
> Is it ok for trunk?

OK.

>
> BR,
> Kewen
> -----
>         PR tree-optimization/107412
>
> gcc/ChangeLog:
>
>         * gimple-fold.cc (gimple_fold_mask_load_store_mem_ref): Rename to ...
>         (gimple_fold_partial_load_store_mem_ref): ... this, add one parameter
>         mask_p indicating it's for mask or length, and add some handlings for
>         IFN LEN_{LOAD,STORE}.
>         (gimple_fold_mask_load): Rename to ...
>         (gimple_fold_partial_load): ... this, add one parameter mask_p.
>         (gimple_fold_mask_store): Rename to ...
>         (gimple_fold_partial_store): ... this, add one parameter mask_p.
>         (gimple_fold_call): Add the handlings for IFN LEN_{LOAD,STORE},
>         and adjust calls on gimple_fold_mask_load_store_mem_ref to
>         gimple_fold_partial_load_store_mem_ref.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/powerpc/pr107412.c: New test.
>         * gcc.target/powerpc/p9-vec-length-epil-8.c: Adjust scan times for
>         folded LEN_LOAD.
> ---
>  gcc/gimple-fold.cc                            | 57 ++++++++++++++-----
>  .../gcc.target/powerpc/p9-vec-length-epil-8.c |  2 +-
>  gcc/testsuite/gcc.target/powerpc/pr107412.c   | 19 +++++++
>  3 files changed, 64 insertions(+), 14 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr107412.c
>
> diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
> index a1704784bc9..e3a087defa6 100644
> --- a/gcc/gimple-fold.cc
> +++ b/gcc/gimple-fold.cc
> @@ -5370,19 +5370,39 @@ arith_overflowed_p (enum tree_code code, const_tree type,
>    return wi::min_precision (wres, sign) > TYPE_PRECISION (type);
>  }
>
> -/* If IFN_MASK_LOAD/STORE call CALL is unconditional, return a MEM_REF
> +/* If IFN_{MASK,LEN}_LOAD/STORE call CALL is unconditional, return a MEM_REF
>     for the memory it references, otherwise return null.  VECTYPE is the
> -   type of the memory vector.  */
> +   type of the memory vector.  MASK_P indicates it's for MASK if true,
> +   otherwise it's for LEN.  */
>
>  static tree
> -gimple_fold_mask_load_store_mem_ref (gcall *call, tree vectype)
> +gimple_fold_partial_load_store_mem_ref (gcall *call, tree vectype, bool mask_p)
>  {
>    tree ptr = gimple_call_arg (call, 0);
>    tree alias_align = gimple_call_arg (call, 1);
> -  tree mask = gimple_call_arg (call, 2);
> -  if (!tree_fits_uhwi_p (alias_align) || !integer_all_onesp (mask))
> +  if (!tree_fits_uhwi_p (alias_align))
>      return NULL_TREE;
>
> +  if (mask_p)
> +    {
> +      tree mask = gimple_call_arg (call, 2);
> +      if (!integer_all_onesp (mask))
> +       return NULL_TREE;
> +    } else {
> +      tree basic_len = gimple_call_arg (call, 2);
> +      if (!tree_fits_uhwi_p (basic_len))
> +       return NULL_TREE;
> +      unsigned int nargs = gimple_call_num_args (call);
> +      tree bias = gimple_call_arg (call, nargs - 1);
> +      gcc_assert (tree_fits_uhwi_p (bias));
> +      tree biased_len = int_const_binop (MINUS_EXPR, basic_len, bias);
> +      unsigned int len = tree_to_uhwi (biased_len);
> +      unsigned int vect_len
> +       = GET_MODE_SIZE (TYPE_MODE (vectype)).to_constant ();
> +      if (vect_len != len)
> +       return NULL_TREE;
> +    }
> +
>    unsigned HOST_WIDE_INT align = tree_to_uhwi (alias_align);
>    if (TYPE_ALIGN (vectype) != align)
>      vectype = build_aligned_type (vectype, align);
> @@ -5390,16 +5410,18 @@ gimple_fold_mask_load_store_mem_ref (gcall *call, tree vectype)
>    return fold_build2 (MEM_REF, vectype, ptr, offset);
>  }
>
> -/* Try to fold IFN_MASK_LOAD call CALL.  Return true on success.  */
> +/* Try to fold IFN_{MASK,LEN}_LOAD call CALL.  Return true on success.
> +   MASK_P indicates it's for MASK if true, otherwise it's for LEN.  */
>
>  static bool
> -gimple_fold_mask_load (gimple_stmt_iterator *gsi, gcall *call)
> +gimple_fold_partial_load (gimple_stmt_iterator *gsi, gcall *call, bool mask_p)
>  {
>    tree lhs = gimple_call_lhs (call);
>    if (!lhs)
>      return false;
>
> -  if (tree rhs = gimple_fold_mask_load_store_mem_ref (call, TREE_TYPE (lhs)))
> +  if (tree rhs
> +      = gimple_fold_partial_load_store_mem_ref (call, TREE_TYPE (lhs), mask_p))
>      {
>        gassign *new_stmt = gimple_build_assign (lhs, rhs);
>        gimple_set_location (new_stmt, gimple_location (call));
> @@ -5410,13 +5432,16 @@ gimple_fold_mask_load (gimple_stmt_iterator *gsi, gcall *call)
>    return false;
>  }
>
> -/* Try to fold IFN_MASK_STORE call CALL.  Return true on success.  */
> +/* Try to fold IFN_{MASK,LEN}_STORE call CALL.  Return true on success.
> +   MASK_P indicates it's for MASK if true, otherwise it's for LEN.  */
>
>  static bool
> -gimple_fold_mask_store (gimple_stmt_iterator *gsi, gcall *call)
> +gimple_fold_partial_store (gimple_stmt_iterator *gsi, gcall *call,
> +                          bool mask_p)
>  {
>    tree rhs = gimple_call_arg (call, 3);
> -  if (tree lhs = gimple_fold_mask_load_store_mem_ref (call, TREE_TYPE (rhs)))
> +  if (tree lhs
> +      = gimple_fold_partial_load_store_mem_ref (call, TREE_TYPE (rhs), mask_p))
>      {
>        gassign *new_stmt = gimple_build_assign (lhs, rhs);
>        gimple_set_location (new_stmt, gimple_location (call));
> @@ -5634,10 +5659,16 @@ gimple_fold_call (gimple_stmt_iterator *gsi, bool inplace)
>           cplx_result = true;
>           break;
>         case IFN_MASK_LOAD:
> -         changed |= gimple_fold_mask_load (gsi, stmt);
> +         changed |= gimple_fold_partial_load (gsi, stmt, true);
>           break;
>         case IFN_MASK_STORE:
> -         changed |= gimple_fold_mask_store (gsi, stmt);
> +         changed |= gimple_fold_partial_store (gsi, stmt, true);
> +         break;
> +       case IFN_LEN_LOAD:
> +         changed |= gimple_fold_partial_load (gsi, stmt, false);
> +         break;
> +       case IFN_LEN_STORE:
> +         changed |= gimple_fold_partial_store (gsi, stmt, false);
>           break;
>         default:
>           break;
> diff --git a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-8.c b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-8.c
> index 961df0d5646..8b9c9107814 100644
> --- a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-8.c
> +++ b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-8.c
> @@ -8,5 +8,5 @@
>
>  #include "p9-vec-length-8.h"
>
> -/* { dg-final { scan-assembler-times {\mlxvl\M} 21 } } */
> +/* { dg-final { scan-assembler-times {\mlxvl\M} 16 } } */
>  /* { dg-final { scan-assembler-times {\mstxvl\M} 7 } } */
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr107412.c b/gcc/testsuite/gcc.target/powerpc/pr107412.c
> new file mode 100644
> index 00000000000..4526ea8639d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr107412.c
> @@ -0,0 +1,19 @@
> +/* { dg-require-effective-target powerpc_p9vector_ok } */
> +/* { dg-require-effective-target lp64 } */
> +/* { dg-options "-mdejagnu-cpu=power9 -O2 -ftree-vectorize -fno-vect-cost-model -funroll-loops -fno-tree-loop-distribute-patterns --param vect-partial-vector-usage=2 -fdump-tree-optimized" } */
> +
> +/* Verify there is only one IFN call LEN_LOAD and IFN_STORE separately.  */
> +
> +#define N 16
> +int src[N];
> +int dest[N];
> +
> +void
> +foo ()
> +{
> +  for (int i = 0; i < (N - 1); i++)
> +    dest[i] = src[i];
> +}
> +
> +/* { dg-final { scan-tree-dump-times {\mLEN_LOAD\M} 1 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times {\mLEN_STORE\M} 1 "optimized" } } */
> --
> 2.27.0
  
Richard Sandiford Nov. 24, 2022, 9:24 a.m. UTC | #2
"Kewen.Lin" <linkw@linux.ibm.com> writes:
> Hi,
>
> As the test case in PR107412 shows, we can fold IFN .LEN_{LOAD,
> STORE} into normal vector load/store if the given length is known
> to be equal to the length of the whole vector.  It would help to
> improve overall cycles as normally the latency of vector access
> with length in bytes is bigger than normal vector access, and it
> also saves the preparation for length if constant length can not
> be encoded into instruction (such as on power).
>
> Bootstrapped and regtested on x86_64-redhat-linux,
> aarch64-linux-gnu and powerpc64{,le}-linux-gnu.
>
> Is it ok for trunk?
>
> BR,
> Kewen
> -----
> 	PR tree-optimization/107412
>
> gcc/ChangeLog:
>
> 	* gimple-fold.cc (gimple_fold_mask_load_store_mem_ref): Rename to ...
> 	(gimple_fold_partial_load_store_mem_ref): ... this, add one parameter
> 	mask_p indicating it's for mask or length, and add some handlings for
> 	IFN LEN_{LOAD,STORE}.
> 	(gimple_fold_mask_load): Rename to ...
> 	(gimple_fold_partial_load): ... this, add one parameter mask_p.
> 	(gimple_fold_mask_store): Rename to ...
> 	(gimple_fold_partial_store): ... this, add one parameter mask_p.
> 	(gimple_fold_call): Add the handlings for IFN LEN_{LOAD,STORE},
> 	and adjust calls on gimple_fold_mask_load_store_mem_ref to
> 	gimple_fold_partial_load_store_mem_ref.

Sorry to reply so late (still catching up on email), but:

>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.target/powerpc/pr107412.c: New test.
> 	* gcc.target/powerpc/p9-vec-length-epil-8.c: Adjust scan times for
> 	folded LEN_LOAD.
> ---
>  gcc/gimple-fold.cc                            | 57 ++++++++++++++-----
>  .../gcc.target/powerpc/p9-vec-length-epil-8.c |  2 +-
>  gcc/testsuite/gcc.target/powerpc/pr107412.c   | 19 +++++++
>  3 files changed, 64 insertions(+), 14 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr107412.c
>
> diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
> index a1704784bc9..e3a087defa6 100644
> --- a/gcc/gimple-fold.cc
> +++ b/gcc/gimple-fold.cc
> @@ -5370,19 +5370,39 @@ arith_overflowed_p (enum tree_code code, const_tree type,
>    return wi::min_precision (wres, sign) > TYPE_PRECISION (type);
>  }
>
> -/* If IFN_MASK_LOAD/STORE call CALL is unconditional, return a MEM_REF
> +/* If IFN_{MASK,LEN}_LOAD/STORE call CALL is unconditional, return a MEM_REF
>     for the memory it references, otherwise return null.  VECTYPE is the
> -   type of the memory vector.  */
> +   type of the memory vector.  MASK_P indicates it's for MASK if true,
> +   otherwise it's for LEN.  */
>
>  static tree
> -gimple_fold_mask_load_store_mem_ref (gcall *call, tree vectype)
> +gimple_fold_partial_load_store_mem_ref (gcall *call, tree vectype, bool mask_p)
>  {
>    tree ptr = gimple_call_arg (call, 0);
>    tree alias_align = gimple_call_arg (call, 1);
> -  tree mask = gimple_call_arg (call, 2);
> -  if (!tree_fits_uhwi_p (alias_align) || !integer_all_onesp (mask))
> +  if (!tree_fits_uhwi_p (alias_align))
>      return NULL_TREE;
>
> +  if (mask_p)
> +    {
> +      tree mask = gimple_call_arg (call, 2);
> +      if (!integer_all_onesp (mask))
> +	return NULL_TREE;
> +    } else {

Minor nit: }, else, and { should be on separate lines.  But the thing
I actually wanted to say was...

> +      tree basic_len = gimple_call_arg (call, 2);
> +      if (!tree_fits_uhwi_p (basic_len))
> +	return NULL_TREE;
> +      unsigned int nargs = gimple_call_num_args (call);
> +      tree bias = gimple_call_arg (call, nargs - 1);
> +      gcc_assert (tree_fits_uhwi_p (bias));
> +      tree biased_len = int_const_binop (MINUS_EXPR, basic_len, bias);
> +      unsigned int len = tree_to_uhwi (biased_len);
> +      unsigned int vect_len
> +	= GET_MODE_SIZE (TYPE_MODE (vectype)).to_constant ();
> +      if (vect_len != len)
> +	return NULL_TREE;

Using "unsigned int" truncates the value.  I realise that's probably
safe in this context, since large values have undefined behaviour.
But it still seems better to use an untruncated type, so that it
looks less like an oversight.  (I think this is one case where "auto"
can help, since it gets the type right automatically.)
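
As a concrete (made-up) illustration of the truncation, not part of
the patch:

      /* A length needing more than 32 bits silently wraps when
	 stored into "unsigned int".  */
      unsigned HOST_WIDE_INT len = (unsigned HOST_WIDE_INT) 1 << 40;
      unsigned int truncated = len;  /* wraps to 0 */
      auto kept = len;               /* stays unsigned HOST_WIDE_INT */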

It would also be better to avoid the to_constant, since we haven't
proven is_constant.  How about:

      tree basic_len = gimple_call_arg (call, 2);
      if (!poly_int_tree_p (basic_len))
	return NULL_TREE;
      unsigned int nargs = gimple_call_num_args (call);
      tree bias = gimple_call_arg (call, nargs - 1);
      gcc_assert (TREE_CODE (bias) == INTEGER_CST);
      if (maybe_ne (wi::to_poly_widest (basic_len) - wi::to_widest (bias),
		    GET_MODE_SIZE (TYPE_MODE (vectype))))
	return NULL_TREE;

which also avoids using tree arithmetic for the subtraction?

Thanks,
Richard


> +    }
> +
>    unsigned HOST_WIDE_INT align = tree_to_uhwi (alias_align);
>    if (TYPE_ALIGN (vectype) != align)
>      vectype = build_aligned_type (vectype, align);
> @@ -5390,16 +5410,18 @@ gimple_fold_mask_load_store_mem_ref (gcall *call, tree vectype)
>    return fold_build2 (MEM_REF, vectype, ptr, offset);
>  }
>
> -/* Try to fold IFN_MASK_LOAD call CALL.  Return true on success.  */
> +/* Try to fold IFN_{MASK,LEN}_LOAD call CALL.  Return true on success.
> +   MASK_P indicates it's for MASK if true, otherwise it's for LEN.  */
>
>  static bool
> -gimple_fold_mask_load (gimple_stmt_iterator *gsi, gcall *call)
> +gimple_fold_partial_load (gimple_stmt_iterator *gsi, gcall *call, bool mask_p)
>  {
>    tree lhs = gimple_call_lhs (call);
>    if (!lhs)
>      return false;
>
> -  if (tree rhs = gimple_fold_mask_load_store_mem_ref (call, TREE_TYPE (lhs)))
> +  if (tree rhs
> +      = gimple_fold_partial_load_store_mem_ref (call, TREE_TYPE (lhs), mask_p))
>      {
>        gassign *new_stmt = gimple_build_assign (lhs, rhs);
>        gimple_set_location (new_stmt, gimple_location (call));
> @@ -5410,13 +5432,16 @@ gimple_fold_mask_load (gimple_stmt_iterator *gsi, gcall *call)
>    return false;
>  }
>
> -/* Try to fold IFN_MASK_STORE call CALL.  Return true on success.  */
> +/* Try to fold IFN_{MASK,LEN}_STORE call CALL.  Return true on success.
> +   MASK_P indicates it's for MASK if true, otherwise it's for LEN.  */
>
>  static bool
> -gimple_fold_mask_store (gimple_stmt_iterator *gsi, gcall *call)
> +gimple_fold_partial_store (gimple_stmt_iterator *gsi, gcall *call,
> +			   bool mask_p)
>  {
>    tree rhs = gimple_call_arg (call, 3);
> -  if (tree lhs = gimple_fold_mask_load_store_mem_ref (call, TREE_TYPE (rhs)))
> +  if (tree lhs
> +      = gimple_fold_partial_load_store_mem_ref (call, TREE_TYPE (rhs), mask_p))
>      {
>        gassign *new_stmt = gimple_build_assign (lhs, rhs);
>        gimple_set_location (new_stmt, gimple_location (call));
> @@ -5634,10 +5659,16 @@ gimple_fold_call (gimple_stmt_iterator *gsi, bool inplace)
>  	  cplx_result = true;
>  	  break;
>  	case IFN_MASK_LOAD:
> -	  changed |= gimple_fold_mask_load (gsi, stmt);
> +	  changed |= gimple_fold_partial_load (gsi, stmt, true);
>  	  break;
>  	case IFN_MASK_STORE:
> -	  changed |= gimple_fold_mask_store (gsi, stmt);
> +	  changed |= gimple_fold_partial_store (gsi, stmt, true);
> +	  break;
> +	case IFN_LEN_LOAD:
> +	  changed |= gimple_fold_partial_load (gsi, stmt, false);
> +	  break;
> +	case IFN_LEN_STORE:
> +	  changed |= gimple_fold_partial_store (gsi, stmt, false);
>  	  break;
>  	default:
>  	  break;
> diff --git a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-8.c b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-8.c
> index 961df0d5646..8b9c9107814 100644
> --- a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-8.c
> +++ b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-8.c
> @@ -8,5 +8,5 @@
>
>  #include "p9-vec-length-8.h"
>
> -/* { dg-final { scan-assembler-times {\mlxvl\M} 21 } } */
> +/* { dg-final { scan-assembler-times {\mlxvl\M} 16 } } */
>  /* { dg-final { scan-assembler-times {\mstxvl\M} 7 } } */
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr107412.c b/gcc/testsuite/gcc.target/powerpc/pr107412.c
> new file mode 100644
> index 00000000000..4526ea8639d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr107412.c
> @@ -0,0 +1,19 @@
> +/* { dg-require-effective-target powerpc_p9vector_ok } */
> +/* { dg-require-effective-target lp64 } */
> +/* { dg-options "-mdejagnu-cpu=power9 -O2 -ftree-vectorize -fno-vect-cost-model -funroll-loops -fno-tree-loop-distribute-patterns --param vect-partial-vector-usage=2 -fdump-tree-optimized" } */
> +
> +/* Verify there is only one IFN call LEN_LOAD and IFN_STORE separately.  */
> +
> +#define N 16
> +int src[N];
> +int dest[N];
> +
> +void
> +foo ()
> +{
> +  for (int i = 0; i < (N - 1); i++)
> +    dest[i] = src[i];
> +}
> +
> +/* { dg-final { scan-tree-dump-times {\mLEN_LOAD\M} 1 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times {\mLEN_STORE\M} 1 "optimized" } } */
> --
> 2.27.0
  
Kewen.Lin Nov. 28, 2022, 2:57 a.m. UTC | #3
Hi Richard,

on 2022/11/24 17:24, Richard Sandiford wrote:
> "Kewen.Lin" <linkw@linux.ibm.com> writes:
>> Hi,
>>
>> As the test case in PR107412 shows, we can fold IFN .LEN_{LOAD,
>> STORE} into normal vector load/store if the given length is known
>> to be equal to the length of the whole vector.  It would help to
>> improve overall cycles as normally the latency of vector access
>> with length in bytes is bigger than normal vector access, and it
>> also saves the preparation for length if constant length can not
>> be encoded into instruction (such as on power).
>>
>> Bootstrapped and regtested on x86_64-redhat-linux,
>> aarch64-linux-gnu and powerpc64{,le}-linux-gnu.
>>
>> Is it ok for trunk?
>>
>> BR,
>> Kewen
>> -----
>> 	PR tree-optimization/107412
>>
>> gcc/ChangeLog:
>>
>> 	* gimple-fold.cc (gimple_fold_mask_load_store_mem_ref): Rename to ...
>> 	(gimple_fold_partial_load_store_mem_ref): ... this, add one parameter
>> 	mask_p indicating it's for mask or length, and add some handlings for
>> 	IFN LEN_{LOAD,STORE}.
>> 	(gimple_fold_mask_load): Rename to ...
>> 	(gimple_fold_partial_load): ... this, add one parameter mask_p.
>> 	(gimple_fold_mask_store): Rename to ...
>> 	(gimple_fold_partial_store): ... this, add one parameter mask_p.
>> 	(gimple_fold_call): Add the handlings for IFN LEN_{LOAD,STORE},
>> 	and adjust calls on gimple_fold_mask_load_store_mem_ref to
>> 	gimple_fold_partial_load_store_mem_ref.
> 
> Sorry to reply to late (still catching up on email), but:
> 
>>
>> gcc/testsuite/ChangeLog:
>>
>> 	* gcc.target/powerpc/pr107412.c: New test.
>> 	* gcc.target/powerpc/p9-vec-length-epil-8.c: Adjust scan times for
>> 	folded LEN_LOAD.
>> ---
>>  gcc/gimple-fold.cc                            | 57 ++++++++++++++-----
>>  .../gcc.target/powerpc/p9-vec-length-epil-8.c |  2 +-
>>  gcc/testsuite/gcc.target/powerpc/pr107412.c   | 19 +++++++
>>  3 files changed, 64 insertions(+), 14 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr107412.c
>>
>> diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
>> index a1704784bc9..e3a087defa6 100644
>> --- a/gcc/gimple-fold.cc
>> +++ b/gcc/gimple-fold.cc
>> @@ -5370,19 +5370,39 @@ arith_overflowed_p (enum tree_code code, const_tree type,
>>    return wi::min_precision (wres, sign) > TYPE_PRECISION (type);
>>  }
>>
>> -/* If IFN_MASK_LOAD/STORE call CALL is unconditional, return a MEM_REF
>> +/* If IFN_{MASK,LEN}_LOAD/STORE call CALL is unconditional, return a MEM_REF
>>     for the memory it references, otherwise return null.  VECTYPE is the
>> -   type of the memory vector.  */
>> +   type of the memory vector.  MASK_P indicates it's for MASK if true,
>> +   otherwise it's for LEN.  */
>>
>>  static tree
>> -gimple_fold_mask_load_store_mem_ref (gcall *call, tree vectype)
>> +gimple_fold_partial_load_store_mem_ref (gcall *call, tree vectype, bool mask_p)
>>  {
>>    tree ptr = gimple_call_arg (call, 0);
>>    tree alias_align = gimple_call_arg (call, 1);
>> -  tree mask = gimple_call_arg (call, 2);
>> -  if (!tree_fits_uhwi_p (alias_align) || !integer_all_onesp (mask))
>> +  if (!tree_fits_uhwi_p (alias_align))
>>      return NULL_TREE;
>>
>> +  if (mask_p)
>> +    {
>> +      tree mask = gimple_call_arg (call, 2);
>> +      if (!integer_all_onesp (mask))
>> +	return NULL_TREE;
>> +    } else {
> 
> Minor nit: }, else, and { should be on separate lines.  But the thing
> I actually wanted to say was...

Thanks for catching that, I must have forgotten to reformat these lines.

> 
>> +      tree basic_len = gimple_call_arg (call, 2);
>> +      if (!tree_fits_uhwi_p (basic_len))
>> +	return NULL_TREE;
>> +      unsigned int nargs = gimple_call_num_args (call);
>> +      tree bias = gimple_call_arg (call, nargs - 1);
>> +      gcc_assert (tree_fits_uhwi_p (bias));
>> +      tree biased_len = int_const_binop (MINUS_EXPR, basic_len, bias);
>> +      unsigned int len = tree_to_uhwi (biased_len);
>> +      unsigned int vect_len
>> +	= GET_MODE_SIZE (TYPE_MODE (vectype)).to_constant ();
>> +      if (vect_len != len)
>> +	return NULL_TREE;
> 
> Using "unsigned int" truncates the value.  I realise that's probably
> safe in this context, since large values have undefined behaviour.
> But it still seems better to use an untruncated type, so that it
> looks less like an oversight.  (I think this is one case where "auto"
> can help, since it gets the type right automatically.)
> 
> It would also be better to avoid the to_constant, since we haven't
> proven is_constant.  How about:
> 
>       tree basic_len = gimple_call_arg (call, 2);
>       if (!poly_int_tree_p (basic_len))
> 	return NULL_TREE;
>       unsigned int nargs = gimple_call_num_args (call);
>       tree bias = gimple_call_arg (call, nargs - 1);
>       gcc_assert (TREE_CODE (bias) == INTEGER_CST);
>       if (maybe_ne (wi::to_poly_widest (basic_len) - wi::to_widest (bias),
> 		    GET_MODE_SIZE (TYPE_MODE (vectype))))
> 	return NULL_TREE;
> 
> which also avoids using tree arithmetic for the subtraction?

I agree your proposed code has better robustness, thanks!

Sorry, the original patch has already been committed; I've made a
follow-up patch as attached.  It's bootstrapped and regress-tested on
powerpc64-linux-gnu P8, and powerpc64le-linux-gnu P9 and P10.

Is it ok for trunk?

BR,
Kewen
From 3984a7f86a35d13e1fd40bc0c12ed5ad5b234047 Mon Sep 17 00:00:00 2001
From: Kewen Lin <linkw@linux.ibm.com>
Date: Sun, 27 Nov 2022 20:29:57 -0600
Subject: [PATCH] gimple-fold: Refine gimple_fold_partial_load_store_mem_ref

Following Richard's review comments, this patch uses an untruncated
type for the length used for IFN_LEN_{LOAD,STORE} instead of
"unsigned int", for better robustness.  It also avoids using
to_constant and tree arithmetic for the subtraction.

Co-authored-by: Richard Sandiford  <richard.sandiford@arm.com>

gcc/ChangeLog:

	* gimple-fold.cc (gimple_fold_partial_load_store_mem_ref): Use
	untruncated type for the length, and avoid to_constant and tree
	arithmetic for subtraction.
---
 gcc/gimple-fold.cc | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
index c2d9c806aee..88d14c7adcc 100644
--- a/gcc/gimple-fold.cc
+++ b/gcc/gimple-fold.cc
@@ -5387,18 +5387,17 @@ gimple_fold_partial_load_store_mem_ref (gcall *call, tree vectype, bool mask_p)
       tree mask = gimple_call_arg (call, 2);
       if (!integer_all_onesp (mask))
 	return NULL_TREE;
-    } else {
+    }
+  else
+    {
       tree basic_len = gimple_call_arg (call, 2);
-      if (!tree_fits_uhwi_p (basic_len))
+      if (!poly_int_tree_p (basic_len))
 	return NULL_TREE;
       unsigned int nargs = gimple_call_num_args (call);
       tree bias = gimple_call_arg (call, nargs - 1);
-      gcc_assert (tree_fits_shwi_p (bias));
-      tree biased_len = int_const_binop (MINUS_EXPR, basic_len, bias);
-      unsigned int len = tree_to_uhwi (biased_len);
-      unsigned int vect_len
-	= GET_MODE_SIZE (TYPE_MODE (vectype)).to_constant ();
-      if (vect_len != len)
+      gcc_assert (TREE_CODE (bias) == INTEGER_CST);
+      if (maybe_ne (wi::to_poly_widest (basic_len) - wi::to_widest (bias),
+		    GET_MODE_SIZE (TYPE_MODE (vectype))))
 	return NULL_TREE;
     }
  
Richard Sandiford Dec. 1, 2022, 11:28 a.m. UTC | #4
"Kewen.Lin" <linkw@linux.ibm.com> writes:
> Hi Richard,
>
> on 2022/11/24 17:24, Richard Sandiford wrote:
>> "Kewen.Lin" <linkw@linux.ibm.com> writes:
>>> Hi,
>>>
>>> As the test case in PR107412 shows, we can fold IFN .LEN_{LOAD,
>>> STORE} into normal vector load/store if the given length is known
>>> to be equal to the length of the whole vector.  It would help to
>>> improve overall cycles as normally the latency of vector access
>>> with length in bytes is bigger than normal vector access, and it
>>> also saves the preparation for length if constant length can not
>>> be encoded into instruction (such as on power).
>>>
>>> Bootstrapped and regtested on x86_64-redhat-linux,
>>> aarch64-linux-gnu and powerpc64{,le}-linux-gnu.
>>>
>>> Is it ok for trunk?
>>>
>>> BR,
>>> Kewen
>>> -----
>>> 	PR tree-optimization/107412
>>>
>>> gcc/ChangeLog:
>>>
>>> 	* gimple-fold.cc (gimple_fold_mask_load_store_mem_ref): Rename to ...
>>> 	(gimple_fold_partial_load_store_mem_ref): ... this, add one parameter
>>> 	mask_p indicating it's for mask or length, and add some handlings for
>>> 	IFN LEN_{LOAD,STORE}.
>>> 	(gimple_fold_mask_load): Rename to ...
>>> 	(gimple_fold_partial_load): ... this, add one parameter mask_p.
>>> 	(gimple_fold_mask_store): Rename to ...
>>> 	(gimple_fold_partial_store): ... this, add one parameter mask_p.
>>> 	(gimple_fold_call): Add the handlings for IFN LEN_{LOAD,STORE},
>>> 	and adjust calls on gimple_fold_mask_load_store_mem_ref to
>>> 	gimple_fold_partial_load_store_mem_ref.
>> 
>> Sorry to reply to late (still catching up on email), but:
>> 
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> 	* gcc.target/powerpc/pr107412.c: New test.
>>> 	* gcc.target/powerpc/p9-vec-length-epil-8.c: Adjust scan times for
>>> 	folded LEN_LOAD.
>>> ---
>>>  gcc/gimple-fold.cc                            | 57 ++++++++++++++-----
>>>  .../gcc.target/powerpc/p9-vec-length-epil-8.c |  2 +-
>>>  gcc/testsuite/gcc.target/powerpc/pr107412.c   | 19 +++++++
>>>  3 files changed, 64 insertions(+), 14 deletions(-)
>>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr107412.c
>>>
>>> diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
>>> index a1704784bc9..e3a087defa6 100644
>>> --- a/gcc/gimple-fold.cc
>>> +++ b/gcc/gimple-fold.cc
>>> @@ -5370,19 +5370,39 @@ arith_overflowed_p (enum tree_code code, const_tree type,
>>>    return wi::min_precision (wres, sign) > TYPE_PRECISION (type);
>>>  }
>>>
>>> -/* If IFN_MASK_LOAD/STORE call CALL is unconditional, return a MEM_REF
>>> +/* If IFN_{MASK,LEN}_LOAD/STORE call CALL is unconditional, return a MEM_REF
>>>     for the memory it references, otherwise return null.  VECTYPE is the
>>> -   type of the memory vector.  */
>>> +   type of the memory vector.  MASK_P indicates it's for MASK if true,
>>> +   otherwise it's for LEN.  */
>>>
>>>  static tree
>>> -gimple_fold_mask_load_store_mem_ref (gcall *call, tree vectype)
>>> +gimple_fold_partial_load_store_mem_ref (gcall *call, tree vectype, bool mask_p)
>>>  {
>>>    tree ptr = gimple_call_arg (call, 0);
>>>    tree alias_align = gimple_call_arg (call, 1);
>>> -  tree mask = gimple_call_arg (call, 2);
>>> -  if (!tree_fits_uhwi_p (alias_align) || !integer_all_onesp (mask))
>>> +  if (!tree_fits_uhwi_p (alias_align))
>>>      return NULL_TREE;
>>>
>>> +  if (mask_p)
>>> +    {
>>> +      tree mask = gimple_call_arg (call, 2);
>>> +      if (!integer_all_onesp (mask))
>>> +	return NULL_TREE;
>>> +    } else {
>> 
>> Minor nit: }, else, and { should be on separate lines.  But the thing
>> I actually wanted to say was...
>
> Thanks for catching, I must have forgotten to reformat these lines.
>
>> 
>>> +      tree basic_len = gimple_call_arg (call, 2);
>>> +      if (!tree_fits_uhwi_p (basic_len))
>>> +	return NULL_TREE;
>>> +      unsigned int nargs = gimple_call_num_args (call);
>>> +      tree bias = gimple_call_arg (call, nargs - 1);
>>> +      gcc_assert (tree_fits_uhwi_p (bias));
>>> +      tree biased_len = int_const_binop (MINUS_EXPR, basic_len, bias);
>>> +      unsigned int len = tree_to_uhwi (biased_len);
>>> +      unsigned int vect_len
>>> +	= GET_MODE_SIZE (TYPE_MODE (vectype)).to_constant ();
>>> +      if (vect_len != len)
>>> +	return NULL_TREE;
>> 
>> Using "unsigned int" truncates the value.  I realise that's probably
>> safe in this context, since large values have undefined behaviour.
>> But it still seems better to use an untruncated type, so that it
>> looks less like an oversight.  (I think this is one case where "auto"
>> can help, since it gets the type right automatically.)
>> 
>> It would also be better to avoid the to_constant, since we haven't
>> proven is_constant.  How about:
>> 
>>       tree basic_len = gimple_call_arg (call, 2);
>>       if (!poly_int_tree_p (basic_len))
>> 	return NULL_TREE;
>>       unsigned int nargs = gimple_call_num_args (call);
>>       tree bias = gimple_call_arg (call, nargs - 1);
>>       gcc_assert (TREE_CODE (bias) == INTEGER_CST);
>>       if (maybe_ne (wi::to_poly_widest (basic_len) - wi::to_widest (bias),
>> 		    GET_MODE_SIZE (TYPE_MODE (vectype))))
>> 	return NULL_TREE;
>> 
>> which also avoids using tree arithmetic for the subtraction?
>
> I agree your proposed code has better robustness, thanks!
>
> Sorry that the original patch was committed, I made a patch as attached.
> It's bootstrapped and regresss-tested on powerpc64-linux-gnu P8, and
> powerpc64le-linux-gnu P9 and P10.
>
> Is it ok for trunk?

OK, thanks.

Richard

> BR,
> Kewen
>
> From 3984a7f86a35d13e1fd40bc0c12ed5ad5b234047 Mon Sep 17 00:00:00 2001
> From: Kewen Lin <linkw@linux.ibm.com>
> Date: Sun, 27 Nov 2022 20:29:57 -0600
> Subject: [PATCH] gimple-fold: Refine gimple_fold_partial_load_store_mem_ref
>
> Following Richard's review comments, this patch is to use
> untruncated type for the length used for IFN_LEN_{LOAD,STORE}
> instead of "unsigned int" for better robustness.  It also
> avoid to use to_constant and tree arithmetic for subtraction.
>
> Co-authored-by: Richard Sandiford  <richard.sandiford@arm.com>
>
> gcc/ChangeLog:
>
> 	* gimple-fold.cc (gimple_fold_partial_load_store_mem_ref): Use
> 	untruncated type for the length, and avoid to_constant and tree
> 	arithmetic for subtraction.
> ---
>  gcc/gimple-fold.cc | 15 +++++++--------
>  1 file changed, 7 insertions(+), 8 deletions(-)
>
> diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
> index c2d9c806aee..88d14c7adcc 100644
> --- a/gcc/gimple-fold.cc
> +++ b/gcc/gimple-fold.cc
> @@ -5387,18 +5387,17 @@ gimple_fold_partial_load_store_mem_ref (gcall *call, tree vectype, bool mask_p)
>        tree mask = gimple_call_arg (call, 2);
>        if (!integer_all_onesp (mask))
>  	return NULL_TREE;
> -    } else {
> +    }
> +  else
> +    {
>        tree basic_len = gimple_call_arg (call, 2);
> -      if (!tree_fits_uhwi_p (basic_len))
> +      if (!poly_int_tree_p (basic_len))
>  	return NULL_TREE;
>        unsigned int nargs = gimple_call_num_args (call);
>        tree bias = gimple_call_arg (call, nargs - 1);
> -      gcc_assert (tree_fits_shwi_p (bias));
> -      tree biased_len = int_const_binop (MINUS_EXPR, basic_len, bias);
> -      unsigned int len = tree_to_uhwi (biased_len);
> -      unsigned int vect_len
> -	= GET_MODE_SIZE (TYPE_MODE (vectype)).to_constant ();
> -      if (vect_len != len)
> +      gcc_assert (TREE_CODE (bias) == INTEGER_CST);
> +      if (maybe_ne (wi::to_poly_widest (basic_len) - wi::to_widest (bias),
> +		    GET_MODE_SIZE (TYPE_MODE (vectype))))
>  	return NULL_TREE;
>      }
  

Patch

diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
index a1704784bc9..e3a087defa6 100644
--- a/gcc/gimple-fold.cc
+++ b/gcc/gimple-fold.cc
@@ -5370,19 +5370,39 @@  arith_overflowed_p (enum tree_code code, const_tree type,
   return wi::min_precision (wres, sign) > TYPE_PRECISION (type);
 }

-/* If IFN_MASK_LOAD/STORE call CALL is unconditional, return a MEM_REF
+/* If IFN_{MASK,LEN}_LOAD/STORE call CALL is unconditional, return a MEM_REF
    for the memory it references, otherwise return null.  VECTYPE is the
-   type of the memory vector.  */
+   type of the memory vector.  MASK_P indicates it's for MASK if true,
+   otherwise it's for LEN.  */

 static tree
-gimple_fold_mask_load_store_mem_ref (gcall *call, tree vectype)
+gimple_fold_partial_load_store_mem_ref (gcall *call, tree vectype, bool mask_p)
 {
   tree ptr = gimple_call_arg (call, 0);
   tree alias_align = gimple_call_arg (call, 1);
-  tree mask = gimple_call_arg (call, 2);
-  if (!tree_fits_uhwi_p (alias_align) || !integer_all_onesp (mask))
+  if (!tree_fits_uhwi_p (alias_align))
     return NULL_TREE;

+  if (mask_p)
+    {
+      tree mask = gimple_call_arg (call, 2);
+      if (!integer_all_onesp (mask))
+	return NULL_TREE;
+    } else {
+      tree basic_len = gimple_call_arg (call, 2);
+      if (!tree_fits_uhwi_p (basic_len))
+	return NULL_TREE;
+      unsigned int nargs = gimple_call_num_args (call);
+      tree bias = gimple_call_arg (call, nargs - 1);
+      gcc_assert (tree_fits_uhwi_p (bias));
+      tree biased_len = int_const_binop (MINUS_EXPR, basic_len, bias);
+      unsigned int len = tree_to_uhwi (biased_len);
+      unsigned int vect_len
+	= GET_MODE_SIZE (TYPE_MODE (vectype)).to_constant ();
+      if (vect_len != len)
+	return NULL_TREE;
+    }
+
   unsigned HOST_WIDE_INT align = tree_to_uhwi (alias_align);
   if (TYPE_ALIGN (vectype) != align)
     vectype = build_aligned_type (vectype, align);
@@ -5390,16 +5410,18 @@  gimple_fold_mask_load_store_mem_ref (gcall *call, tree vectype)
   return fold_build2 (MEM_REF, vectype, ptr, offset);
 }

-/* Try to fold IFN_MASK_LOAD call CALL.  Return true on success.  */
+/* Try to fold IFN_{MASK,LEN}_LOAD call CALL.  Return true on success.
+   MASK_P indicates it's for MASK if true, otherwise it's for LEN.  */

 static bool
-gimple_fold_mask_load (gimple_stmt_iterator *gsi, gcall *call)
+gimple_fold_partial_load (gimple_stmt_iterator *gsi, gcall *call, bool mask_p)
 {
   tree lhs = gimple_call_lhs (call);
   if (!lhs)
     return false;

-  if (tree rhs = gimple_fold_mask_load_store_mem_ref (call, TREE_TYPE (lhs)))
+  if (tree rhs
+      = gimple_fold_partial_load_store_mem_ref (call, TREE_TYPE (lhs), mask_p))
     {
       gassign *new_stmt = gimple_build_assign (lhs, rhs);
       gimple_set_location (new_stmt, gimple_location (call));
@@ -5410,13 +5432,16 @@  gimple_fold_mask_load (gimple_stmt_iterator *gsi, gcall *call)
   return false;
 }

-/* Try to fold IFN_MASK_STORE call CALL.  Return true on success.  */
+/* Try to fold IFN_{MASK,LEN}_STORE call CALL.  Return true on success.
+   MASK_P indicates it's for MASK if true, otherwise it's for LEN.  */

 static bool
-gimple_fold_mask_store (gimple_stmt_iterator *gsi, gcall *call)
+gimple_fold_partial_store (gimple_stmt_iterator *gsi, gcall *call,
+			   bool mask_p)
 {
   tree rhs = gimple_call_arg (call, 3);
-  if (tree lhs = gimple_fold_mask_load_store_mem_ref (call, TREE_TYPE (rhs)))
+  if (tree lhs
+      = gimple_fold_partial_load_store_mem_ref (call, TREE_TYPE (rhs), mask_p))
     {
       gassign *new_stmt = gimple_build_assign (lhs, rhs);
       gimple_set_location (new_stmt, gimple_location (call));
@@ -5634,10 +5659,16 @@  gimple_fold_call (gimple_stmt_iterator *gsi, bool inplace)
 	  cplx_result = true;
 	  break;
 	case IFN_MASK_LOAD:
-	  changed |= gimple_fold_mask_load (gsi, stmt);
+	  changed |= gimple_fold_partial_load (gsi, stmt, true);
 	  break;
 	case IFN_MASK_STORE:
-	  changed |= gimple_fold_mask_store (gsi, stmt);
+	  changed |= gimple_fold_partial_store (gsi, stmt, true);
+	  break;
+	case IFN_LEN_LOAD:
+	  changed |= gimple_fold_partial_load (gsi, stmt, false);
+	  break;
+	case IFN_LEN_STORE:
+	  changed |= gimple_fold_partial_store (gsi, stmt, false);
 	  break;
 	default:
 	  break;
diff --git a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-8.c b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-8.c
index 961df0d5646..8b9c9107814 100644
--- a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-8.c
+++ b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-8.c
@@ -8,5 +8,5 @@ 

 #include "p9-vec-length-8.h"

-/* { dg-final { scan-assembler-times {\mlxvl\M} 21 } } */
+/* { dg-final { scan-assembler-times {\mlxvl\M} 16 } } */
 /* { dg-final { scan-assembler-times {\mstxvl\M} 7 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pr107412.c b/gcc/testsuite/gcc.target/powerpc/pr107412.c
new file mode 100644
index 00000000000..4526ea8639d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr107412.c
@@ -0,0 +1,19 @@ 
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-options "-mdejagnu-cpu=power9 -O2 -ftree-vectorize -fno-vect-cost-model -funroll-loops -fno-tree-loop-distribute-patterns --param vect-partial-vector-usage=2 -fdump-tree-optimized" } */
+
+/* Verify there is exactly one IFN LEN_LOAD and one IFN LEN_STORE.  */
+
+#define N 16
+int src[N];
+int dest[N];
+
+void
+foo ()
+{
+  for (int i = 0; i < (N - 1); i++)
+    dest[i] = src[i];
+}
+
+/* { dg-final { scan-tree-dump-times {\mLEN_LOAD\M} 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times {\mLEN_STORE\M} 1 "optimized" } } */
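
To see the fold outside the testsuite, a powerpc64le compiler can be
invoked with the equivalent driver flags (assumed from the dg-options
above; -mdejagnu-cpu is testsuite-only, the driver flag is -mcpu):

  gcc -mcpu=power9 -O2 -ftree-vectorize -fno-vect-cost-model \
      -funroll-loops -fno-tree-loop-distribute-patterns \
      --param vect-partial-vector-usage=2 -fdump-tree-optimized \
      -S pr107412.c

The optimized dump is then expected to contain a single .LEN_LOAD and
a single .LEN_STORE for the partial tail of the 15-element loop, the
full-vector accesses having been folded into plain vector MEM
references.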