i386: Disable ix86_expand_vecop_qihi2 when !TARGET_AVX512BW

Message ID 20240521061617.667442-1-haochen.jiang@intel.com
State New
Headers
Series i386: Disable ix86_expand_vecop_qihi2 when !TARGET_AVX512BW |

Checks

Context Check Description
linaro-tcwg-bot/tcwg_gcc_build--master-arm success Testing passed
linaro-tcwg-bot/tcwg_gcc_build--master-aarch64 success Testing passed
linaro-tcwg-bot/tcwg_gcc_check--master-aarch64 success Testing passed
linaro-tcwg-bot/tcwg_gcc_check--master-arm success Testing passed

Commit Message

Jiang, Haochen May 21, 2024, 6:16 a.m. UTC
  Hi all,

Since vpermq is really slow, we should avoid using it when it is
the only instruction could be used for ix86_expand_vecop_qihi2.

Bootstrapped and regtested on x86_64-pc-linux-gnu. Ok for trunk?

Thx,
Haochen

gcc/ChangeLog:

        PR target/115069
	* config/i386/i386-expand.cc (ix86_expand_vecop_qihi2):
	Do not enable the optimization when AVX512BW is not enabled.
---
 gcc/config/i386/i386-expand.cc | 5 +++++
 1 file changed, 5 insertions(+)
  

Comments

Hongtao Liu May 21, 2024, 6:17 a.m. UTC | #1
On Tue, May 21, 2024 at 2:16 PM Haochen Jiang <haochen.jiang@intel.com> wrote:
>
> Hi all,
>
> Since vpermq is really slow, we should avoid using it when it is
> the only instruction could be used for ix86_expand_vecop_qihi2.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu. Ok for trunk?
Please add a testcase for it.
>
> Thx,
> Haochen
>
> gcc/ChangeLog:
>
>         PR target/115069
>         * config/i386/i386-expand.cc (ix86_expand_vecop_qihi2):
>         Do not enable the optimization when AVX512BW is not enabled.
> ---
>  gcc/config/i386/i386-expand.cc | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
> index a6132911e6a..f24c800bb4f 100644
> --- a/gcc/config/i386/i386-expand.cc
> +++ b/gcc/config/i386/i386-expand.cc
> @@ -24323,6 +24323,11 @@ ix86_expand_vecop_qihi2 (enum rtx_code code, rtx dest, rtx op1, rtx op2)
>    bool op2vec = GET_MODE_CLASS (GET_MODE (op2)) == MODE_VECTOR_INT;
>    bool uns_p = code != ASHIFTRT;
>
> +  /* vpermq is slow and we should not fall into the optimization when
> +     it is the only instruction to be selected.  */
> +  if (!TARGET_AVX512BW)
> +    return false;
> +
>    if ((qimode == V16QImode && !TARGET_AVX2)
>        || (qimode == V32QImode && (!TARGET_AVX512BW || !TARGET_EVEX512))
>        /* There are no V64HImode instructions.  */
> --
> 2.31.1
>
  
Uros Bizjak May 21, 2024, 6:35 a.m. UTC | #2
On Tue, May 21, 2024 at 8:16 AM Haochen Jiang <haochen.jiang@intel.com> wrote:
>
> Hi all,
>
> Since vpermq is really slow, we should avoid using it when it is
> the only instruction could be used for ix86_expand_vecop_qihi2.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu. Ok for trunk?
>
> Thx,
> Haochen
>
> gcc/ChangeLog:
>
>         PR target/115069
>         * config/i386/i386-expand.cc (ix86_expand_vecop_qihi2):
>         Do not enable the optimization when AVX512BW is not enabled.
> ---
>  gcc/config/i386/i386-expand.cc | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
> index a6132911e6a..f24c800bb4f 100644
> --- a/gcc/config/i386/i386-expand.cc
> +++ b/gcc/config/i386/i386-expand.cc
> @@ -24323,6 +24323,11 @@ ix86_expand_vecop_qihi2 (enum rtx_code code, rtx dest, rtx op1, rtx op2)
>    bool op2vec = GET_MODE_CLASS (GET_MODE (op2)) == MODE_VECTOR_INT;
>    bool uns_p = code != ASHIFTRT;
>
> +  /* vpermq is slow and we should not fall into the optimization when
> +     it is the only instruction to be selected.  */

Please rather say something like:

/* Without VPMOVWB (provided by AVX512BW ISA), the expansion uses the generic
permutation to merge the data back into the right place.  This
permutation results
in VPERMQ, which is slow, so better fall back to expand_vecop_qihi.  */

Uros.

> +  if (!TARGET_AVX512BW)
> +    return false;
> +
>    if ((qimode == V16QImode && !TARGET_AVX2)
>        || (qimode == V32QImode && (!TARGET_AVX512BW || !TARGET_EVEX512))
>        /* There are no V64HImode instructions.  */
> --
> 2.31.1
>
  

Patch

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index a6132911e6a..f24c800bb4f 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -24323,6 +24323,11 @@  ix86_expand_vecop_qihi2 (enum rtx_code code, rtx dest, rtx op1, rtx op2)
   bool op2vec = GET_MODE_CLASS (GET_MODE (op2)) == MODE_VECTOR_INT;
   bool uns_p = code != ASHIFTRT;
 
+  /* vpermq is slow and we should not fall into the optimization when
+     it is the only instruction to be selected.  */
+  if (!TARGET_AVX512BW)
+    return false;
+
   if ((qimode == V16QImode && !TARGET_AVX2)
       || (qimode == V32QImode && (!TARGET_AVX512BW || !TARGET_EVEX512))
       /* There are no V64HImode instructions.  */