i386: Extend cvtps2pd to memory

Message ID 20220630055907.50030-1-haochen.jiang@intel.com
State New
Headers
Series i386: Extend cvtps2pd to memory |

Commit Message

Haochen Jiang June 30, 2022, 5:59 a.m. UTC
  Hi all,

This patch aims to fix the cvtps2pd insn, which should also work on
memory operand but currently does not. After this fix, when loop == 2,
it will eliminate movq instruction.

Regtested on x86_64-pc-linux-gnu. Ok for trunk?

BRs,
Haochen

gcc/ChangeLog:

	PR target/43618
	* config/i386/sse.md (extendv2sfv2df2): New define_expand.
	(sse2_cvtps2pd_load<mask_name>): Rename extendvsdfv2df2.

gcc/testsuite/ChangeLog:

	PR target/43618
	* gcc.target/i386/pr43618-1.c: New test.
---
 gcc/config/i386/sse.md                    | 24 ++++++++++++++++++-----
 gcc/testsuite/gcc.target/i386/pr43618-1.c | 13 ++++++++++++
 2 files changed, 32 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr43618-1.c
  

Comments

Uros Bizjak June 30, 2022, 6:20 a.m. UTC | #1
On Thu, Jun 30, 2022 at 7:59 AM Haochen Jiang <haochen.jiang@intel.com> wrote:
>
> Hi all,
>
> This patch aims to fix the cvtps2pd insn, which should also work on
> memory operand but currently does not. After this fix, when loop == 2,
> it will eliminate movq instruction.
>
> Regtested on x86_64-pc-linux-gnu. Ok for trunk?
>
> BRs,
> Haochen
>
> gcc/ChangeLog:
>
>         PR target/43618
>         * config/i386/sse.md (extendv2sfv2df2): New define_expand.
>         (sse2_cvtps2pd_load<mask_name>): Rename extendvsdfv2df2.
>
> gcc/testsuite/ChangeLog:
>
>         PR target/43618
>         * gcc.target/i386/pr43618-1.c: New test.

This patch could be as simple as:

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 8cd0f617bf3..c331445cb2d 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -9195,7 +9195,7 @@
(define_insn "extendv2sfv2df2"
  [(set (match_operand:V2DF 0 "register_operand" "=v")
       (float_extend:V2DF
-         (match_operand:V2SF 1 "register_operand" "v")))]
+         (match_operand:V2SF 1 "nonimmediate_operand" "vm")))]
  "TARGET_MMX_WITH_SSE"
  "%vcvtps2pd\t{%1, %0|%0, %1}"
  [(set_attr "type" "ssecvt")

Uros.
  
Li, Pan2 via Gcc-patches June 30, 2022, 7:24 a.m. UTC | #2
> -----Original Message-----
> From: Uros Bizjak <ubizjak@gmail.com>
> Sent: Thursday, June 30, 2022 2:20 PM
> To: Jiang, Haochen <haochen.jiang@intel.com>
> Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao <hongtao.liu@intel.com>
> Subject: Re: [PATCH] i386: Extend cvtps2pd to memory
> 
> On Thu, Jun 30, 2022 at 7:59 AM Haochen Jiang <haochen.jiang@intel.com>
> wrote:
> >
> > Hi all,
> >
> > This patch aims to fix the cvtps2pd insn, which should also work on
> > memory operand but currently does not. After this fix, when loop == 2,
> > it will eliminate movq instruction.
> >
> > Regtested on x86_64-pc-linux-gnu. Ok for trunk?
> >
> > BRs,
> > Haochen
> >
> > gcc/ChangeLog:
> >
> >         PR target/43618
> >         * config/i386/sse.md (extendv2sfv2df2): New define_expand.
> >         (sse2_cvtps2pd_load<mask_name>): Rename extendvsdfv2df2.
> >
> > gcc/testsuite/ChangeLog:
> >
> >         PR target/43618
> >         * gcc.target/i386/pr43618-1.c: New test.
> 
> This patch could be as simple as:
> 
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index
> 8cd0f617bf3..c331445cb2d 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -9195,7 +9195,7 @@
> (define_insn "extendv2sfv2df2"
>   [(set (match_operand:V2DF 0 "register_operand" "=v")
>        (float_extend:V2DF
> -         (match_operand:V2SF 1 "register_operand" "v")))]
> +         (match_operand:V2SF 1 "nonimmediate_operand" "vm")))]
>   "TARGET_MMX_WITH_SSE"
>   "%vcvtps2pd\t{%1, %0|%0, %1}"
>   [(set_attr "type" "ssecvt")

We also tested on this version, it is ok.

The reason why the patch looks like this is because in the previous insn
sse2_cvtps2pd<mask_name>, the constraint vm and vector_operand
actually does not match the actual instruction. Memory operand is V2SF,
not V4SF.

Therefore, we changed the constraint in that insn. Then it caused another issue.
For memory operand, it seems that we cannot generate those mask instructions.
So I change the pattern to how extendv2hfv2df2 works.

Haochen

> Uros.
  
Uros Bizjak June 30, 2022, 7:41 a.m. UTC | #3
On Thu, Jun 30, 2022 at 9:24 AM Jiang, Haochen <haochen.jiang@intel.com> wrote:
>
> > -----Original Message-----
> > From: Uros Bizjak <ubizjak@gmail.com>
> > Sent: Thursday, June 30, 2022 2:20 PM
> > To: Jiang, Haochen <haochen.jiang@intel.com>
> > Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao <hongtao.liu@intel.com>
> > Subject: Re: [PATCH] i386: Extend cvtps2pd to memory
> >
> > On Thu, Jun 30, 2022 at 7:59 AM Haochen Jiang <haochen.jiang@intel.com>
> > wrote:
> > >
> > > Hi all,
> > >
> > > This patch aims to fix the cvtps2pd insn, which should also work on
> > > memory operand but currently does not. After this fix, when loop == 2,
> > > it will eliminate movq instruction.
> > >
> > > Regtested on x86_64-pc-linux-gnu. Ok for trunk?
> > >
> > > BRs,
> > > Haochen
> > >
> > > gcc/ChangeLog:
> > >
> > >         PR target/43618
> > >         * config/i386/sse.md (extendv2sfv2df2): New define_expand.
> > >         (sse2_cvtps2pd_load<mask_name>): Rename extendvsdfv2df2.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >         PR target/43618
> > >         * gcc.target/i386/pr43618-1.c: New test.
> >
> > This patch could be as simple as:
> >
> > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index
> > 8cd0f617bf3..c331445cb2d 100644
> > --- a/gcc/config/i386/sse.md
> > +++ b/gcc/config/i386/sse.md
> > @@ -9195,7 +9195,7 @@
> > (define_insn "extendv2sfv2df2"
> >   [(set (match_operand:V2DF 0 "register_operand" "=v")
> >        (float_extend:V2DF
> > -         (match_operand:V2SF 1 "register_operand" "v")))]
> > +         (match_operand:V2SF 1 "nonimmediate_operand" "vm")))]
> >   "TARGET_MMX_WITH_SSE"
> >   "%vcvtps2pd\t{%1, %0|%0, %1}"
> >   [(set_attr "type" "ssecvt")
>
> We also tested on this version, it is ok.
>
> The reason why the patch looks like this is because in the previous insn
> sse2_cvtps2pd<mask_name>, the constraint vm and vector_operand
> actually does not match the actual instruction. Memory operand is V2SF,
> not V4SF.
>
> Therefore, we changed the constraint in that insn. Then it caused another issue.
> For memory operand, it seems that we cannot generate those mask instructions.
> So I change the pattern to how extendv2hfv2df2 works.

If you want to change the memory access in sse2_cvtps2pd<mask_name>,
then please see how e.g. <insn>v2hiv2di is handled in sse.md. In
addition to two instructions, you will need one define_insn_and_split
with a pre-reload splitter.

Uros,
  
Uros Bizjak June 30, 2022, 8:45 a.m. UTC | #4
On Thu, Jun 30, 2022 at 9:41 AM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Thu, Jun 30, 2022 at 9:24 AM Jiang, Haochen <haochen.jiang@intel.com> wrote:
> >
> > > -----Original Message-----
> > > From: Uros Bizjak <ubizjak@gmail.com>
> > > Sent: Thursday, June 30, 2022 2:20 PM
> > > To: Jiang, Haochen <haochen.jiang@intel.com>
> > > Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao <hongtao.liu@intel.com>
> > > Subject: Re: [PATCH] i386: Extend cvtps2pd to memory
> > >
> > > On Thu, Jun 30, 2022 at 7:59 AM Haochen Jiang <haochen.jiang@intel.com>
> > > wrote:
> > > >
> > > > Hi all,
> > > >
> > > > This patch aims to fix the cvtps2pd insn, which should also work on
> > > > memory operand but currently does not. After this fix, when loop == 2,
> > > > it will eliminate movq instruction.
> > > >
> > > > Regtested on x86_64-pc-linux-gnu. Ok for trunk?
> > > >
> > > > BRs,
> > > > Haochen
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > >         PR target/43618
> > > >         * config/i386/sse.md (extendv2sfv2df2): New define_expand.
> > > >         (sse2_cvtps2pd_load<mask_name>): Rename extendvsdfv2df2.

Rename FROM ...

Please also mention change to sse2_cvtps2pd<mask_name>.

> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > >         PR target/43618
> > > >         * gcc.target/i386/pr43618-1.c: New test.
> > >
> > > This patch could be as simple as:
> > >
> > > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index
> > > 8cd0f617bf3..c331445cb2d 100644
> > > --- a/gcc/config/i386/sse.md
> > > +++ b/gcc/config/i386/sse.md
> > > @@ -9195,7 +9195,7 @@
> > > (define_insn "extendv2sfv2df2"
> > >   [(set (match_operand:V2DF 0 "register_operand" "=v")
> > >        (float_extend:V2DF
> > > -         (match_operand:V2SF 1 "register_operand" "v")))]
> > > +         (match_operand:V2SF 1 "nonimmediate_operand" "vm")))]
> > >   "TARGET_MMX_WITH_SSE"
> > >   "%vcvtps2pd\t{%1, %0|%0, %1}"
> > >   [(set_attr "type" "ssecvt")
> >
> > We also tested on this version, it is ok.
> >
> > The reason why the patch looks like this is because in the previous insn
> > sse2_cvtps2pd<mask_name>, the constraint vm and vector_operand
> > actually does not match the actual instruction. Memory operand is V2SF,
> > not V4SF.
> >
> > Therefore, we changed the constraint in that insn. Then it caused another issue.
> > For memory operand, it seems that we cannot generate those mask instructions.
> > So I change the pattern to how extendv2hfv2df2 works.
>
> If you want to change the memory access in sse2_cvtps2pd<mask_name>,
> then please see how e.g. <insn>v2hiv2di is handled in sse.md. In
> addition to two instructions, you will need one define_insn_and_split
> with a pre-reload splitter.

Oh, nowadays combine does vec_select from a paradoxical subreg on its own.

+(define_expand "extendv2sfv2df2"
+  [(set (match_operand:V2DF 0 "register_operand")
+        (float_extend:V2DF
+          (match_operand:V2SF 1 "nonimmediate_operand")))]
+  "TARGET_MMX_WITH_SSE"
+{
+  if (!MEM_P (operands[1]))
+    {

You will need force reg here:

        rtx op1 = force_reg (V2SFmode, operands[1]);
+      operands[1] = lowpart_subreg (V4SFmode, op1, V2SFmode);
+      emit_insn (gen_sse2_cvtps2pd (operands[0], operands[1]));
+      DONE;
+    }
+})


-(define_insn "extendv2sfv2df2"
+(define_insn "sse2_cvtps2pd_load<mask_name>"

Please name this insn "*sse2_cvtps2pd<mask_name>_1". Please note the
star at the beginning, You don't have to make the name public.

OK with the above changes.

Thanks,
Uros,
  
Uros Bizjak June 30, 2022, 8:53 a.m. UTC | #5
On Thu, Jun 30, 2022 at 10:45 AM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Thu, Jun 30, 2022 at 9:41 AM Uros Bizjak <ubizjak@gmail.com> wrote:
> >
> > On Thu, Jun 30, 2022 at 9:24 AM Jiang, Haochen <haochen.jiang@intel.com> wrote:
> > >
> > > > -----Original Message-----
> > > > From: Uros Bizjak <ubizjak@gmail.com>
> > > > Sent: Thursday, June 30, 2022 2:20 PM
> > > > To: Jiang, Haochen <haochen.jiang@intel.com>
> > > > Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao <hongtao.liu@intel.com>
> > > > Subject: Re: [PATCH] i386: Extend cvtps2pd to memory
> > > >
> > > > On Thu, Jun 30, 2022 at 7:59 AM Haochen Jiang <haochen.jiang@intel.com>
> > > > wrote:
> > > > >
> > > > > Hi all,
> > > > >
> > > > > This patch aims to fix the cvtps2pd insn, which should also work on
> > > > > memory operand but currently does not. After this fix, when loop == 2,
> > > > > it will eliminate movq instruction.
> > > > >
> > > > > Regtested on x86_64-pc-linux-gnu. Ok for trunk?
> > > > >
> > > > > BRs,
> > > > > Haochen
> > > > >
> > > > > gcc/ChangeLog:
> > > > >
> > > > >         PR target/43618
> > > > >         * config/i386/sse.md (extendv2sfv2df2): New define_expand.
> > > > >         (sse2_cvtps2pd_load<mask_name>): Rename extendvsdfv2df2.
>
> Rename FROM ...
>
> Please also mention change to sse2_cvtps2pd<mask_name>.
>
> > > > >
> > > > > gcc/testsuite/ChangeLog:
> > > > >
> > > > >         PR target/43618
> > > > >         * gcc.target/i386/pr43618-1.c: New test.
> > > >
> > > > This patch could be as simple as:
> > > >
> > > > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index
> > > > 8cd0f617bf3..c331445cb2d 100644
> > > > --- a/gcc/config/i386/sse.md
> > > > +++ b/gcc/config/i386/sse.md
> > > > @@ -9195,7 +9195,7 @@
> > > > (define_insn "extendv2sfv2df2"
> > > >   [(set (match_operand:V2DF 0 "register_operand" "=v")
> > > >        (float_extend:V2DF
> > > > -         (match_operand:V2SF 1 "register_operand" "v")))]
> > > > +         (match_operand:V2SF 1 "nonimmediate_operand" "vm")))]
> > > >   "TARGET_MMX_WITH_SSE"
> > > >   "%vcvtps2pd\t{%1, %0|%0, %1}"
> > > >   [(set_attr "type" "ssecvt")
> > >
> > > We also tested on this version, it is ok.
> > >
> > > The reason why the patch looks like this is because in the previous insn
> > > sse2_cvtps2pd<mask_name>, the constraint vm and vector_operand
> > > actually does not match the actual instruction. Memory operand is V2SF,
> > > not V4SF.
> > >
> > > Therefore, we changed the constraint in that insn. Then it caused another issue.
> > > For memory operand, it seems that we cannot generate those mask instructions.
> > > So I change the pattern to how extendv2hfv2df2 works.
> >
> > If you want to change the memory access in sse2_cvtps2pd<mask_name>,
> > then please see how e.g. <insn>v2hiv2di is handled in sse.md. In
> > addition to two instructions, you will need one define_insn_and_split
> > with a pre-reload splitter.
>
> Oh, nowadays combine does vec_select from a paradoxical subreg on its own.
>
> +(define_expand "extendv2sfv2df2"
> +  [(set (match_operand:V2DF 0 "register_operand")
> +        (float_extend:V2DF
> +          (match_operand:V2SF 1 "nonimmediate_operand")))]
> +  "TARGET_MMX_WITH_SSE"
> +{
> +  if (!MEM_P (operands[1]))
> +    {
>
> You will need force reg here:
>
>         rtx op1 = force_reg (V2SFmode, operands[1]);
> +      operands[1] = lowpart_subreg (V4SFmode, op1, V2SFmode);
> +      emit_insn (gen_sse2_cvtps2pd (operands[0], operands[1]));
> +      DONE;
> +    }
> +})
>
>
> -(define_insn "extendv2sfv2df2"
> +(define_insn "sse2_cvtps2pd_load<mask_name>"
>
> Please name this insn "*sse2_cvtps2pd<mask_name>_1". Please note the
> star at the beginning, You don't have to make the name public.
>
> OK with the above changes.

Forgot to mention:


-         (match_operand:V2SF 1 "register_operand" "v")))]
-  "TARGET_MMX_WITH_SSE"
-  "%vcvtps2pd\t{%1, %0|%0, %1}"
+         (match_operand:V2SF 1 "memory_operand" "m")))]
+  "TARGET_MMX_WITH_SSE && <mask_avx512vl_condition>"
+  "%vcvtps2pd\t{%1, %0<mask_operand2>|%0<mask_oper
and2>, %q1}"
   [(set_attr "type" "ssecvt")

The new insn does not need to be limited to TARGET_MMX_WITH_SSE, so we
can use TARGET_SSE2 here.

Which opens the question if the expander could also be TARGET_SSE2
only. There are no MMX registers involved in any of the patterns
anymore.

Uros.
>
> Thanks,
> Uros,
  
Li, Pan2 via Gcc-patches June 30, 2022, 8:57 a.m. UTC | #6
> -----Original Message-----
> From: Uros Bizjak <ubizjak@gmail.com>
> Sent: Thursday, June 30, 2022 4:53 PM
> To: Jiang, Haochen <haochen.jiang@intel.com>
> Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao <hongtao.liu@intel.com>
> Subject: Re: [PATCH] i386: Extend cvtps2pd to memory
> 
> On Thu, Jun 30, 2022 at 10:45 AM Uros Bizjak <ubizjak@gmail.com> wrote:
> >
> > On Thu, Jun 30, 2022 at 9:41 AM Uros Bizjak <ubizjak@gmail.com> wrote:
> > >
> > > On Thu, Jun 30, 2022 at 9:24 AM Jiang, Haochen <haochen.jiang@intel.com>
> wrote:
> > > >
> > > > > -----Original Message-----
> > > > > From: Uros Bizjak <ubizjak@gmail.com>
> > > > > Sent: Thursday, June 30, 2022 2:20 PM
> > > > > To: Jiang, Haochen <haochen.jiang@intel.com>
> > > > > Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao
> > > > > <hongtao.liu@intel.com>
> > > > > Subject: Re: [PATCH] i386: Extend cvtps2pd to memory
> > > > >
> > > > > On Thu, Jun 30, 2022 at 7:59 AM Haochen Jiang
> > > > > <haochen.jiang@intel.com>
> > > > > wrote:
> > > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > This patch aims to fix the cvtps2pd insn, which should also
> > > > > > work on memory operand but currently does not. After this fix,
> > > > > > when loop == 2, it will eliminate movq instruction.
> > > > > >
> > > > > > Regtested on x86_64-pc-linux-gnu. Ok for trunk?
> > > > > >
> > > > > > BRs,
> > > > > > Haochen
> > > > > >
> > > > > > gcc/ChangeLog:
> > > > > >
> > > > > >         PR target/43618
> > > > > >         * config/i386/sse.md (extendv2sfv2df2): New define_expand.
> > > > > >         (sse2_cvtps2pd_load<mask_name>): Rename extendvsdfv2df2.
> >
> > Rename FROM ...
> >
> > Please also mention change to sse2_cvtps2pd<mask_name>.
> >
> > > > > >
> > > > > > gcc/testsuite/ChangeLog:
> > > > > >
> > > > > >         PR target/43618
> > > > > >         * gcc.target/i386/pr43618-1.c: New test.
> > > > >
> > > > > This patch could be as simple as:
> > > > >
> > > > > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> > > > > index 8cd0f617bf3..c331445cb2d 100644
> > > > > --- a/gcc/config/i386/sse.md
> > > > > +++ b/gcc/config/i386/sse.md
> > > > > @@ -9195,7 +9195,7 @@
> > > > > (define_insn "extendv2sfv2df2"
> > > > >   [(set (match_operand:V2DF 0 "register_operand" "=v")
> > > > >        (float_extend:V2DF
> > > > > -         (match_operand:V2SF 1 "register_operand" "v")))]
> > > > > +         (match_operand:V2SF 1 "nonimmediate_operand" "vm")))]
> > > > >   "TARGET_MMX_WITH_SSE"
> > > > >   "%vcvtps2pd\t{%1, %0|%0, %1}"
> > > > >   [(set_attr "type" "ssecvt")
> > > >
> > > > We also tested on this version, it is ok.
> > > >
> > > > The reason why the patch looks like this is because in the
> > > > previous insn sse2_cvtps2pd<mask_name>, the constraint vm and
> > > > vector_operand actually does not match the actual instruction.
> > > > Memory operand is V2SF, not V4SF.
> > > >
> > > > Therefore, we changed the constraint in that insn. Then it caused another
> issue.
> > > > For memory operand, it seems that we cannot generate those mask
> instructions.
> > > > So I change the pattern to how extendv2hfv2df2 works.
> > >
> > > If you want to change the memory access in sse2_cvtps2pd<mask_name>,
> > > then please see how e.g. <insn>v2hiv2di is handled in sse.md. In
> > > addition to two instructions, you will need one
> > > define_insn_and_split with a pre-reload splitter.
> >
> > Oh, nowadays combine does vec_select from a paradoxical subreg on its own.
> >
> > +(define_expand "extendv2sfv2df2"
> > +  [(set (match_operand:V2DF 0 "register_operand")
> > +        (float_extend:V2DF
> > +          (match_operand:V2SF 1 "nonimmediate_operand")))]
> > +  "TARGET_MMX_WITH_SSE"
> > +{
> > +  if (!MEM_P (operands[1]))
> > +    {
> >
> > You will need force reg here:
> >
> >         rtx op1 = force_reg (V2SFmode, operands[1]);
> > +      operands[1] = lowpart_subreg (V4SFmode, op1, V2SFmode);
> > +      emit_insn (gen_sse2_cvtps2pd (operands[0], operands[1]));
> > +      DONE;
> > +    }
> > +})
> >
> >
> > -(define_insn "extendv2sfv2df2"
> > +(define_insn "sse2_cvtps2pd_load<mask_name>"
> >
> > Please name this insn "*sse2_cvtps2pd<mask_name>_1". Please note the
> > star at the beginning, You don't have to make the name public.
> >
> > OK with the above changes.
> 
> Forgot to mention:
> 
> 
> -         (match_operand:V2SF 1 "register_operand" "v")))]
> -  "TARGET_MMX_WITH_SSE"
> -  "%vcvtps2pd\t{%1, %0|%0, %1}"
> +         (match_operand:V2SF 1 "memory_operand" "m")))]
> + "TARGET_MMX_WITH_SSE && <mask_avx512vl_condition>"
> +  "%vcvtps2pd\t{%1, %0<mask_operand2>|%0<mask_oper
> and2>, %q1}"
>    [(set_attr "type" "ssecvt")
> 
> The new insn does not need to be limited to TARGET_MMX_WITH_SSE, so we
> can use TARGET_SSE2 here.
> 
> Which opens the question if the expander could also be TARGET_SSE2 only.
> There are no MMX registers involved in any of the patterns anymore.
Yes.
> 
> Uros.
> >
> > Thanks,
> > Uros,
  
Li, Pan2 via Gcc-patches July 4, 2022, 5:09 a.m. UTC | #7
Hi all,

I revised my patch according to all your reviews.

Regtested on x86_64-pc-linux-gnu.

BRs,
Haochen

> -----Original Message-----
> From: Liu, Hongtao <hongtao.liu@intel.com>
> Sent: Thursday, June 30, 2022 4:57 PM
> To: Uros Bizjak <ubizjak@gmail.com>; Jiang, Haochen
> <haochen.jiang@intel.com>
> Cc: gcc-patches@gcc.gnu.org
> Subject: RE: [PATCH] i386: Extend cvtps2pd to memory
> 
> 
> 
> > -----Original Message-----
> > From: Uros Bizjak <ubizjak@gmail.com>
> > Sent: Thursday, June 30, 2022 4:53 PM
> > To: Jiang, Haochen <haochen.jiang@intel.com>
> > Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao <hongtao.liu@intel.com>
> > Subject: Re: [PATCH] i386: Extend cvtps2pd to memory
> >
> > On Thu, Jun 30, 2022 at 10:45 AM Uros Bizjak <ubizjak@gmail.com> wrote:
> > >
> > > On Thu, Jun 30, 2022 at 9:41 AM Uros Bizjak <ubizjak@gmail.com> wrote:
> > > >
> > > > On Thu, Jun 30, 2022 at 9:24 AM Jiang, Haochen
> <haochen.jiang@intel.com>
> > wrote:
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Uros Bizjak <ubizjak@gmail.com>
> > > > > > Sent: Thursday, June 30, 2022 2:20 PM
> > > > > > To: Jiang, Haochen <haochen.jiang@intel.com>
> > > > > > Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao
> > > > > > <hongtao.liu@intel.com>
> > > > > > Subject: Re: [PATCH] i386: Extend cvtps2pd to memory
> > > > > >
> > > > > > On Thu, Jun 30, 2022 at 7:59 AM Haochen Jiang
> > > > > > <haochen.jiang@intel.com>
> > > > > > wrote:
> > > > > > >
> > > > > > > Hi all,
> > > > > > >
> > > > > > > This patch aims to fix the cvtps2pd insn, which should also
> > > > > > > work on memory operand but currently does not. After this fix,
> > > > > > > when loop == 2, it will eliminate movq instruction.
> > > > > > >
> > > > > > > Regtested on x86_64-pc-linux-gnu. Ok for trunk?
> > > > > > >
> > > > > > > BRs,
> > > > > > > Haochen
> > > > > > >
> > > > > > > gcc/ChangeLog:
> > > > > > >
> > > > > > >         PR target/43618
> > > > > > >         * config/i386/sse.md (extendv2sfv2df2): New define_expand.
> > > > > > >         (sse2_cvtps2pd_load<mask_name>): Rename
> extendvsdfv2df2.
> > >
> > > Rename FROM ...
> > >
> > > Please also mention change to sse2_cvtps2pd<mask_name>.
> > >
> > > > > > >
> > > > > > > gcc/testsuite/ChangeLog:
> > > > > > >
> > > > > > >         PR target/43618
> > > > > > >         * gcc.target/i386/pr43618-1.c: New test.
> > > > > >
> > > > > > This patch could be as simple as:
> > > > > >
> > > > > > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> > > > > > index 8cd0f617bf3..c331445cb2d 100644
> > > > > > --- a/gcc/config/i386/sse.md
> > > > > > +++ b/gcc/config/i386/sse.md
> > > > > > @@ -9195,7 +9195,7 @@
> > > > > > (define_insn "extendv2sfv2df2"
> > > > > >   [(set (match_operand:V2DF 0 "register_operand" "=v")
> > > > > >        (float_extend:V2DF
> > > > > > -         (match_operand:V2SF 1 "register_operand" "v")))]
> > > > > > +         (match_operand:V2SF 1 "nonimmediate_operand" "vm")))]
> > > > > >   "TARGET_MMX_WITH_SSE"
> > > > > >   "%vcvtps2pd\t{%1, %0|%0, %1}"
> > > > > >   [(set_attr "type" "ssecvt")
> > > > >
> > > > > We also tested on this version, it is ok.
> > > > >
> > > > > The reason why the patch looks like this is because in the
> > > > > previous insn sse2_cvtps2pd<mask_name>, the constraint vm and
> > > > > vector_operand actually does not match the actual instruction.
> > > > > Memory operand is V2SF, not V4SF.
> > > > >
> > > > > Therefore, we changed the constraint in that insn. Then it caused
> another
> > issue.
> > > > > For memory operand, it seems that we cannot generate those mask
> > instructions.
> > > > > So I change the pattern to how extendv2hfv2df2 works.
> > > >
> > > > If you want to change the memory access in
> sse2_cvtps2pd<mask_name>,
> > > > then please see how e.g. <insn>v2hiv2di is handled in sse.md. In
> > > > addition to two instructions, you will need one
> > > > define_insn_and_split with a pre-reload splitter.
> > >
> > > Oh, nowadays combine does vec_select from a paradoxical subreg on its
> own.
> > >
> > > +(define_expand "extendv2sfv2df2"
> > > +  [(set (match_operand:V2DF 0 "register_operand")
> > > +        (float_extend:V2DF
> > > +          (match_operand:V2SF 1 "nonimmediate_operand")))]
> > > +  "TARGET_MMX_WITH_SSE"
> > > +{
> > > +  if (!MEM_P (operands[1]))
> > > +    {
> > >
> > > You will need force reg here:
> > >
> > >         rtx op1 = force_reg (V2SFmode, operands[1]);
> > > +      operands[1] = lowpart_subreg (V4SFmode, op1, V2SFmode);
> > > +      emit_insn (gen_sse2_cvtps2pd (operands[0], operands[1]));
> > > +      DONE;
> > > +    }
> > > +})
> > >
> > >
> > > -(define_insn "extendv2sfv2df2"
> > > +(define_insn "sse2_cvtps2pd_load<mask_name>"
> > >
> > > Please name this insn "*sse2_cvtps2pd<mask_name>_1". Please note
> the
> > > star at the beginning, You don't have to make the name public.
> > >
> > > OK with the above changes.
> >
> > Forgot to mention:
> >
> >
> > -         (match_operand:V2SF 1 "register_operand" "v")))]
> > -  "TARGET_MMX_WITH_SSE"
> > -  "%vcvtps2pd\t{%1, %0|%0, %1}"
> > +         (match_operand:V2SF 1 "memory_operand" "m")))]
> > + "TARGET_MMX_WITH_SSE && <mask_avx512vl_condition>"
> > +  "%vcvtps2pd\t{%1, %0<mask_operand2>|%0<mask_oper
> > and2>, %q1}"
> >    [(set_attr "type" "ssecvt")
> >
> > The new insn does not need to be limited to TARGET_MMX_WITH_SSE, so
> we
> > can use TARGET_SSE2 here.
> >
> > Which opens the question if the expander could also be TARGET_SSE2 only.
> > There are no MMX registers involved in any of the patterns anymore.
> Yes.
> >
> > Uros.
> > >
> > > Thanks,
> > > Uros,
  
Uros Bizjak July 4, 2022, 6:17 a.m. UTC | #8
On Mon, Jul 4, 2022 at 7:10 AM Jiang, Haochen <haochen.jiang@intel.com> wrote:
>
> Hi all,
>
> I revised my patch according to all your reviews.
>
> Regtested on x86_64-pc-linux-gnu.

OK.

Thanks,
Uros.

>
> BRs,
> Haochen
>
> > -----Original Message-----
> > From: Liu, Hongtao <hongtao.liu@intel.com>
> > Sent: Thursday, June 30, 2022 4:57 PM
> > To: Uros Bizjak <ubizjak@gmail.com>; Jiang, Haochen
> > <haochen.jiang@intel.com>
> > Cc: gcc-patches@gcc.gnu.org
> > Subject: RE: [PATCH] i386: Extend cvtps2pd to memory
> >
> >
> >
> > > -----Original Message-----
> > > From: Uros Bizjak <ubizjak@gmail.com>
> > > Sent: Thursday, June 30, 2022 4:53 PM
> > > To: Jiang, Haochen <haochen.jiang@intel.com>
> > > Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao <hongtao.liu@intel.com>
> > > Subject: Re: [PATCH] i386: Extend cvtps2pd to memory
> > >
> > > On Thu, Jun 30, 2022 at 10:45 AM Uros Bizjak <ubizjak@gmail.com> wrote:
> > > >
> > > > On Thu, Jun 30, 2022 at 9:41 AM Uros Bizjak <ubizjak@gmail.com> wrote:
> > > > >
> > > > > On Thu, Jun 30, 2022 at 9:24 AM Jiang, Haochen
> > <haochen.jiang@intel.com>
> > > wrote:
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Uros Bizjak <ubizjak@gmail.com>
> > > > > > > Sent: Thursday, June 30, 2022 2:20 PM
> > > > > > > To: Jiang, Haochen <haochen.jiang@intel.com>
> > > > > > > Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao
> > > > > > > <hongtao.liu@intel.com>
> > > > > > > Subject: Re: [PATCH] i386: Extend cvtps2pd to memory
> > > > > > >
> > > > > > > On Thu, Jun 30, 2022 at 7:59 AM Haochen Jiang
> > > > > > > <haochen.jiang@intel.com>
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > Hi all,
> > > > > > > >
> > > > > > > > This patch aims to fix the cvtps2pd insn, which should also
> > > > > > > > work on memory operand but currently does not. After this fix,
> > > > > > > > when loop == 2, it will eliminate movq instruction.
> > > > > > > >
> > > > > > > > Regtested on x86_64-pc-linux-gnu. Ok for trunk?
> > > > > > > >
> > > > > > > > BRs,
> > > > > > > > Haochen
> > > > > > > >
> > > > > > > > gcc/ChangeLog:
> > > > > > > >
> > > > > > > >         PR target/43618
> > > > > > > >         * config/i386/sse.md (extendv2sfv2df2): New define_expand.
> > > > > > > >         (sse2_cvtps2pd_load<mask_name>): Rename
> > extendvsdfv2df2.
> > > >
> > > > Rename FROM ...
> > > >
> > > > Please also mention change to sse2_cvtps2pd<mask_name>.
> > > >
> > > > > > > >
> > > > > > > > gcc/testsuite/ChangeLog:
> > > > > > > >
> > > > > > > >         PR target/43618
> > > > > > > >         * gcc.target/i386/pr43618-1.c: New test.
> > > > > > >
> > > > > > > This patch could be as simple as:
> > > > > > >
> > > > > > > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> > > > > > > index 8cd0f617bf3..c331445cb2d 100644
> > > > > > > --- a/gcc/config/i386/sse.md
> > > > > > > +++ b/gcc/config/i386/sse.md
> > > > > > > @@ -9195,7 +9195,7 @@
> > > > > > > (define_insn "extendv2sfv2df2"
> > > > > > >   [(set (match_operand:V2DF 0 "register_operand" "=v")
> > > > > > >        (float_extend:V2DF
> > > > > > > -         (match_operand:V2SF 1 "register_operand" "v")))]
> > > > > > > +         (match_operand:V2SF 1 "nonimmediate_operand" "vm")))]
> > > > > > >   "TARGET_MMX_WITH_SSE"
> > > > > > >   "%vcvtps2pd\t{%1, %0|%0, %1}"
> > > > > > >   [(set_attr "type" "ssecvt")
> > > > > >
> > > > > > We also tested on this version, it is ok.
> > > > > >
> > > > > > The reason why the patch looks like this is because in the
> > > > > > previous insn sse2_cvtps2pd<mask_name>, the constraint vm and
> > > > > > vector_operand actually does not match the actual instruction.
> > > > > > Memory operand is V2SF, not V4SF.
> > > > > >
> > > > > > Therefore, we changed the constraint in that insn. Then it caused
> > another
> > > issue.
> > > > > > For memory operand, it seems that we cannot generate those mask
> > > instructions.
> > > > > > So I change the pattern to how extendv2hfv2df2 works.
> > > > >
> > > > > If you want to change the memory access in
> > sse2_cvtps2pd<mask_name>,
> > > > > then please see how e.g. <insn>v2hiv2di is handled in sse.md. In
> > > > > addition to two instructions, you will need one
> > > > > define_insn_and_split with a pre-reload splitter.
> > > >
> > > > Oh, nowadays combine does vec_select from a paradoxical subreg on its
> > own.
> > > >
> > > > +(define_expand "extendv2sfv2df2"
> > > > +  [(set (match_operand:V2DF 0 "register_operand")
> > > > +        (float_extend:V2DF
> > > > +          (match_operand:V2SF 1 "nonimmediate_operand")))]
> > > > +  "TARGET_MMX_WITH_SSE"
> > > > +{
> > > > +  if (!MEM_P (operands[1]))
> > > > +    {
> > > >
> > > > You will need force reg here:
> > > >
> > > >         rtx op1 = force_reg (V2SFmode, operands[1]);
> > > > +      operands[1] = lowpart_subreg (V4SFmode, op1, V2SFmode);
> > > > +      emit_insn (gen_sse2_cvtps2pd (operands[0], operands[1]));
> > > > +      DONE;
> > > > +    }
> > > > +})
> > > >
> > > >
> > > > -(define_insn "extendv2sfv2df2"
> > > > +(define_insn "sse2_cvtps2pd_load<mask_name>"
> > > >
> > > > Please name this insn "*sse2_cvtps2pd<mask_name>_1". Please note
> > the
> > > > star at the beginning, You don't have to make the name public.
> > > >
> > > > OK with the above changes.
> > >
> > > Forgot to mention:
> > >
> > >
> > > -         (match_operand:V2SF 1 "register_operand" "v")))]
> > > -  "TARGET_MMX_WITH_SSE"
> > > -  "%vcvtps2pd\t{%1, %0|%0, %1}"
> > > +         (match_operand:V2SF 1 "memory_operand" "m")))]
> > > + "TARGET_MMX_WITH_SSE && <mask_avx512vl_condition>"
> > > +  "%vcvtps2pd\t{%1, %0<mask_operand2>|%0<mask_oper
> > > and2>, %q1}"
> > >    [(set_attr "type" "ssecvt")
> > >
> > > The new insn does not need to be limited to TARGET_MMX_WITH_SSE, so
> > we
> > > can use TARGET_SSE2 here.
> > >
> > > Which opens the question if the expander could also be TARGET_SSE2 only.
> > > There are no MMX registers involved in any of the patterns anymore.
> > Yes.
> > >
> > > Uros.
> > > >
> > > > Thanks,
> > > > Uros,
  

Patch

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 8b2602bfa79..f96bb3dc6c3 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -9175,11 +9175,25 @@ 
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
+(define_expand "extendv2sfv2df2"
+  [(set (match_operand:V2DF 0 "register_operand")
+        (float_extend:V2DF
+          (match_operand:V2SF 1 "nonimmediate_operand")))]
+  "TARGET_MMX_WITH_SSE"
+{
+  if (!MEM_P (operands[1]))
+    {
+      operands[1] = lowpart_subreg (V4SFmode, operands[1], V2SFmode);
+      emit_insn (gen_sse2_cvtps2pd (operands[0], operands[1]));
+      DONE;
+    }
+})
+
 (define_insn "sse2_cvtps2pd<mask_name>"
   [(set (match_operand:V2DF 0 "register_operand" "=v")
 	(float_extend:V2DF
 	  (vec_select:V2SF
-	    (match_operand:V4SF 1 "vector_operand" "vm")
+	    (match_operand:V4SF 1 "register_operand" "v")
 	    (parallel [(const_int 0) (const_int 1)]))))]
   "TARGET_SSE2 && <mask_avx512vl_condition>"
   "%vcvtps2pd\t{%1, %0<mask_operand2>|%0<mask_operand2>, %q1}"
@@ -9191,12 +9205,12 @@ 
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "V2DF")])
 
-(define_insn "extendv2sfv2df2"
+(define_insn "sse2_cvtps2pd_load<mask_name>"
   [(set (match_operand:V2DF 0 "register_operand" "=v")
 	(float_extend:V2DF
-	  (match_operand:V2SF 1 "register_operand" "v")))]
-  "TARGET_MMX_WITH_SSE"
-  "%vcvtps2pd\t{%1, %0|%0, %1}"
+	  (match_operand:V2SF 1 "memory_operand" "m")))]
+  "TARGET_MMX_WITH_SSE && <mask_avx512vl_condition>"
+  "%vcvtps2pd\t{%1, %0<mask_operand2>|%0<mask_operand2>, %q1}"
   [(set_attr "type" "ssecvt")
    (set_attr "amdfam10_decode" "direct")
    (set_attr "athlon_decode" "double")
diff --git a/gcc/testsuite/gcc.target/i386/pr43618-1.c b/gcc/testsuite/gcc.target/i386/pr43618-1.c
new file mode 100644
index 00000000000..3c84ea444aa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr43618-1.c
@@ -0,0 +1,13 @@ 
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler-not "movq" } } */
+/* { dg-final { scan-assembler "cvtps2pd" } } */
+
+void
+foo (float a[2], double b[2])
+{
+    int i;
+    for (i = 0; i < 2; i++)
+      b[i] = a[i];
+}
+