[x86,take,#2] Move V1TI shift/rotate lowering from expand to pre-reload split.

Message ID 008101d8ae91$ef2ca100$cd85e300$@nextmovesoftware.com
State New
Headers
Series [x86,take,#2] Move V1TI shift/rotate lowering from expand to pre-reload split. |

Commit Message

Roger Sayle Aug. 12, 2022, 9:24 p.m. UTC
  Hi Uros,
As requested, here's an updated version of my patch that introduces a new
const_0_to_255_not_mul_8_operand as you've requested.  I think in this
instance, having mutually exclusive patterns that can appear in any order,
without imposing implicit ordering constraints, is slightly preferable,
especially as (thanks to STV)  some related patterns may appear in
sse.md and others appear in i386.md (making ordering tricky).

This patch has been retested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32},
with no new failures.  Ok for mainline?


2022-08-12  Roger Sayle  <roger@nextmovesoftware.com>
            Uroš Bizjak  <ubizjak@gmail.com>

gcc/ChangeLog
        * config/i386/predicates.md (const_0_to_255_not_mul_8_operand):
        New predicate for values between 0/1 and 255, not multiples of 8.
        * config/i386/sse.md (ashlv1ti3): Delay lowering of logical left
        shifts by constant bit counts.
        (*ashlvti3_internal): New define_insn_and_split that lowers
        logical left shifts by constant bit counts, that aren't multiples
        of 8, before reload.
        (lshrv1ti3): Delay lowering of logical right shifts by constant.
        (*lshrv1ti3_internal): New define_insn_and_split that lowers
        logical right shifts by constant bit counts, that aren't multiples
        of 8, before reload.
        (ashrv1ti3):: Delay lowering of arithmetic right shifts by
        constant bit counts.
        (*ashrv1ti3_internal): New define_insn_and_split that lowers
        arithmetic right shifts by constant bit counts before reload.
        (rotlv1ti3): Delay lowering of rotate left by constant.
        (*rotlv1ti3_internal): New define_insn_and_split that lowers
        rotate left by constant bits counts before reload.
        (rotrv1ti3): Delay lowering of rotate right by constant.
        (*rotrv1ti3_internal): New define_insn_and_split that lowers
        rotate right by constant bits counts before reload.


Thanks again,
Roger

> -----Original Message-----
> From: Uros Bizjak <ubizjak@gmail.com>
> Sent: 08 August 2022 08:48
> To: Roger Sayle <roger@nextmovesoftware.com>
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [x86 PATCH] Move V1TI shift/rotate lowering from expand to pre-
> reload split.
> 
> On Fri, Aug 5, 2022 at 8:36 PM Roger Sayle <roger@nextmovesoftware.com>
> wrote:
> >
> >
> > This patch moves the lowering of 128-bit V1TImode shifts and rotations
> > by constant bit counts to sequences of SSE operations from the RTL
> > expansion pass to the pre-reload split pass.  Postponing this
> > splitting of shifts and rotates enables (will enable) the TImode
> > equivalents of these operations/ instructions to be considered as
> > candidates by the (TImode) STV pass.
> > Technically, this patch changes the existing expanders to continue to
> > lower shifts by variable amounts, but constant operands become RTL
> > instructions, specified by define_insn_and_split that are triggered by
> > x86_pre_reload_split.  The one minor complication is that logical
> > shifts by multiples of eight, don't get split, but are handled by
> > existing insn patterns, such as sse2_ashlv1ti3 and sse2_lshrv1ti3.
> > There should be no changes in generated code with this patch, which
> > just adjusts the pass in which transformations get applied.
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check, both with and without --target_board=unix{-m32},
> > with no new failures.  Ok for mainline?
> >
> >
> >
> > 2022-08-05  Roger Sayle  <roger@nextmovesoftware.com>
> >
> > gcc/ChangeLog
> >         * config/i386/sse.md (ashlv1ti3): Delay lowering of logical left
> >         shifts by constant bit counts.
> >         (*ashlvti3_internal): New define_insn_and_split that lowers
> >         logical left shifts by constant bit counts, that aren't multiples
> >         of 8, before reload.
> >         (lshrv1ti3): Delay lowering of logical right shifts by constant.
> >         (*lshrv1ti3_internal): New define_insn_and_split that lowers
> >         logical right shifts by constant bit counts, that aren't multiples
> >         of 8, before reload.
> >         (ashrv1ti3):: Delay lowering of arithmetic right shifts by
> >         constant bit counts.
> >         (*ashrv1ti3_internal): New define_insn_and_split that lowers
> >         arithmetic right shifts by constant bit counts before reload.
> >         (rotlv1ti3): Delay lowering of rotate left by constant.
> >         (*rotlv1ti3_internal): New define_insn_and_split that lowers
> >         rotate left by constant bits counts before reload.
> >         (rotrv1ti3): Delay lowering of rotate right by constant.
> >         (*rotrv1ti3_internal): New define_insn_and_split that lowers
> >         rotate right by constant bits counts before reload.
> 
> +(define_insn_and_split "*ashlv1ti3_internal"
> +  [(set (match_operand:V1TI 0 "register_operand")
>   (ashift:V1TI
>   (match_operand:V1TI 1 "register_operand")
> - (match_operand:QI 2 "general_operand")))]
> -  "TARGET_SSE2 && TARGET_64BIT"
> + (match_operand:SI 2 "const_0_to_255_operand")))]
> +  "TARGET_SSE2
> +   && TARGET_64BIT
> +   && (INTVAL (operands[2]) & 7) != 0
> 
> Please introduce const_0_to_255_not_mul_8_operand predicate.
> Alternatively, and preferably, you can use pattern shadowing, where the
> preceding, more constrained pattern will match before the following, more
> broad pattern will.
> 
> Uros.
  

Comments

Uros Bizjak Aug. 13, 2022, 9:57 a.m. UTC | #1
On Fri, Aug 12, 2022 at 11:24 PM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
>
> Hi Uros,
> As requested, here's an updated version of my patch that introduces a new
> const_0_to_255_not_mul_8_operand as you've requested.  I think in this
> instance, having mutually exclusive patterns that can appear in any order,
> without imposing implicit ordering constraints, is slightly preferable,
> especially as (thanks to STV)  some related patterns may appear in
> sse.md and others appear in i386.md (making ordering tricky).
>
> This patch has been retested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32},
> with no new failures.  Ok for mainline?
>
>
> 2022-08-12  Roger Sayle  <roger@nextmovesoftware.com>
>             Uroš Bizjak  <ubizjak@gmail.com>
>
> gcc/ChangeLog
>         * config/i386/predicates.md (const_0_to_255_not_mul_8_operand):
>         New predicate for values between 0/1 and 255, not multiples of 8.
>         * config/i386/sse.md (ashlv1ti3): Delay lowering of logical left
>         shifts by constant bit counts.
>         (*ashlvti3_internal): New define_insn_and_split that lowers
>         logical left shifts by constant bit counts, that aren't multiples
>         of 8, before reload.
>         (lshrv1ti3): Delay lowering of logical right shifts by constant.
>         (*lshrv1ti3_internal): New define_insn_and_split that lowers
>         logical right shifts by constant bit counts, that aren't multiples
>         of 8, before reload.
>         (ashrv1ti3):: Delay lowering of arithmetic right shifts by
>         constant bit counts.
>         (*ashrv1ti3_internal): New define_insn_and_split that lowers
>         arithmetic right shifts by constant bit counts before reload.
>         (rotlv1ti3): Delay lowering of rotate left by constant.
>         (*rotlv1ti3_internal): New define_insn_and_split that lowers
>         rotate left by constant bits counts before reload.
>         (rotrv1ti3): Delay lowering of rotate right by constant.
>         (*rotrv1ti3_internal): New define_insn_and_split that lowers
>         rotate right by constant bits counts before reload.

OK with a small nit:

+  "TARGET_SSE2
+   && TARGET_64BIT

Please put these target options to one line, as in many examples
throuhout i386.md

Thanks,
Uros.
  

Patch

diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
index 064596d..4f16bb7 100644
--- a/gcc/config/i386/predicates.md
+++ b/gcc/config/i386/predicates.md
@@ -931,6 +931,14 @@ 
   return val <= 255*8 && val % 8 == 0;
 })
 
+;; Match 1 to 255 except multiples of 8
+(define_predicate "const_0_to_255_not_mul_8_operand"
+  (match_code "const_int")
+{
+  unsigned HOST_WIDE_INT val = INTVAL (op);
+  return val <= 255 && val % 8 != 0;
+})
+
 ;; Return true if OP is CONST_INT >= 1 and <= 31 (a valid operand
 ;; for shift & compare patterns, as shifting by 0 does not change flags).
 (define_predicate "const_1_to_31_operand"
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 14d12d1..468c5f1 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -15995,10 +15995,29 @@ 
 
 (define_expand "ashlv1ti3"
   [(set (match_operand:V1TI 0 "register_operand")
+        (ashift:V1TI
+         (match_operand:V1TI 1 "register_operand")
+         (match_operand:QI 2 "general_operand")))]
+  "TARGET_SSE2 && TARGET_64BIT"
+{
+  if (!CONST_INT_P (operands[2]))
+    {
+      ix86_expand_v1ti_shift (ASHIFT, operands);
+      DONE;
+    }
+})
+
+(define_insn_and_split "*ashlv1ti3_internal"
+  [(set (match_operand:V1TI 0 "register_operand")
 	(ashift:V1TI
 	 (match_operand:V1TI 1 "register_operand")
-	 (match_operand:QI 2 "general_operand")))]
-  "TARGET_SSE2 && TARGET_64BIT"
+	 (match_operand:SI 2 "const_0_to_255_not_mul_8_operand")))]
+  "TARGET_SSE2
+   && TARGET_64BIT
+   && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
 {
   ix86_expand_v1ti_shift (ASHIFT, operands);
   DONE;
@@ -16011,6 +16030,25 @@ 
 	 (match_operand:QI 2 "general_operand")))]
   "TARGET_SSE2 && TARGET_64BIT"
 {
+  if (!CONST_INT_P (operands[2]))
+    {
+      ix86_expand_v1ti_shift (LSHIFTRT, operands);
+      DONE;
+    }
+})
+
+(define_insn_and_split "*lshrv1ti3_internal"
+  [(set (match_operand:V1TI 0 "register_operand")
+	(lshiftrt:V1TI
+	 (match_operand:V1TI 1 "register_operand")
+	 (match_operand:SI 2 "const_0_to_255_not_mul_8_operand")))]
+  "TARGET_SSE2
+   && TARGET_64BIT
+   && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
   ix86_expand_v1ti_shift (LSHIFTRT, operands);
   DONE;
 })
@@ -16022,6 +16060,26 @@ 
 	 (match_operand:QI 2 "general_operand")))]
   "TARGET_SSE2 && TARGET_64BIT"
 {
+  if (!CONST_INT_P (operands[2]))
+    {
+      ix86_expand_v1ti_ashiftrt (operands);
+      DONE;
+    }
+})
+
+
+(define_insn_and_split "*ashrv1ti3_internal"
+  [(set (match_operand:V1TI 0 "register_operand")
+	(ashiftrt:V1TI
+	 (match_operand:V1TI 1 "register_operand")
+	 (match_operand:SI 2 "const_0_to_255_operand")))]
+  "TARGET_SSE2
+   && TARGET_64BIT
+   && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
   ix86_expand_v1ti_ashiftrt (operands);
   DONE;
 })
@@ -16033,6 +16091,25 @@ 
 	 (match_operand:QI 2 "general_operand")))]
   "TARGET_SSE2 && TARGET_64BIT"
 {
+  if (!CONST_INT_P (operands[2]))
+    {
+      ix86_expand_v1ti_rotate (ROTATE, operands);
+      DONE;
+    }
+})
+
+(define_insn_and_split "*rotlv1ti3_internal"
+  [(set (match_operand:V1TI 0 "register_operand")
+	(rotate:V1TI
+	 (match_operand:V1TI 1 "register_operand")
+	 (match_operand:SI 2 "const_0_to_255_operand")))]
+  "TARGET_SSE2
+   && TARGET_64BIT
+   && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
   ix86_expand_v1ti_rotate (ROTATE, operands);
   DONE;
 })
@@ -16044,6 +16121,25 @@ 
 	 (match_operand:QI 2 "general_operand")))]
   "TARGET_SSE2 && TARGET_64BIT"
 {
+  if (!CONST_INT_P (operands[2]))
+    {
+      ix86_expand_v1ti_rotate (ROTATERT, operands);
+      DONE;
+    }
+})
+
+(define_insn_and_split "*rotrv1ti3_internal"
+  [(set (match_operand:V1TI 0 "register_operand")
+	(rotatert:V1TI
+	 (match_operand:V1TI 1 "register_operand")
+	 (match_operand:SI 2 "const_0_to_255_operand")))]
+  "TARGET_SSE2
+   && TARGET_64BIT
+   && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
   ix86_expand_v1ti_rotate (ROTATERT, operands);
   DONE;
 })