[Take,#2] x86_64: Expand ashrv1ti (and PR target/102986)

Message ID 030001d7ce3e$5b4742d0$11d5c870$@nextmovesoftware.com

Commit Message

Roger Sayle Oct. 31, 2021, 10:02 a.m. UTC
Very many thanks to Jakub for proof-reading my patch, catching my silly
GNU-style mistakes and making excellent suggestions.  This revised patch
incorporates all of his feedback, and has been tested on
x86_64-pc-linux-gnu with make bootstrap and make -k check with no new
failures.

2021-10-31  Roger Sayle  <roger@nextmovesoftware.com>
	    Jakub Jelinek  <jakub@redhat.com>

gcc/ChangeLog
	PR target/102986
	* config/i386/i386-expand.c (ix86_expand_v1ti_to_ti,
	ix86_expand_ti_to_v1ti): New helper functions.
	(ix86_expand_v1ti_shift): Check if the amount operand is an
	integer constant, and expand as a TImode shift if it isn't.
	(ix86_expand_v1ti_rotate): Check if the amount operand is an
	integer constant, and expand as a TImode rotate if it isn't.
	(ix86_expand_v1ti_ashiftrt): New function to expand arithmetic
	right shifts of V1TImode quantities.
	* config/i386/i386-protos.h (ix86_expand_v1ti_ashift): Prototype.
	* config/i386/sse.md (ashlv1ti3, lshrv1ti3): Change constraints
	to QImode general_operand, and let the helper functions lower
	shifts by non-constant operands, as TImode shifts.  Make
	conditional on TARGET_64BIT.
	(ashrv1ti3): New expander calling ix86_expand_v1ti_ashiftrt.
	(rotlv1ti3, rotrv1ti3): Change shift operand to QImode.
	Make conditional on TARGET_64BIT.

gcc/testsuite/ChangeLog
	PR target/102986
	* gcc.target/i386/sse2-v1ti-ashiftrt-1.c: New test case.
	* gcc.target/i386/sse2-v1ti-ashiftrt-2.c: New test case.
	* gcc.target/i386/sse2-v1ti-ashiftrt-3.c: New test case.
	* gcc.target/i386/sse2-v1ti-shift-2.c: New test case.
	* gcc.target/i386/sse2-v1ti-shift-3.c: New test case.
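For reference, a minimal C sketch (my own illustration, not one of the
testcases added above; the typedef names are assumptions) of the kind of
V1TImode shifts the revised expanders now handle when the shift amount
is not a compile-time constant:

```c
#include <assert.h>

/* Hypothetical sketch, not a testcase from this patch: V1TImode
   shifts by a non-constant amount, which the new expander code
   lowers as TImode shifts.  Requires a 64-bit target with __int128. */
typedef unsigned __int128 uv1ti __attribute__ ((vector_size (16)));
typedef __int128 sv1ti __attribute__ ((vector_size (16)));

uv1ti ashl (uv1ti x, unsigned n) { return x << n; }  /* ashlv1ti3 */
uv1ti lshr (uv1ti x, unsigned n) { return x >> n; }  /* lshrv1ti3 */
sv1ti ashr (sv1ti x, unsigned n) { return x >> n; }  /* ashrv1ti3 */
```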

Thanks.
Roger
--

-----Original Message-----
From: Jakub Jelinek <jakub@redhat.com> 
Sent: 30 October 2021 11:30
To: Roger Sayle <roger@nextmovesoftware.com>
Cc: 'GCC Patches' <gcc-patches@gcc.gnu.org>; 'Uros Bizjak'
<ubizjak@gmail.com>
Subject: Re: [PATCH] x86_64: Expand ashrv1ti (and PR target/102986)

On Sat, Oct 30, 2021 at 11:16:41AM +0100, Roger Sayle wrote:
> 2021-10-30  Roger Sayle  <roger@nextmovesoftware.com>
> 
> gcc/ChangeLog
> 	PR target/102986
> 	* config/i386/i386-expand.c (ix86_expand_v1ti_to_ti,
> 	ix86_expand_ti_to_v1ti): New helper functions.
> 	(ix86_expand_v1ti_shift): Check if the amount operand is an
> 	integer constant, and expand as a TImode shift if it isn't.
> 	(ix86_expand_v1ti_rotate): Check if the amount operand is an
> 	integer constant, and expand as a TImode rotate if it isn't.
> 	(ix86_expand_v1ti_ashiftrt): New function to expand arithmetic
> 	right shifts of V1TImode quantities.
> 	* config/i386/i386-protos.h (ix86_expand_v1ti_ashift): Prototype.
> 	* config/i386/sse.md (ashlv1ti3, lshrv1ti3): Change constraints
> 	to QImode general_operand, and let the helper functions lower
> 	shifts by non-constant operands, as TImode shifts.
> 	(ashrv1ti3): New expander calling ix86_expand_v1ti_ashiftrt.
> 	(rotlv1ti3, rotrv1ti3): Change shift operand to QImode.
> 
> gcc/testsuite/ChangeLog
> 	PR target/102986
> 	* gcc.target/i386/sse2-v1ti-ashiftrt-1.c: New test case.
> 	* gcc.target/i386/sse2-v1ti-ashiftrt-2.c: New test case.
> 	* gcc.target/i386/sse2-v1ti-ashiftrt-3.c: New test case.
> 	* gcc.target/i386/sse2-v1ti-shift-2.c: New test case.
> 	* gcc.target/i386/sse2-v1ti-shift-3.c: New test case.
> 
> Sorry again for the breakage in my last patch.   I wasn't testing things
> that shouldn't have been affected/changed.

Not a review, will defer that to Uros, but just nits:

> +/* Expand move of V1TI mode register X to a new TI mode register.  */ 
> +static rtx ix86_expand_v1ti_to_ti (rtx x)

ix86_expand_v1ti_to_ti should be at the start of next line, so static rtx
ix86_expand_v1ti_to_ti (rtx x)

Ditto for other functions and also in functions you've added by the previous
patch.
> +      emit_insn (code == ASHIFT ? gen_ashlti3(tmp2, tmp1, operands[2])
> +				: gen_lshrti3(tmp2, tmp1, operands[2]));

Space before ( twice.

> +      emit_insn (code == ROTATE ? gen_rotlti3(tmp2, tmp1, operands[2])
> +				: gen_rotrti3(tmp2, tmp1, operands[2]));

Likewise.

> +      emit_insn (gen_ashrti3(tmp2, tmp1, operands[2]));

Similarly.

Also, I wonder for all these patterns (previously and now added), shouldn't
they have && TARGET_64BIT in conditions?  I mean, we don't really support
scalar TImode for ia32, but VALID_SSE_REG_MODE includes V1TImode and while
the constant shifts can be done, I think the variable shifts can't, there
are no TImode shift patterns...

	Jakub
  

Comments

Uros Bizjak Nov. 1, 2021, 7:27 a.m. UTC | #1
On Sun, Oct 31, 2021 at 11:02 AM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
>
> Very many thanks to Jakub for proof-reading my patch, catching my silly
> GNU-style mistakes and making excellent suggestions.  This revised patch
> incorporates all of his feedback, and has been tested on
> x86_64-pc-linux-gnu with make bootstrap and make -k check with no new
> failures.
>
> 2021-10-31  Roger Sayle  <roger@nextmovesoftware.com>
>             Jakub Jelinek  <jakub@redhat.com>
>
> gcc/ChangeLog
>         PR target/102986
>         * config/i386/i386-expand.c (ix86_expand_v1ti_to_ti,
>         ix86_expand_ti_to_v1ti): New helper functions.
>         (ix86_expand_v1ti_shift): Check if the amount operand is an
>         integer constant, and expand as a TImode shift if it isn't.
>         (ix86_expand_v1ti_rotate): Check if the amount operand is an
>         integer constant, and expand as a TImode rotate if it isn't.
>         (ix86_expand_v1ti_ashiftrt): New function to expand arithmetic
>         right shifts of V1TImode quantities.
>         * config/i386/i386-protos.h (ix86_expand_v1ti_ashift): Prototype.
>         * config/i386/sse.md (ashlv1ti3, lshrv1ti3): Change constraints
>         to QImode general_operand, and let the helper functions lower
>         shifts by non-constant operands, as TImode shifts.  Make
>         conditional on TARGET_64BIT.
>         (ashrv1ti3): New expander calling ix86_expand_v1ti_ashiftrt.
>         (rotlv1ti3, rotrv1ti3): Change shift operand to QImode.
>         Make conditional on TARGET_64BIT.
>
> gcc/testsuite/ChangeLog
>         PR target/102986
>         * gcc.target/i386/sse2-v1ti-ashiftrt-1.c: New test case.
>         * gcc.target/i386/sse2-v1ti-ashiftrt-2.c: New test case.
>         * gcc.target/i386/sse2-v1ti-ashiftrt-3.c: New test case.
>         * gcc.target/i386/sse2-v1ti-shift-2.c: New test case.
>         * gcc.target/i386/sse2-v1ti-shift-3.c: New test case.
>
> Thanks.
> Roger
> --
>
> -----Original Message-----
> From: Jakub Jelinek <jakub@redhat.com>
> Sent: 30 October 2021 11:30
> To: Roger Sayle <roger@nextmovesoftware.com>
> Cc: 'GCC Patches' <gcc-patches@gcc.gnu.org>; 'Uros Bizjak'
> <ubizjak@gmail.com>
> Subject: Re: [PATCH] x86_64: Expand ashrv1ti (and PR target/102986)
>
> On Sat, Oct 30, 2021 at 11:16:41AM +0100, Roger Sayle wrote:
> > 2021-10-30  Roger Sayle  <roger@nextmovesoftware.com>
> >
> > gcc/ChangeLog
> >       PR target/102986
> >       * config/i386/i386-expand.c (ix86_expand_v1ti_to_ti,
> >       ix86_expand_ti_to_v1ti): New helper functions.
> >       (ix86_expand_v1ti_shift): Check if the amount operand is an
> >       integer constant, and expand as a TImode shift if it isn't.
> >       (ix86_expand_v1ti_rotate): Check if the amount operand is an
> >       integer constant, and expand as a TImode rotate if it isn't.
> >       (ix86_expand_v1ti_ashiftrt): New function to expand arithmetic
> >       right shifts of V1TImode quantities.
> >       * config/i386/i386-protos.h (ix86_expand_v1ti_ashift): Prototype.
> >       * config/i386/sse.md (ashlv1ti3, lshrv1ti3): Change constraints
> >       to QImode general_operand, and let the helper functions lower
> >       shifts by non-constant operands, as TImode shifts.
> >       (ashrv1ti3): New expander calling ix86_expand_v1ti_ashiftrt.
> >       (rotlv1ti3, rotrv1ti3): Change shift operand to QImode.
> >
> > gcc/testsuite/ChangeLog
> >       PR target/102986
> >       * gcc.target/i386/sse2-v1ti-ashiftrt-1.c: New test case.
> >       * gcc.target/i386/sse2-v1ti-ashiftrt-2.c: New test case.
> >       * gcc.target/i386/sse2-v1ti-ashiftrt-3.c: New test case.
> >       * gcc.target/i386/sse2-v1ti-shift-2.c: New test case.
> >       * gcc.target/i386/sse2-v1ti-shift-3.c: New test case.
> >
> > Sorry again for the breakage in my last patch.   I wasn't testing things
> > that shouldn't have been affected/changed.
>
> Not a review, will defer that to Uros, but just nits:
>
> > +/* Expand move of V1TI mode register X to a new TI mode register.  */
> > +static rtx ix86_expand_v1ti_to_ti (rtx x)
>
> ix86_expand_v1ti_to_ti should be at the start of next line, so static rtx
> ix86_expand_v1ti_to_ti (rtx x)
>
> Ditto for other functions and also in functions you've added by the previous
> patch.
> > +      emit_insn (code == ASHIFT ? gen_ashlti3(tmp2, tmp1, operands[2])
> > +                             : gen_lshrti3(tmp2, tmp1, operands[2]));
>
> Space before ( twice.
>
> > +      emit_insn (code == ROTATE ? gen_rotlti3(tmp2, tmp1, operands[2])
> > +                             : gen_rotrti3(tmp2, tmp1, operands[2]));
>
> Likewise.
>
> > +      emit_insn (gen_ashrti3(tmp2, tmp1, operands[2]));
>
> Similarly.
>
> Also, I wonder for all these patterns (previously and now added), shouldn't
> they have && TARGET_64BIT in conditions?  I mean, we don't really support
> scalar TImode for ia32, but VALID_SSE_REG_MODE includes V1TImode and while
> the constant shifts can be done, I think the variable shifts can't, there
> are no TImode shift patterns...

- (match_operand:SI 2 "const_int_operand")))]
-  "TARGET_SSE2"
+ (match_operand:QI 2 "general_operand")))]
+  "TARGET_SSE2 && TARGET_64BIT"

I wonder if this change is too restrictive, as it disables V1TI shifts
by constant on 32bit targets. Perhaps we can introduce a conditional
predicate, like:

(define_predicate "shiftv1ti_input_operand"
  (if_then_else (match_test "TARGET_64BIT")
    (match_operand 0 "general_operand")
    (match_operand 0 "const_int_operand")))

However, I'm not familiar with how the middle-end behaves with the
above approach - will it try to put the constant in a register under
some circumstances and consequently fail the expansion?

And one mandatory :)  nit:

+      rtx tmp1 = ix86_expand_v1ti_to_ti (op1);
+      rtx tmp2 = gen_reg_rtx (TImode);
+      emit_insn (code == ASHIFT ? gen_ashlti3 (tmp2, tmp1, operands[2])
+				: gen_lshrti3 (tmp2, tmp1, operands[2]));
+      rtx tmp3 = ix86_expand_ti_to_v1ti (tmp2);
+      emit_move_insn (operands[0], tmp3);
+      return;

I'd write this as:

      rtx tmp1 = ix86_expand_v1ti_to_ti (op1);
      rtx tmp2 = gen_reg_rtx (TImode);
      rtx (*shift) (rtx, rtx, rtx)
            = (code == ASHIFT) ? gen_ashlti3 : gen_lshrti3;
      emit_insn (shift (tmp2, tmp1, operands[2]));

      rtx tmp3 = ix86_expand_ti_to_v1ti (tmp2);
      emit_move_insn (operands[0], tmp3);
      return;

Otherwise LGTM (and kudos for writing out all those sequences).

Thanks,
Uros.
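(The refactoring suggested above is the pick-the-generator-once idiom.
A minimal standalone C sketch of the same pattern, with plain 64-bit
shifts standing in for the gen_ashlti3/gen_lshrti3 generators:

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the dispatch idiom suggested above: select the generator
   function once via a function pointer, then make a single call.
   shl64/shr64 are stand-ins for gen_ashlti3/gen_lshrti3.  */
static uint64_t shl64 (uint64_t x, unsigned n) { return x << n; }
static uint64_t shr64 (uint64_t x, unsigned n) { return x >> n; }

uint64_t
do_shift (int left, uint64_t x, unsigned n)
{
  uint64_t (*shift) (uint64_t, unsigned) = left ? shl64 : shr64;
  return shift (x, n);
}
```

This keeps the conditional out of the emit call itself, which is the
readability point being made.)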
  
Jakub Jelinek Nov. 1, 2021, 8:43 a.m. UTC | #2
On Mon, Nov 01, 2021 at 08:27:12AM +0100, Uros Bizjak wrote:
> > Also, I wonder for all these patterns (previously and now added), shouldn't
> > they have && TARGET_64BIT in conditions?  I mean, we don't really support
> > scalar TImode for ia32, but VALID_SSE_REG_MODE includes V1TImode and while
> > the constant shifts can be done, I think the variable shifts can't, there
> > are no TImode shift patterns...
> 
> - (match_operand:SI 2 "const_int_operand")))]
> -  "TARGET_SSE2"
> + (match_operand:QI 2 "general_operand")))]
> +  "TARGET_SSE2 && TARGET_64BIT"
> 
> I wonder if this change is too restrictive, as it disables V1TI shifts
> by constant on 32bit targets. Perhaps we can introduce a conditional
> predicate, like:
> 
> (define_predicate "shiftv1ti_input_operand"
>   (if_then_else (match_test "TARGET_64BIT")
>     (match_operand 0 "general_operand")
>     (match_operand 0 "const_int_operand")))
> 
> However, I'm not familiar with how the middle-end behaves with the
> above approach - will it try to put the constant in a register under
> some circumstances and consequently fail the expansion?

That would again run into the assertions that shift expanders must never
fail.
The question is if a V1TImode shift can ever appear in 32-bit x86, because
typedef __int128 V __attribute__((vector_size (16)));
is rejected with
error: ‘__int128’ is not supported on this target
when -m32 is in use, no matter what ISA flags are used.

	Jakub
  
Uros Bizjak Nov. 1, 2021, 9:03 a.m. UTC | #3
On Mon, Nov 1, 2021 at 9:43 AM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Mon, Nov 01, 2021 at 08:27:12AM +0100, Uros Bizjak wrote:
> > > Also, I wonder for all these patterns (previously and now added), shouldn't
> > > they have && TARGET_64BIT in conditions?  I mean, we don't really support
> > > scalar TImode for ia32, but VALID_SSE_REG_MODE includes V1TImode and while
> > > the constant shifts can be done, I think the variable shifts can't, there
> > > are no TImode shift patterns...
> >
> > - (match_operand:SI 2 "const_int_operand")))]
> > -  "TARGET_SSE2"
> > + (match_operand:QI 2 "general_operand")))]
> > +  "TARGET_SSE2 && TARGET_64BIT"
> >
> > I wonder if this change is too restrictive, as it disables V1TI shifts
> > by constant on 32bit targets. Perhaps we can introduce a conditional
> > predicate, like:
> >
> > (define_predicate "shiftv1ti_input_operand"
> >   (if_then_else (match_test "TARGET_64BIT")
> >     (match_operand 0 "general_operand")
> >     (match_operand 0 "const_int_operand")))
> >
> > However, I'm not familiar with how the middle-end behaves with the
> > above approach - will it try to put the constant in a register under
> > some circumstances and consequently fail the expansion?
>
> That would run again into the assertions that shift expanders must never
> fail.
> The question is if a V1TImode shift can ever appear in 32-bit x86, because
> typedef __int128 V __attribute__((vector_size (16)));
> is rejected with
> error: ‘__int128’ is not supported on this target
> when -m32 is in use, no matter what ISA flags are used.

We can do:

typedef int __v1ti __attribute__((mode (V1TI)));

__v1ti foo (__v1ti a)
{
  return a << 11;
}

gcc -O2 -msse2 -m32:

v1ti.c:1:1: warning: specifying vector types with ‘__attribute__
((mode))’ is deprecated [-Wattributes]
   1 | typedef int __v1ti __attribute__((mode (V1TI)));
     | ^~~~~~~
v1ti.c:1:1: note: use ‘__attribute__ ((vector_size))’ instead
during RTL pass: expand
v1ti.c: In function ‘foo’:
v1ti.c:5:12: internal compiler error: in expand_shift_1, at expmed.c:2668
   5 |   return a << 11;
     |          ~~^~~~~

which looks like an oversight of some kind, since TI (and V2TI) mode
errors out with:

v1ti.c:1:1: error: unable to emulate ‘TI’

and

v1ti.c:1:1: error: unable to emulate ‘V2TI’

I will submit a PR with the above issue.

But I agree, V1TI is x86_64 specific, so the added insn constraint is OK.

Thanks,
Uros.

>         Jakub
>
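(For context on what the expander sequences in the patch below compute:
a hedged C sketch, my own and not from the patch, of an arithmetic right
shift of a 128-bit value emulated on two 64-bit halves.  The SSE
sequences synthesize the same result out of psrad/psrlq/psllq/por and
shuffles.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative only: arithmetic right shift of a 128-bit value held
   as two 64-bit halves, restricted to 0 < n < 64 for brevity.  The
   high half shifts arithmetically (the sign bit replicates); the low
   half is refilled from the bits shifted out of the high half.  */
typedef struct { uint64_t lo; int64_t hi; } i128;

i128
ashr128 (i128 x, unsigned n)
{
  i128 r;
  r.lo = (x.lo >> n) | ((uint64_t) x.hi << (64 - n));
  r.hi = x.hi >> n;  /* >> on a signed operand is arithmetic in GCC */
  return r;
}
```

The special cases in ix86_expand_v1ti_ashiftrt, bits == 64, 96, 127 and
so on, are shortcuts where one or both halves degenerate to pure sign
replication.)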
  

Patch

diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index 4c3800e..db967e4 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -6157,12 +6157,52 @@  ix86_split_lshr (rtx *operands, rtx scratch, machine_mode mode)
     }
 }
 
+/* Expand move of V1TI mode register X to a new TI mode register.  */
+static rtx
+ix86_expand_v1ti_to_ti (rtx x)
+{
+  rtx result = gen_reg_rtx (TImode);
+  emit_move_insn (result, gen_lowpart (TImode, x));
+  return result;
+}
+
+/* Expand move of TI mode register X to a new V1TI mode register.  */
+static rtx
+ix86_expand_ti_to_v1ti (rtx x)
+{
+  rtx result = gen_reg_rtx (V1TImode);
+  if (TARGET_SSE2)
+    {
+      rtx lo = gen_lowpart (DImode, x);
+      rtx hi = gen_highpart (DImode, x);
+      rtx tmp = gen_reg_rtx (V2DImode);
+      emit_insn (gen_vec_concatv2di (tmp, lo, hi));
+      emit_move_insn (result, gen_lowpart (V1TImode, tmp));
+    }
+  else
+    emit_move_insn (result, gen_lowpart (V1TImode, x));
+  return result;
+}
+
 /* Expand V1TI mode shift (of rtx_code CODE) by constant.  */
-void ix86_expand_v1ti_shift (enum rtx_code code, rtx operands[])
+void
+ix86_expand_v1ti_shift (enum rtx_code code, rtx operands[])
 {
-  HOST_WIDE_INT bits = INTVAL (operands[2]) & 127;
   rtx op1 = force_reg (V1TImode, operands[1]);
 
+  if (!CONST_INT_P (operands[2]))
+    {
+      rtx tmp1 = ix86_expand_v1ti_to_ti (op1);
+      rtx tmp2 = gen_reg_rtx (TImode);
+      emit_insn (code == ASHIFT ? gen_ashlti3 (tmp2, tmp1, operands[2])
+				: gen_lshrti3 (tmp2, tmp1, operands[2]));
+      rtx tmp3 = ix86_expand_ti_to_v1ti (tmp2);
+      emit_move_insn (operands[0], tmp3);
+      return;
+    }
+
+  HOST_WIDE_INT bits = INTVAL (operands[2]) & 127;
+
   if (bits == 0)
     {
       emit_move_insn (operands[0], op1);
@@ -6173,7 +6213,7 @@  void ix86_expand_v1ti_shift (enum rtx_code code, rtx operands[])
     {
       rtx tmp = gen_reg_rtx (V1TImode);
       if (code == ASHIFT)
-        emit_insn (gen_sse2_ashlv1ti3 (tmp, op1, GEN_INT (bits)));
+	emit_insn (gen_sse2_ashlv1ti3 (tmp, op1, GEN_INT (bits)));
       else
 	emit_insn (gen_sse2_lshrv1ti3 (tmp, op1, GEN_INT (bits)));
       emit_move_insn (operands[0], tmp);
@@ -6228,11 +6268,24 @@  void ix86_expand_v1ti_shift (enum rtx_code code, rtx operands[])
 }
 
 /* Expand V1TI mode rotate (of rtx_code CODE) by constant.  */
-void ix86_expand_v1ti_rotate (enum rtx_code code, rtx operands[])
+void
+ix86_expand_v1ti_rotate (enum rtx_code code, rtx operands[])
 {
-  HOST_WIDE_INT bits = INTVAL (operands[2]) & 127;
   rtx op1 = force_reg (V1TImode, operands[1]);
 
+  if (!CONST_INT_P (operands[2]))
+    {
+      rtx tmp1 = ix86_expand_v1ti_to_ti (op1);
+      rtx tmp2 = gen_reg_rtx (TImode);
+      emit_insn (code == ROTATE ? gen_rotlti3 (tmp2, tmp1, operands[2])
+				: gen_rotrti3 (tmp2, tmp1, operands[2]));
+      rtx tmp3 = ix86_expand_ti_to_v1ti (tmp2);
+      emit_move_insn (operands[0], tmp3);
+      return;
+    }
+
+  HOST_WIDE_INT bits = INTVAL (operands[2]) & 127;
+
   if (bits == 0)
     {
       emit_move_insn (operands[0], op1);
@@ -6320,6 +6373,469 @@  void ix86_expand_v1ti_rotate (enum rtx_code code, rtx operands[])
   emit_move_insn (operands[0], tmp4);
 }
 
+/* Expand V1TI mode ashiftrt by constant.  */
+void
+ix86_expand_v1ti_ashiftrt (rtx operands[])
+{
+  rtx op1 = force_reg (V1TImode, operands[1]);
+
+  if (!CONST_INT_P (operands[2]))
+    {
+      rtx tmp1 = ix86_expand_v1ti_to_ti (op1);
+      rtx tmp2 = gen_reg_rtx (TImode);
+      emit_insn (gen_ashrti3 (tmp2, tmp1, operands[2]));
+      rtx tmp3 = ix86_expand_ti_to_v1ti (tmp2);
+      emit_move_insn (operands[0], tmp3);
+      return;
+    }
+
+  HOST_WIDE_INT bits = INTVAL (operands[2]) & 127;
+
+  if (bits == 0)
+    {
+      emit_move_insn (operands[0], op1);
+      return;
+    }
+
+  if (bits == 127)
+    {
+      /* Two operations.  */
+      rtx tmp1 = gen_reg_rtx (V4SImode);
+      rtx tmp2 = gen_reg_rtx (V4SImode);
+      emit_move_insn (tmp1, gen_lowpart (V4SImode, op1));
+      emit_insn (gen_sse2_pshufd (tmp2, tmp1, GEN_INT (0xff)));
+
+      rtx tmp3 = gen_reg_rtx (V4SImode);
+      emit_insn (gen_ashrv4si3 (tmp3, tmp2, GEN_INT (31)));
+
+      rtx tmp4 = gen_reg_rtx (V1TImode);
+      emit_move_insn (tmp4, gen_lowpart (V1TImode, tmp3));
+      emit_move_insn (operands[0], tmp4);
+      return;
+    }
+
+  if (bits == 64)
+    {
+      /* Three operations.  */
+      rtx tmp1 = gen_reg_rtx (V4SImode);
+      rtx tmp2 = gen_reg_rtx (V4SImode);
+      emit_move_insn (tmp1, gen_lowpart (V4SImode, op1));
+      emit_insn (gen_sse2_pshufd (tmp2, tmp1, GEN_INT (0xff)));
+
+      rtx tmp3 = gen_reg_rtx (V4SImode);
+      emit_insn (gen_ashrv4si3 (tmp3, tmp2, GEN_INT (31)));
+
+      rtx tmp4 = gen_reg_rtx (V2DImode);
+      rtx tmp5 = gen_reg_rtx (V2DImode);
+      rtx tmp6 = gen_reg_rtx (V2DImode);
+      emit_move_insn (tmp4, gen_lowpart (V2DImode, tmp1));
+      emit_move_insn (tmp5, gen_lowpart (V2DImode, tmp3));
+      emit_insn (gen_vec_interleave_highv2di (tmp6, tmp4, tmp5));
+
+      rtx tmp7 = gen_reg_rtx (V1TImode);
+      emit_move_insn (tmp7, gen_lowpart (V1TImode, tmp6));
+      emit_move_insn (operands[0], tmp7);
+      return;
+    }
+
+  if (bits == 96)
+    {
+      /* Three operations.  */
+      rtx tmp3 = gen_reg_rtx (V2DImode);
+      rtx tmp1 = gen_reg_rtx (V4SImode);
+      rtx tmp2 = gen_reg_rtx (V4SImode);
+      emit_move_insn (tmp1, gen_lowpart (V4SImode, op1));
+      emit_insn (gen_ashrv4si3 (tmp2, tmp1, GEN_INT (31)));
+
+      rtx tmp4 = gen_reg_rtx (V2DImode);
+      rtx tmp5 = gen_reg_rtx (V2DImode);
+      emit_move_insn (tmp3, gen_lowpart (V2DImode, tmp1));
+      emit_move_insn (tmp4, gen_lowpart (V2DImode, tmp2));
+      emit_insn (gen_vec_interleave_highv2di (tmp5, tmp3, tmp4));
+
+      rtx tmp6 = gen_reg_rtx (V4SImode);
+      rtx tmp7 = gen_reg_rtx (V4SImode);
+      emit_move_insn (tmp6, gen_lowpart (V4SImode, tmp5));
+      emit_insn (gen_sse2_pshufd (tmp7, tmp6, GEN_INT (0xfd)));
+
+      rtx tmp8 = gen_reg_rtx (V1TImode);
+      emit_move_insn (tmp8, gen_lowpart (V1TImode, tmp7));
+      emit_move_insn (operands[0], tmp8);
+      return;
+    }
+
+  if (TARGET_AVX2 || TARGET_SSE4_1)
+    {
+      /* Three operations.  */
+      if (bits == 32)
+	{
+	  rtx tmp1 = gen_reg_rtx (V4SImode);
+	  rtx tmp2 = gen_reg_rtx (V4SImode);
+	  emit_move_insn (tmp1, gen_lowpart (V4SImode, op1));
+	  emit_insn (gen_ashrv4si3 (tmp2, tmp1, GEN_INT (31)));
+
+	  rtx tmp3 = gen_reg_rtx (V1TImode);
+	  emit_insn (gen_sse2_lshrv1ti3 (tmp3, op1, GEN_INT (32)));
+
+	  if (TARGET_AVX2)
+	    {
+	      rtx tmp4 = gen_reg_rtx (V4SImode);
+	      rtx tmp5 = gen_reg_rtx (V4SImode);
+	      emit_move_insn (tmp4, gen_lowpart (V4SImode, tmp3));
+	      emit_insn (gen_avx2_pblenddv4si (tmp5, tmp2, tmp4,
+					       GEN_INT (7)));
+
+	      rtx tmp6 = gen_reg_rtx (V1TImode);
+	      emit_move_insn (tmp6, gen_lowpart (V1TImode, tmp5));
+	      emit_move_insn (operands[0], tmp6);
+	    }
+	  else
+	    {
+	      rtx tmp4 = gen_reg_rtx (V8HImode);
+	      rtx tmp5 = gen_reg_rtx (V8HImode);
+	      rtx tmp6 = gen_reg_rtx (V8HImode);
+	      emit_move_insn (tmp4, gen_lowpart (V8HImode, tmp2));
+	      emit_move_insn (tmp5, gen_lowpart (V8HImode, tmp3));
+	      emit_insn (gen_sse4_1_pblendw (tmp6, tmp4, tmp5,
+					     GEN_INT (0x3f)));
+
+	      rtx tmp7 = gen_reg_rtx (V1TImode);
+	      emit_move_insn (tmp7, gen_lowpart (V1TImode, tmp6));
+	      emit_move_insn (operands[0], tmp7);
+	    }
+	  return;
+	}
+
+      /* Three operations.  */
+      if (bits == 8 || bits == 16 || bits == 24)
+	{
+	  rtx tmp1 = gen_reg_rtx (V4SImode);
+	  rtx tmp2 = gen_reg_rtx (V4SImode);
+	  emit_move_insn (tmp1, gen_lowpart (V4SImode, op1));
+	  emit_insn (gen_ashrv4si3 (tmp2, tmp1, GEN_INT (bits)));
+
+	  rtx tmp3 = gen_reg_rtx (V1TImode);
+	  emit_insn (gen_sse2_lshrv1ti3 (tmp3, op1, GEN_INT (bits)));
+
+	  if (TARGET_AVX2)
+	    {
+	      rtx tmp4 = gen_reg_rtx (V4SImode);
+	      rtx tmp5 = gen_reg_rtx (V4SImode);
+	      emit_move_insn (tmp4, gen_lowpart (V4SImode, tmp3));
+	      emit_insn (gen_avx2_pblenddv4si (tmp5, tmp2, tmp4,
+					       GEN_INT (7)));
+
+	      rtx tmp6 = gen_reg_rtx (V1TImode);
+	      emit_move_insn (tmp6, gen_lowpart (V1TImode, tmp5));
+	      emit_move_insn (operands[0], tmp6);
+	    }
+	  else
+	    {
+	      rtx tmp4 = gen_reg_rtx (V8HImode);
+	      rtx tmp5 = gen_reg_rtx (V8HImode);
+	      rtx tmp6 = gen_reg_rtx (V8HImode);
+	      emit_move_insn (tmp4, gen_lowpart (V8HImode, tmp2));
+	      emit_move_insn (tmp5, gen_lowpart (V8HImode, tmp3));
+	      emit_insn (gen_sse4_1_pblendw (tmp6, tmp4, tmp5,
+					     GEN_INT (0x3f)));
+
+	      rtx tmp7 = gen_reg_rtx (V1TImode);
+	      emit_move_insn (tmp7, gen_lowpart (V1TImode, tmp6));
+	      emit_move_insn (operands[0], tmp7);
+	    }
+	  return;
+	}
+    }
+
+  if (bits > 96)
+    {
+      /* Four operations.  */
+      rtx tmp1 = gen_reg_rtx (V4SImode);
+      rtx tmp2 = gen_reg_rtx (V4SImode);
+      emit_move_insn (tmp1, gen_lowpart (V4SImode, op1));
+      emit_insn (gen_ashrv4si3 (tmp2, tmp1, GEN_INT (bits - 96)));
+
+      rtx tmp3 = gen_reg_rtx (V4SImode);
+      emit_insn (gen_ashrv4si3 (tmp3, tmp1, GEN_INT (31)));
+
+      rtx tmp4 = gen_reg_rtx (V2DImode);
+      rtx tmp5 = gen_reg_rtx (V2DImode);
+      rtx tmp6 = gen_reg_rtx (V2DImode);
+      emit_move_insn (tmp4, gen_lowpart (V2DImode, tmp2));
+      emit_move_insn (tmp5, gen_lowpart (V2DImode, tmp3));
+      emit_insn (gen_vec_interleave_highv2di (tmp6, tmp4, tmp5));
+
+      rtx tmp7 = gen_reg_rtx (V4SImode);
+      rtx tmp8 = gen_reg_rtx (V4SImode);
+      emit_move_insn (tmp7, gen_lowpart (V4SImode, tmp6));
+      emit_insn (gen_sse2_pshufd (tmp8, tmp7, GEN_INT (0xfd)));
+
+      rtx tmp9 = gen_reg_rtx (V1TImode);
+      emit_move_insn (tmp9, gen_lowpart (V1TImode, tmp8));
+      emit_move_insn (operands[0], tmp9);
+      return;
+    }
+
+  if (TARGET_SSE4_1 && (bits == 48 || bits == 80))
+    {
+      /* Four operations.  */
+      rtx tmp1 = gen_reg_rtx (V4SImode);
+      rtx tmp2 = gen_reg_rtx (V4SImode);
+      emit_move_insn (tmp1, gen_lowpart (V4SImode, op1));
+      emit_insn (gen_sse2_pshufd (tmp2, tmp1, GEN_INT (0xff)));
+
+      rtx tmp3 = gen_reg_rtx (V4SImode);
+      emit_insn (gen_ashrv4si3 (tmp3, tmp2, GEN_INT (31)));
+
+      rtx tmp4 = gen_reg_rtx (V1TImode);
+      emit_insn (gen_sse2_lshrv1ti3 (tmp4, op1, GEN_INT (bits)));
+
+      rtx tmp5 = gen_reg_rtx (V8HImode);
+      rtx tmp6 = gen_reg_rtx (V8HImode);
+      rtx tmp7 = gen_reg_rtx (V8HImode);
+      emit_move_insn (tmp5, gen_lowpart (V8HImode, tmp3));
+      emit_move_insn (tmp6, gen_lowpart (V8HImode, tmp4));
+      emit_insn (gen_sse4_1_pblendw (tmp7, tmp5, tmp6,
+				     GEN_INT (bits == 48 ? 0x1f : 0x07)));
+
+      rtx tmp8 = gen_reg_rtx (V1TImode);
+      emit_move_insn (tmp8, gen_lowpart (V1TImode, tmp7));
+      emit_move_insn (operands[0], tmp8);
+      return;
+    }
+
+  if ((bits & 7) == 0)
+    {
+      /* Five operations.  */
+      rtx tmp1 = gen_reg_rtx (V4SImode);
+      rtx tmp2 = gen_reg_rtx (V4SImode);
+      emit_move_insn (tmp1, gen_lowpart (V4SImode, op1));
+      emit_insn (gen_sse2_pshufd (tmp2, tmp1, GEN_INT (0xff)));
+
+      rtx tmp3 = gen_reg_rtx (V4SImode);
+      emit_insn (gen_ashrv4si3 (tmp3, tmp2, GEN_INT (31)));
+
+      rtx tmp4 = gen_reg_rtx (V1TImode);
+      emit_insn (gen_sse2_lshrv1ti3 (tmp4, op1, GEN_INT (bits)));
+
+      rtx tmp5 = gen_reg_rtx (V1TImode);
+      rtx tmp6 = gen_reg_rtx (V1TImode);
+      emit_move_insn (tmp5, gen_lowpart (V1TImode, tmp3));
+      emit_insn (gen_sse2_ashlv1ti3 (tmp6, tmp5, GEN_INT (128 - bits)));
+
+      rtx tmp7 = gen_reg_rtx (V2DImode);
+      rtx tmp8 = gen_reg_rtx (V2DImode);
+      rtx tmp9 = gen_reg_rtx (V2DImode);
+      emit_move_insn (tmp7, gen_lowpart (V2DImode, tmp4));
+      emit_move_insn (tmp8, gen_lowpart (V2DImode, tmp6));
+      emit_insn (gen_iorv2di3 (tmp9, tmp7, tmp8));
+
+      rtx tmp10 = gen_reg_rtx (V1TImode);
+      emit_move_insn (tmp10, gen_lowpart (V1TImode, tmp9));
+      emit_move_insn (operands[0], tmp10);
+      return;
+    }
+
+  if (TARGET_AVX2 && bits < 32)
+    {
+      /* Six operations.  */
+      rtx tmp1 = gen_reg_rtx (V4SImode);
+      rtx tmp2 = gen_reg_rtx (V4SImode);
+      emit_move_insn (tmp1, gen_lowpart (V4SImode, op1));
+      emit_insn (gen_ashrv4si3 (tmp2, tmp1, GEN_INT (bits)));
+
+      rtx tmp3 = gen_reg_rtx (V1TImode);
+      emit_insn (gen_sse2_lshrv1ti3 (tmp3, op1, GEN_INT (64)));
+
+      rtx tmp4 = gen_reg_rtx (V2DImode);
+      rtx tmp5 = gen_reg_rtx (V2DImode);
+      emit_move_insn (tmp4, gen_lowpart (V2DImode, op1));
+      emit_insn (gen_lshrv2di3 (tmp5, tmp4, GEN_INT (bits)));
+
+      rtx tmp6 = gen_reg_rtx (V2DImode);
+      rtx tmp7 = gen_reg_rtx (V2DImode);
+      emit_move_insn (tmp6, gen_lowpart (V2DImode, tmp3));
+      emit_insn (gen_ashlv2di3 (tmp7, tmp6, GEN_INT (64 - bits)));
+
+      rtx tmp8 = gen_reg_rtx (V2DImode);
+      emit_insn (gen_iorv2di3 (tmp8, tmp5, tmp7));
+
+      rtx tmp9 = gen_reg_rtx (V4SImode);
+      rtx tmp10 = gen_reg_rtx (V4SImode);
+      emit_move_insn (tmp9, gen_lowpart (V4SImode, tmp8));
+      emit_insn (gen_avx2_pblenddv4si (tmp10, tmp2, tmp9, GEN_INT (7)));
+
+      rtx tmp11 = gen_reg_rtx (V1TImode);
+      emit_move_insn (tmp11, gen_lowpart (V1TImode, tmp10));
+      emit_move_insn (operands[0], tmp11);
+      return;
+    }
+
+  if (TARGET_SSE4_1 && bits < 15)
+    {
+      /* Six operations.  */
+      rtx tmp1 = gen_reg_rtx (V4SImode);
+      rtx tmp2 = gen_reg_rtx (V4SImode);
+      emit_move_insn (tmp1, gen_lowpart (V4SImode, op1));
+      emit_insn (gen_ashrv4si3 (tmp2, tmp1, GEN_INT (bits)));
+
+      rtx tmp3 = gen_reg_rtx (V1TImode);
+      emit_insn (gen_sse2_lshrv1ti3 (tmp3, op1, GEN_INT (64)));
+
+      rtx tmp4 = gen_reg_rtx (V2DImode);
+      rtx tmp5 = gen_reg_rtx (V2DImode);
+      emit_move_insn (tmp4, gen_lowpart (V2DImode, op1));
+      emit_insn (gen_lshrv2di3 (tmp5, tmp4, GEN_INT (bits)));
+
+      rtx tmp6 = gen_reg_rtx (V2DImode);
+      rtx tmp7 = gen_reg_rtx (V2DImode);
+      emit_move_insn (tmp6, gen_lowpart (V2DImode, tmp3));
+      emit_insn (gen_ashlv2di3 (tmp7, tmp6, GEN_INT (64 - bits)));
+
+      rtx tmp8 = gen_reg_rtx (V2DImode);
+      emit_insn (gen_iorv2di3 (tmp8, tmp5, tmp7));
+
+      rtx tmp9 = gen_reg_rtx (V8HImode);
+      rtx tmp10 = gen_reg_rtx (V8HImode);
+      rtx tmp11 = gen_reg_rtx (V8HImode);
+      emit_move_insn (tmp9, gen_lowpart (V8HImode, tmp2));
+      emit_move_insn (tmp10, gen_lowpart (V8HImode, tmp8));
+      emit_insn (gen_sse4_1_pblendw (tmp11, tmp9, tmp10, GEN_INT (0x3f)));
+
+      rtx tmp12 = gen_reg_rtx (V1TImode);
+      emit_move_insn (tmp12, gen_lowpart (V1TImode, tmp11));
+      emit_move_insn (operands[0], tmp12);
+      return;
+    }
+
+  if (bits == 1)
+    {
+      /* Eight operations.  */
+      rtx tmp1 = gen_reg_rtx (V1TImode);
+      emit_insn (gen_sse2_lshrv1ti3 (tmp1, op1, GEN_INT (64)));
+
+      rtx tmp2 = gen_reg_rtx (V2DImode);
+      rtx tmp3 = gen_reg_rtx (V2DImode);
+      emit_move_insn (tmp2, gen_lowpart (V2DImode, op1));
+      emit_insn (gen_lshrv2di3 (tmp3, tmp2, GEN_INT (1)));
+
+      rtx tmp4 = gen_reg_rtx (V2DImode);
+      rtx tmp5 = gen_reg_rtx (V2DImode);
+      emit_move_insn (tmp4, gen_lowpart (V2DImode, tmp1));
+      emit_insn (gen_ashlv2di3 (tmp5, tmp4, GEN_INT (63)));
+
+      rtx tmp6 = gen_reg_rtx (V2DImode);
+      emit_insn (gen_iorv2di3 (tmp6, tmp3, tmp5));
+
+      rtx tmp7 = gen_reg_rtx (V2DImode);
+      emit_insn (gen_lshrv2di3 (tmp7, tmp2, GEN_INT (63)));
+
+      rtx tmp8 = gen_reg_rtx (V4SImode);
+      rtx tmp9 = gen_reg_rtx (V4SImode);
+      emit_move_insn (tmp8, gen_lowpart (V4SImode, tmp7));
+      emit_insn (gen_sse2_pshufd (tmp9, tmp8, GEN_INT (0xbf)));
+
+      rtx tmp10 = gen_reg_rtx (V2DImode);
+      rtx tmp11 = gen_reg_rtx (V2DImode);
+      emit_move_insn (tmp10, gen_lowpart (V2DImode, tmp9));
+      emit_insn (gen_ashlv2di3 (tmp11, tmp10, GEN_INT (31)));
+
+      rtx tmp12 = gen_reg_rtx (V2DImode);
+      emit_insn (gen_iorv2di3 (tmp12, tmp6, tmp11));
+
+      rtx tmp13 = gen_reg_rtx (V1TImode);
+      emit_move_insn (tmp13, gen_lowpart (V1TImode, tmp12));
+      emit_move_insn (operands[0], tmp13);
+      return;
+    }
+
+  if (bits > 64)
+    {
+      /* Eight operations.  */
+      rtx tmp1 = gen_reg_rtx (V4SImode);
+      rtx tmp2 = gen_reg_rtx (V4SImode);
+      emit_move_insn (tmp1, gen_lowpart (V4SImode, op1));
+      emit_insn (gen_sse2_pshufd (tmp2, tmp1, GEN_INT (0xff)));
+
+      rtx tmp3 = gen_reg_rtx (V4SImode);
+      emit_insn (gen_ashrv4si3 (tmp3, tmp2, GEN_INT (31)));
+
+      rtx tmp4 = gen_reg_rtx (V1TImode);
+      emit_insn (gen_sse2_lshrv1ti3 (tmp4, op1, GEN_INT (64)));
+
+      rtx tmp5 = gen_reg_rtx (V2DImode);
+      rtx tmp6 = gen_reg_rtx (V2DImode);
+      emit_move_insn (tmp5, gen_lowpart (V2DImode, tmp4));
+      emit_insn (gen_lshrv2di3 (tmp6, tmp5, GEN_INT (bits - 64)));
+
+      rtx tmp7 = gen_reg_rtx (V1TImode);
+      rtx tmp8 = gen_reg_rtx (V1TImode);
+      emit_move_insn (tmp7, gen_lowpart (V1TImode, tmp3));
+      emit_insn (gen_sse2_ashlv1ti3 (tmp8, tmp7, GEN_INT (64)));
+
+      rtx tmp9 = gen_reg_rtx (V2DImode);
+      rtx tmp10 = gen_reg_rtx (V2DImode);
+      emit_move_insn (tmp9, gen_lowpart (V2DImode, tmp3));
+      emit_insn (gen_ashlv2di3 (tmp10, tmp9, GEN_INT (128 - bits)));
+
+      rtx tmp11 = gen_reg_rtx (V2DImode);
+      rtx tmp12 = gen_reg_rtx (V2DImode);
+      emit_move_insn (tmp11, gen_lowpart (V2DImode, tmp8));
+      emit_insn (gen_iorv2di3 (tmp12, tmp10, tmp11));
+
+      rtx tmp13 = gen_reg_rtx (V2DImode);
+      emit_insn (gen_iorv2di3 (tmp13, tmp6, tmp12));
+
+      rtx tmp14 = gen_reg_rtx (V1TImode);
+      emit_move_insn (tmp14, gen_lowpart (V1TImode, tmp13));
+      emit_move_insn (operands[0], tmp14);
+    }
+  else
+    {
+      /* Nine operations.  */
+      rtx tmp1 = gen_reg_rtx (V4SImode);
+      rtx tmp2 = gen_reg_rtx (V4SImode);
+      emit_move_insn (tmp1, gen_lowpart (V4SImode, op1));
+      emit_insn (gen_sse2_pshufd (tmp2, tmp1, GEN_INT (0xff)));
+
+      rtx tmp3 = gen_reg_rtx (V4SImode);
+      emit_insn (gen_ashrv4si3 (tmp3, tmp2, GEN_INT (31)));
+
+      rtx tmp4 = gen_reg_rtx (V1TImode);
+      emit_insn (gen_sse2_lshrv1ti3 (tmp4, op1, GEN_INT (64)));
+
+      rtx tmp5 = gen_reg_rtx (V2DImode);
+      rtx tmp6 = gen_reg_rtx (V2DImode);
+      emit_move_insn (tmp5, gen_lowpart (V2DImode, op1));
+      emit_insn (gen_lshrv2di3 (tmp6, tmp5, GEN_INT (bits)));
+
+      rtx tmp7 = gen_reg_rtx (V2DImode);
+      rtx tmp8 = gen_reg_rtx (V2DImode);
+      emit_move_insn (tmp7, gen_lowpart (V2DImode, tmp4));
+      emit_insn (gen_ashlv2di3 (tmp8, tmp7, GEN_INT (64 - bits)));
+
+      rtx tmp9 = gen_reg_rtx (V2DImode);
+      emit_insn (gen_iorv2di3 (tmp9, tmp6, tmp8));
+
+      rtx tmp10 = gen_reg_rtx (V1TImode);
+      rtx tmp11 = gen_reg_rtx (V1TImode);
+      emit_move_insn (tmp10, gen_lowpart (V1TImode, tmp3));
+      emit_insn (gen_sse2_ashlv1ti3 (tmp11, tmp10, GEN_INT (64)));
+
+      rtx tmp12 = gen_reg_rtx (V2DImode);
+      rtx tmp13 = gen_reg_rtx (V2DImode);
+      emit_move_insn (tmp12, gen_lowpart (V2DImode, tmp11));
+      emit_insn (gen_ashlv2di3 (tmp13, tmp12, GEN_INT (64 - bits)));
+
+      rtx tmp14 = gen_reg_rtx (V2DImode);
+      emit_insn (gen_iorv2di3 (tmp14, tmp9, tmp13));
+
+      rtx tmp15 = gen_reg_rtx (V1TImode);
+      emit_move_insn (tmp15, gen_lowpart (V1TImode, tmp14));
+      emit_move_insn (operands[0], tmp15);
+    }
+}
+
 /* Return mode for the memcpy/memset loop counter.  Prefer SImode over
    DImode for constant loop counts.  */
 
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index 9918a28..bd52450 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -161,6 +161,7 @@  extern void ix86_split_ashr (rtx *, rtx, machine_mode);
 extern void ix86_split_lshr (rtx *, rtx, machine_mode);
 extern void ix86_expand_v1ti_shift (enum rtx_code, rtx[]);
 extern void ix86_expand_v1ti_rotate (enum rtx_code, rtx[]);
+extern void ix86_expand_v1ti_ashiftrt (rtx[]);
 extern rtx ix86_find_base_term (rtx);
 extern bool ix86_check_movabs (rtx, int);
 extern bool ix86_check_no_addr_space (rtx);
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index bdc6067..3307c1b 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -15079,8 +15079,8 @@ 
   [(set (match_operand:V1TI 0 "register_operand")
 	(ashift:V1TI
 	 (match_operand:V1TI 1 "register_operand")
-	 (match_operand:SI 2 "const_int_operand")))]
-  "TARGET_SSE2"
+	 (match_operand:QI 2 "general_operand")))]
+  "TARGET_SSE2 && TARGET_64BIT"
 {
   ix86_expand_v1ti_shift (ASHIFT, operands);
   DONE;
@@ -15090,19 +15090,30 @@ 
   [(set (match_operand:V1TI 0 "register_operand")
 	(lshiftrt:V1TI
 	 (match_operand:V1TI 1 "register_operand")
-	 (match_operand:SI 2 "const_int_operand")))]
-  "TARGET_SSE2"
+	 (match_operand:QI 2 "general_operand")))]
+  "TARGET_SSE2 && TARGET_64BIT"
 {
   ix86_expand_v1ti_shift (LSHIFTRT, operands);
   DONE;
 })
 
+(define_expand "ashrv1ti3"
+  [(set (match_operand:V1TI 0 "register_operand")
+	(ashiftrt:V1TI
+	 (match_operand:V1TI 1 "register_operand")
+	 (match_operand:QI 2 "general_operand")))]
+  "TARGET_SSE2 && TARGET_64BIT"
+{
+  ix86_expand_v1ti_ashiftrt (operands);
+  DONE;
+})
+
 (define_expand "rotlv1ti3"
   [(set (match_operand:V1TI 0 "register_operand")
 	(rotate:V1TI
 	 (match_operand:V1TI 1 "register_operand")
-	 (match_operand:SI 2 "const_int_operand")))]
-  "TARGET_SSE2"
+	 (match_operand:QI 2 "const_int_operand")))]
+  "TARGET_SSE2 && TARGET_64BIT"
 {
   ix86_expand_v1ti_rotate (ROTATE, operands);
   DONE;
@@ -15112,8 +15123,8 @@ 
   [(set (match_operand:V1TI 0 "register_operand")
 	(rotatert:V1TI
 	 (match_operand:V1TI 1 "register_operand")
-	 (match_operand:SI 2 "const_int_operand")))]
-  "TARGET_SSE2"
+	 (match_operand:QI 2 "const_int_operand")))]
+  "TARGET_SSE2 && TARGET_64BIT"
 {
   ix86_expand_v1ti_rotate (ROTATERT, operands);
   DONE;
diff --git a/gcc/testsuite/gcc.target/i386/sse2-v1ti-ashiftrt-1.c b/gcc/testsuite/gcc.target/i386/sse2-v1ti-ashiftrt-1.c
new file mode 100644
index 0000000..05869bf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/sse2-v1ti-ashiftrt-1.c
@@ -0,0 +1,167 @@ 
+/* { dg-do run { target int128 } } */
+/* { dg-options "-O2 -msse2" } */
+/* { dg-require-effective-target sse2 } */
+
+typedef __int128 v1ti __attribute__ ((__vector_size__ (16)));
+typedef __int128 ti;
+
+ti ashr(ti x, unsigned int i) { return x >> i; }
+
+v1ti ashr_1(v1ti x) { return x >> 1; }
+v1ti ashr_2(v1ti x) { return x >> 2; }
+v1ti ashr_7(v1ti x) { return x >> 7; }
+v1ti ashr_8(v1ti x) { return x >> 8; }
+v1ti ashr_9(v1ti x) { return x >> 9; }
+v1ti ashr_15(v1ti x) { return x >> 15; }
+v1ti ashr_16(v1ti x) { return x >> 16; }
+v1ti ashr_17(v1ti x) { return x >> 17; }
+v1ti ashr_23(v1ti x) { return x >> 23; }
+v1ti ashr_24(v1ti x) { return x >> 24; }
+v1ti ashr_25(v1ti x) { return x >> 25; }
+v1ti ashr_31(v1ti x) { return x >> 31; }
+v1ti ashr_32(v1ti x) { return x >> 32; }
+v1ti ashr_33(v1ti x) { return x >> 33; }
+v1ti ashr_47(v1ti x) { return x >> 47; }
+v1ti ashr_48(v1ti x) { return x >> 48; }
+v1ti ashr_49(v1ti x) { return x >> 49; }
+v1ti ashr_63(v1ti x) { return x >> 63; }
+v1ti ashr_64(v1ti x) { return x >> 64; }
+v1ti ashr_65(v1ti x) { return x >> 65; }
+v1ti ashr_72(v1ti x) { return x >> 72; }
+v1ti ashr_79(v1ti x) { return x >> 79; }
+v1ti ashr_80(v1ti x) { return x >> 80; }
+v1ti ashr_81(v1ti x) { return x >> 81; }
+v1ti ashr_95(v1ti x) { return x >> 95; }
+v1ti ashr_96(v1ti x) { return x >> 96; }
+v1ti ashr_97(v1ti x) { return x >> 97; }
+v1ti ashr_111(v1ti x) { return x >> 111; }
+v1ti ashr_112(v1ti x) { return x >> 112; }
+v1ti ashr_113(v1ti x) { return x >> 113; }
+v1ti ashr_119(v1ti x) { return x >> 119; }
+v1ti ashr_120(v1ti x) { return x >> 120; }
+v1ti ashr_121(v1ti x) { return x >> 121; }
+v1ti ashr_126(v1ti x) { return x >> 126; }
+v1ti ashr_127(v1ti x) { return x >> 127; }
+
+typedef v1ti (*fun)(v1ti);
+
+struct {
+  unsigned int i;
+  fun ashr;
+} table[35] = {
+  {   1, ashr_1   },
+  {   2, ashr_2   },
+  {   7, ashr_7   },
+  {   8, ashr_8   },
+  {   9, ashr_9   },
+  {  15, ashr_15  },
+  {  16, ashr_16  },
+  {  17, ashr_17  },
+  {  23, ashr_23  },
+  {  24, ashr_24  },
+  {  25, ashr_25  },
+  {  31, ashr_31  },
+  {  32, ashr_32  },
+  {  33, ashr_33  },
+  {  47, ashr_47  },
+  {  48, ashr_48  },
+  {  49, ashr_49  },
+  {  63, ashr_63  },
+  {  64, ashr_64  },
+  {  65, ashr_65  },
+  {  72, ashr_72  },
+  {  79, ashr_79  },
+  {  80, ashr_80  },
+  {  81, ashr_81  },
+  {  95, ashr_95  },
+  {  96, ashr_96  },
+  {  97, ashr_97  },
+  { 111, ashr_111 },
+  { 112, ashr_112 },
+  { 113, ashr_113 },
+  { 119, ashr_119 },
+  { 120, ashr_120 },
+  { 121, ashr_121 },
+  { 126, ashr_126 },
+  { 127, ashr_127 }
+};
+
+void test(ti x)
+{
+  unsigned int i;
+  v1ti t = (v1ti)x;
+
+  for (i=0; i<(sizeof(table)/sizeof(table[0])); i++) {
+    if ((ti)(*table[i].ashr)(t) != ashr(x,table[i].i))
+      __builtin_abort();
+  }
+}
+
+int main()
+{
+  ti x;
+
+  x = ((ti)0x0011223344556677ull)<<64 | 0x8899aabbccddeeffull;
+  test(x);
+  x = ((ti)0xffeeddccbbaa9988ull)<<64 | 0x7766554433221100ull;
+  test(x);
+  x = ((ti)0x0123456789abcdefull)<<64 | 0x0123456789abcdefull;
+  test(x);
+  x = ((ti)0xfedcba9876543210ull)<<64 | 0xfedcba9876543210ull;
+  test(x);
+  x = ((ti)0x0123456789abcdefull)<<64 | 0xfedcba9876543210ull;
+  test(x);
+  x = ((ti)0xfedcba9876543210ull)<<64 | 0x0123456789abcdefull;
+  test(x);
+  x = 0;
+  test(x);
+  x = 0xffffffffffffffffull;
+  test(x);
+  x = ((ti)0xffffffffffffffffull)<<64;
+  test(x);
+  x = ((ti)0xffffffffffffffffull)<<64 | 0xffffffffffffffffull;
+  test(x);
+  x = ((ti)0x5a5a5a5a5a5a5a5aull)<<64 | 0x5a5a5a5a5a5a5a5aull;
+  test(x);
+  x = ((ti)0xa5a5a5a5a5a5a5a5ull)<<64 | 0xa5a5a5a5a5a5a5a5ull;
+  test(x);
+  x = 0xffull;
+  test(x);
+  x = 0xff00ull;
+  test(x);
+  x = 0xff0000ull;
+  test(x);
+  x = 0xff000000ull;
+  test(x);
+  x = 0xff00000000ull;
+  test(x);
+  x = 0xff0000000000ull;
+  test(x);
+  x = 0xff000000000000ull;
+  test(x);
+  x = 0xff00000000000000ull;
+  test(x);
+  x = ((ti)0xffull)<<64;
+  test(x);
+  x = ((ti)0xff00ull)<<64;
+  test(x);
+  x = ((ti)0xff0000ull)<<64;
+  test(x);
+  x = ((ti)0xff000000ull)<<64;
+  test(x);
+  x = ((ti)0xff00000000ull)<<64;
+  test(x);
+  x = ((ti)0xff0000000000ull)<<64;
+  test(x);
+  x = ((ti)0xff000000000000ull)<<64;
+  test(x);
+  x = ((ti)0xff00000000000000ull)<<64;
+  test(x);
+  x = 0xdeadbeefcafebabeull;
+  test(x);
+  x = ((ti)0xdeadbeefcafebabeull)<<64;
+  test(x);
+
+  return 0;
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/sse2-v1ti-ashiftrt-2.c b/gcc/testsuite/gcc.target/i386/sse2-v1ti-ashiftrt-2.c
new file mode 100644
index 0000000..b3d0aa3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/sse2-v1ti-ashiftrt-2.c
@@ -0,0 +1,166 @@ 
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-O2 -msse2 -mavx2" } */
+
+typedef __int128 v1ti __attribute__ ((__vector_size__ (16)));
+typedef __int128 ti;
+
+ti ashr(ti x, unsigned int i) { return x >> i; }
+
+v1ti ashr_1(v1ti x) { return x >> 1; }
+v1ti ashr_2(v1ti x) { return x >> 2; }
+v1ti ashr_7(v1ti x) { return x >> 7; }
+v1ti ashr_8(v1ti x) { return x >> 8; }
+v1ti ashr_9(v1ti x) { return x >> 9; }
+v1ti ashr_15(v1ti x) { return x >> 15; }
+v1ti ashr_16(v1ti x) { return x >> 16; }
+v1ti ashr_17(v1ti x) { return x >> 17; }
+v1ti ashr_23(v1ti x) { return x >> 23; }
+v1ti ashr_24(v1ti x) { return x >> 24; }
+v1ti ashr_25(v1ti x) { return x >> 25; }
+v1ti ashr_31(v1ti x) { return x >> 31; }
+v1ti ashr_32(v1ti x) { return x >> 32; }
+v1ti ashr_33(v1ti x) { return x >> 33; }
+v1ti ashr_47(v1ti x) { return x >> 47; }
+v1ti ashr_48(v1ti x) { return x >> 48; }
+v1ti ashr_49(v1ti x) { return x >> 49; }
+v1ti ashr_63(v1ti x) { return x >> 63; }
+v1ti ashr_64(v1ti x) { return x >> 64; }
+v1ti ashr_65(v1ti x) { return x >> 65; }
+v1ti ashr_72(v1ti x) { return x >> 72; }
+v1ti ashr_79(v1ti x) { return x >> 79; }
+v1ti ashr_80(v1ti x) { return x >> 80; }
+v1ti ashr_81(v1ti x) { return x >> 81; }
+v1ti ashr_95(v1ti x) { return x >> 95; }
+v1ti ashr_96(v1ti x) { return x >> 96; }
+v1ti ashr_97(v1ti x) { return x >> 97; }
+v1ti ashr_111(v1ti x) { return x >> 111; }
+v1ti ashr_112(v1ti x) { return x >> 112; }
+v1ti ashr_113(v1ti x) { return x >> 113; }
+v1ti ashr_119(v1ti x) { return x >> 119; }
+v1ti ashr_120(v1ti x) { return x >> 120; }
+v1ti ashr_121(v1ti x) { return x >> 121; }
+v1ti ashr_126(v1ti x) { return x >> 126; }
+v1ti ashr_127(v1ti x) { return x >> 127; }
+
+typedef v1ti (*fun)(v1ti);
+
+struct {
+  unsigned int i;
+  fun ashr;
+} table[35] = {
+  {   1, ashr_1   },
+  {   2, ashr_2   },
+  {   7, ashr_7   },
+  {   8, ashr_8   },
+  {   9, ashr_9   },
+  {  15, ashr_15  },
+  {  16, ashr_16  },
+  {  17, ashr_17  },
+  {  23, ashr_23  },
+  {  24, ashr_24  },
+  {  25, ashr_25  },
+  {  31, ashr_31  },
+  {  32, ashr_32  },
+  {  33, ashr_33  },
+  {  47, ashr_47  },
+  {  48, ashr_48  },
+  {  49, ashr_49  },
+  {  63, ashr_63  },
+  {  64, ashr_64  },
+  {  65, ashr_65  },
+  {  72, ashr_72  },
+  {  79, ashr_79  },
+  {  80, ashr_80  },
+  {  81, ashr_81  },
+  {  95, ashr_95  },
+  {  96, ashr_96  },
+  {  97, ashr_97  },
+  { 111, ashr_111 },
+  { 112, ashr_112 },
+  { 113, ashr_113 },
+  { 119, ashr_119 },
+  { 120, ashr_120 },
+  { 121, ashr_121 },
+  { 126, ashr_126 },
+  { 127, ashr_127 }
+};
+
+void test(ti x)
+{
+  unsigned int i;
+  v1ti t = (v1ti)x;
+
+  for (i=0; i<(sizeof(table)/sizeof(table[0])); i++) {
+    if ((ti)(*table[i].ashr)(t) != ashr(x,table[i].i))
+      __builtin_abort();
+  }
+}
+
+int main()
+{
+  ti x;
+
+  x = ((ti)0x0011223344556677ull)<<64 | 0x8899aabbccddeeffull;
+  test(x);
+  x = ((ti)0xffeeddccbbaa9988ull)<<64 | 0x7766554433221100ull;
+  test(x);
+  x = ((ti)0x0123456789abcdefull)<<64 | 0x0123456789abcdefull;
+  test(x);
+  x = ((ti)0xfedcba9876543210ull)<<64 | 0xfedcba9876543210ull;
+  test(x);
+  x = ((ti)0x0123456789abcdefull)<<64 | 0xfedcba9876543210ull;
+  test(x);
+  x = ((ti)0xfedcba9876543210ull)<<64 | 0x0123456789abcdefull;
+  test(x);
+  x = 0;
+  test(x);
+  x = 0xffffffffffffffffull;
+  test(x);
+  x = ((ti)0xffffffffffffffffull)<<64;
+  test(x);
+  x = ((ti)0xffffffffffffffffull)<<64 | 0xffffffffffffffffull;
+  test(x);
+  x = ((ti)0x5a5a5a5a5a5a5a5aull)<<64 | 0x5a5a5a5a5a5a5a5aull;
+  test(x);
+  x = ((ti)0xa5a5a5a5a5a5a5a5ull)<<64 | 0xa5a5a5a5a5a5a5a5ull;
+  test(x);
+  x = 0xffull;
+  test(x);
+  x = 0xff00ull;
+  test(x);
+  x = 0xff0000ull;
+  test(x);
+  x = 0xff000000ull;
+  test(x);
+  x = 0xff00000000ull;
+  test(x);
+  x = 0xff0000000000ull;
+  test(x);
+  x = 0xff000000000000ull;
+  test(x);
+  x = 0xff00000000000000ull;
+  test(x);
+  x = ((ti)0xffull)<<64;
+  test(x);
+  x = ((ti)0xff00ull)<<64;
+  test(x);
+  x = ((ti)0xff0000ull)<<64;
+  test(x);
+  x = ((ti)0xff000000ull)<<64;
+  test(x);
+  x = ((ti)0xff00000000ull)<<64;
+  test(x);
+  x = ((ti)0xff0000000000ull)<<64;
+  test(x);
+  x = ((ti)0xff000000000000ull)<<64;
+  test(x);
+  x = ((ti)0xff00000000000000ull)<<64;
+  test(x);
+  x = 0xdeadbeefcafebabeull;
+  test(x);
+  x = ((ti)0xdeadbeefcafebabeull)<<64;
+  test(x);
+
+  return 0;
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/sse2-v1ti-ashiftrt-3.c b/gcc/testsuite/gcc.target/i386/sse2-v1ti-ashiftrt-3.c
new file mode 100644
index 0000000..61d4f4c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/sse2-v1ti-ashiftrt-3.c
@@ -0,0 +1,166 @@ 
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-O2 -msse2 -msse4.1" } */
+
+typedef __int128 v1ti __attribute__ ((__vector_size__ (16)));
+typedef __int128 ti;
+
+ti ashr(ti x, unsigned int i) { return x >> i; }
+
+v1ti ashr_1(v1ti x) { return x >> 1; }
+v1ti ashr_2(v1ti x) { return x >> 2; }
+v1ti ashr_7(v1ti x) { return x >> 7; }
+v1ti ashr_8(v1ti x) { return x >> 8; }
+v1ti ashr_9(v1ti x) { return x >> 9; }
+v1ti ashr_15(v1ti x) { return x >> 15; }
+v1ti ashr_16(v1ti x) { return x >> 16; }
+v1ti ashr_17(v1ti x) { return x >> 17; }
+v1ti ashr_23(v1ti x) { return x >> 23; }
+v1ti ashr_24(v1ti x) { return x >> 24; }
+v1ti ashr_25(v1ti x) { return x >> 25; }
+v1ti ashr_31(v1ti x) { return x >> 31; }
+v1ti ashr_32(v1ti x) { return x >> 32; }
+v1ti ashr_33(v1ti x) { return x >> 33; }
+v1ti ashr_47(v1ti x) { return x >> 47; }
+v1ti ashr_48(v1ti x) { return x >> 48; }
+v1ti ashr_49(v1ti x) { return x >> 49; }
+v1ti ashr_63(v1ti x) { return x >> 63; }
+v1ti ashr_64(v1ti x) { return x >> 64; }
+v1ti ashr_65(v1ti x) { return x >> 65; }
+v1ti ashr_72(v1ti x) { return x >> 72; }
+v1ti ashr_79(v1ti x) { return x >> 79; }
+v1ti ashr_80(v1ti x) { return x >> 80; }
+v1ti ashr_81(v1ti x) { return x >> 81; }
+v1ti ashr_95(v1ti x) { return x >> 95; }
+v1ti ashr_96(v1ti x) { return x >> 96; }
+v1ti ashr_97(v1ti x) { return x >> 97; }
+v1ti ashr_111(v1ti x) { return x >> 111; }
+v1ti ashr_112(v1ti x) { return x >> 112; }
+v1ti ashr_113(v1ti x) { return x >> 113; }
+v1ti ashr_119(v1ti x) { return x >> 119; }
+v1ti ashr_120(v1ti x) { return x >> 120; }
+v1ti ashr_121(v1ti x) { return x >> 121; }
+v1ti ashr_126(v1ti x) { return x >> 126; }
+v1ti ashr_127(v1ti x) { return x >> 127; }
+
+typedef v1ti (*fun)(v1ti);
+
+struct {
+  unsigned int i;
+  fun ashr;
+} table[35] = {
+  {   1, ashr_1   },
+  {   2, ashr_2   },
+  {   7, ashr_7   },
+  {   8, ashr_8   },
+  {   9, ashr_9   },
+  {  15, ashr_15  },
+  {  16, ashr_16  },
+  {  17, ashr_17  },
+  {  23, ashr_23  },
+  {  24, ashr_24  },
+  {  25, ashr_25  },
+  {  31, ashr_31  },
+  {  32, ashr_32  },
+  {  33, ashr_33  },
+  {  47, ashr_47  },
+  {  48, ashr_48  },
+  {  49, ashr_49  },
+  {  63, ashr_63  },
+  {  64, ashr_64  },
+  {  65, ashr_65  },
+  {  72, ashr_72  },
+  {  79, ashr_79  },
+  {  80, ashr_80  },
+  {  81, ashr_81  },
+  {  95, ashr_95  },
+  {  96, ashr_96  },
+  {  97, ashr_97  },
+  { 111, ashr_111 },
+  { 112, ashr_112 },
+  { 113, ashr_113 },
+  { 119, ashr_119 },
+  { 120, ashr_120 },
+  { 121, ashr_121 },
+  { 126, ashr_126 },
+  { 127, ashr_127 }
+};
+
+void test(ti x)
+{
+  unsigned int i;
+  v1ti t = (v1ti)x;
+
+  for (i=0; i<(sizeof(table)/sizeof(table[0])); i++) {
+    if ((ti)(*table[i].ashr)(t) != ashr(x,table[i].i))
+      __builtin_abort();
+  }
+}
+
+int main()
+{
+  ti x;
+
+  x = ((ti)0x0011223344556677ull)<<64 | 0x8899aabbccddeeffull;
+  test(x);
+  x = ((ti)0xffeeddccbbaa9988ull)<<64 | 0x7766554433221100ull;
+  test(x);
+  x = ((ti)0x0123456789abcdefull)<<64 | 0x0123456789abcdefull;
+  test(x);
+  x = ((ti)0xfedcba9876543210ull)<<64 | 0xfedcba9876543210ull;
+  test(x);
+  x = ((ti)0x0123456789abcdefull)<<64 | 0xfedcba9876543210ull;
+  test(x);
+  x = ((ti)0xfedcba9876543210ull)<<64 | 0x0123456789abcdefull;
+  test(x);
+  x = 0;
+  test(x);
+  x = 0xffffffffffffffffull;
+  test(x);
+  x = ((ti)0xffffffffffffffffull)<<64;
+  test(x);
+  x = ((ti)0xffffffffffffffffull)<<64 | 0xffffffffffffffffull;
+  test(x);
+  x = ((ti)0x5a5a5a5a5a5a5a5aull)<<64 | 0x5a5a5a5a5a5a5a5aull;
+  test(x);
+  x = ((ti)0xa5a5a5a5a5a5a5a5ull)<<64 | 0xa5a5a5a5a5a5a5a5ull;
+  test(x);
+  x = 0xffull;
+  test(x);
+  x = 0xff00ull;
+  test(x);
+  x = 0xff0000ull;
+  test(x);
+  x = 0xff000000ull;
+  test(x);
+  x = 0xff00000000ull;
+  test(x);
+  x = 0xff0000000000ull;
+  test(x);
+  x = 0xff000000000000ull;
+  test(x);
+  x = 0xff00000000000000ull;
+  test(x);
+  x = ((ti)0xffull)<<64;
+  test(x);
+  x = ((ti)0xff00ull)<<64;
+  test(x);
+  x = ((ti)0xff0000ull)<<64;
+  test(x);
+  x = ((ti)0xff000000ull)<<64;
+  test(x);
+  x = ((ti)0xff00000000ull)<<64;
+  test(x);
+  x = ((ti)0xff0000000000ull)<<64;
+  test(x);
+  x = ((ti)0xff000000000000ull)<<64;
+  test(x);
+  x = ((ti)0xff00000000000000ull)<<64;
+  test(x);
+  x = 0xdeadbeefcafebabeull;
+  test(x);
+  x = ((ti)0xdeadbeefcafebabeull)<<64;
+  test(x);
+
+  return 0;
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/sse2-v1ti-shift-2.c b/gcc/testsuite/gcc.target/i386/sse2-v1ti-shift-2.c
new file mode 100644
index 0000000..18da2ef
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/sse2-v1ti-shift-2.c
@@ -0,0 +1,13 @@ 
+/* PR target/102986 */
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-O2 -msse2" } */
+
+typedef unsigned __int128 uv1ti __attribute__ ((__vector_size__ (16)));
+typedef __int128 sv1ti __attribute__ ((__vector_size__ (16)));
+
+uv1ti ashl(uv1ti x, unsigned int i) { return x << i; }
+uv1ti lshr(uv1ti x, unsigned int i) { return x >> i; }
+sv1ti ashr(sv1ti x, unsigned int i) { return x >> i; }
+uv1ti rotr(uv1ti x, unsigned int i) { return (x >> i) | (x << (128-i)); }
+uv1ti rotl(uv1ti x, unsigned int i) { return (x << i) | (x >> (128-i)); }
+
diff --git a/gcc/testsuite/gcc.target/i386/sse2-v1ti-shift-3.c b/gcc/testsuite/gcc.target/i386/sse2-v1ti-shift-3.c
new file mode 100644
index 0000000..8d5c122
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/sse2-v1ti-shift-3.c
@@ -0,0 +1,113 @@ 
+/* PR target/102986 */
+/* { dg-do run { target int128 } } */
+/* { dg-options "-O2 -msse2" } */
+/* { dg-require-effective-target sse2 } */
+
+typedef unsigned __int128 uv1ti __attribute__ ((__vector_size__ (16)));
+typedef __int128 sv1ti __attribute__ ((__vector_size__ (16)));
+typedef __int128 v1ti __attribute__ ((__vector_size__ (16)));
+
+typedef unsigned __int128 uti;
+typedef __int128 sti;
+typedef __int128 ti;
+
+uv1ti ashl_v1ti(uv1ti x, unsigned int i) { return x << i; }
+uv1ti lshr_v1ti(uv1ti x, unsigned int i) { return x >> i; }
+sv1ti ashr_v1ti(sv1ti x, unsigned int i) { return x >> i; }
+uv1ti rotr_v1ti(uv1ti x, unsigned int i) { return (x >> i) | (x << (128-i)); }
+uv1ti rotl_v1ti(uv1ti x, unsigned int i) { return (x << i) | (x >> (128-i)); }
+
+uti ashl_ti(uti x, unsigned int i) { return x << i; }
+uti lshr_ti(uti x, unsigned int i) { return x >> i; }
+sti ashr_ti(sti x, unsigned int i) { return x >> i; }
+uti rotr_ti(uti x, unsigned int i) { return (x >> i) | (x << (128-i)); }
+uti rotl_ti(uti x, unsigned int i) { return (x << i) | (x >> (128-i)); }
+
+void test(ti x)
+{
+  unsigned int i;
+  uv1ti ut = (uv1ti)x;
+  sv1ti st = (sv1ti)x;
+
+  for (i=0; i<128; i++) {
+    if ((ti)ashl_v1ti(ut,i) != (ti)ashl_ti(x,i))
+      __builtin_abort();
+    if ((ti)lshr_v1ti(ut,i) != (ti)lshr_ti(x,i))
+      __builtin_abort();
+    if ((ti)ashr_v1ti(st,i) != (ti)ashr_ti(x,i))
+      __builtin_abort();
+    if ((ti)rotr_v1ti(ut,i) != (ti)rotr_ti(x,i))
+      __builtin_abort();
+    if ((ti)rotl_v1ti(ut,i) != (ti)rotl_ti(x,i))
+      __builtin_abort();
+  }
+}
+
+int main()
+{
+  ti x;
+
+  x = ((ti)0x0011223344556677ull)<<64 | 0x8899aabbccddeeffull;
+  test(x);
+  x = ((ti)0xffeeddccbbaa9988ull)<<64 | 0x7766554433221100ull;
+  test(x);
+  x = ((ti)0x0123456789abcdefull)<<64 | 0x0123456789abcdefull;
+  test(x);
+  x = ((ti)0xfedcba9876543210ull)<<64 | 0xfedcba9876543210ull;
+  test(x);
+  x = ((ti)0x0123456789abcdefull)<<64 | 0xfedcba9876543210ull;
+  test(x);
+  x = ((ti)0xfedcba9876543210ull)<<64 | 0x0123456789abcdefull;
+  test(x);
+  x = 0;
+  test(x);
+  x = 0xffffffffffffffffull;
+  test(x);
+  x = ((ti)0xffffffffffffffffull)<<64;
+  test(x);
+  x = ((ti)0xffffffffffffffffull)<<64 | 0xffffffffffffffffull;
+  test(x);
+  x = ((ti)0x5a5a5a5a5a5a5a5aull)<<64 | 0x5a5a5a5a5a5a5a5aull;
+  test(x);
+  x = ((ti)0xa5a5a5a5a5a5a5a5ull)<<64 | 0xa5a5a5a5a5a5a5a5ull;
+  test(x);
+  x = 0xffull;
+  test(x);
+  x = 0xff00ull;
+  test(x);
+  x = 0xff0000ull;
+  test(x);
+  x = 0xff000000ull;
+  test(x);
+  x = 0xff00000000ull;
+  test(x);
+  x = 0xff0000000000ull;
+  test(x);
+  x = 0xff000000000000ull;
+  test(x);
+  x = 0xff00000000000000ull;
+  test(x);
+  x = ((ti)0xffull)<<64;
+  test(x);
+  x = ((ti)0xff00ull)<<64;
+  test(x);
+  x = ((ti)0xff0000ull)<<64;
+  test(x);
+  x = ((ti)0xff000000ull)<<64;
+  test(x);
+  x = ((ti)0xff00000000ull)<<64;
+  test(x);
+  x = ((ti)0xff0000000000ull)<<64;
+  test(x);
+  x = ((ti)0xff000000000000ull)<<64;
+  test(x);
+  x = ((ti)0xff00000000000000ull)<<64;
+  test(x);
+  x = 0xdeadbeefcafebabeull;
+  test(x);
+  x = ((ti)0xdeadbeefcafebabeull)<<64;
+  test(x);
+
+  return 0;
+}
+