From patchwork Sat Nov 27 09:03:59 2021
X-Patchwork-Submitter: Jakub Jelinek
X-Patchwork-Id: 48224
Date: Sat, 27 Nov 2021 10:03:59 +0100
To: Uros Bizjak
Subject: [PATCH] x86: Fix up x86_{,64_}sh{l,r}d patterns [PR103431]
Message-ID: <20211127090359.GU2646553@tucnak>
List-Id: Gcc-patches mailing list
From: Jakub Jelinek
Cc: gcc-patches@gcc.gnu.org

Hi!

The following testcase is miscompiled because the x86_{,64_}sh{l,r}d
patterns don't properly describe what the instructions do.  One thing
that is left out is the initial masking of the count: count &= 63 for
sh{l,r}dq and count &= 31 for sh{l,r}d{l,w}.  The other thing that is
not described properly is the behavior when count (after the masking)
is 0.  The pattern says it is e.g.
  res = (op0 << op2) | (op1 >> (64 - op2))
but that triggers UB on op1 >> 64.
For op2 0 we actually want
  res = (op0 << op2) | 0
When constants are propagated to these patterns during RTL optimizations,
both problems trigger wrong-code issues.

This patch represents the patterns as e.g.
  res = (op0 << (op2 & 63))
	| (unsigned long long) ((uint128_t) op1 >> (64 - (op2 & 63)))
so that the initial masking is present and the op2 == 0 case results in
zero being ored in.  The patch introduces alternate patterns for constant
op2, where simplify-rtx.c will fold those expressions into simple numbers,
and define_insn_and_split pre-reload splitters that turn the old form of
the patterns into the new form, so that combine can still pattern match
computations that assumed the shift amount will be in the range of
1 .. bitsize-1.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2021-11-27  Jakub Jelinek

	PR middle-end/103431
	* config/i386/i386.md (x86_64_shld, x86_shld, x86_64_shrd,
	x86_shrd): Change insn pattern to accurately describe the
	instructions.
	(*x86_64_shld_1, *x86_shld_1, *x86_64_shrd_1, *x86_shrd_1): New
	define_insn patterns.
	(*x86_64_shld_2, *x86_shld_2, *x86_64_shrd_2, *x86_shrd_2): New
	define_insn_and_split patterns.
	(*ashl3_doubleword_mask, *ashl3_doubleword_mask_1,
	*3_doubleword_mask, *3_doubleword_mask_1, ix86_rotl3_doubleword,
	ix86_rotr3_doubleword): Adjust splitters for x86_{,64_}sh{l,r}d
	pattern changes.

	* gcc.dg/pr103431.c: New test.

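
For illustration only (not part of the patch): the semantics that the new
64-bit patterns describe can be modeled in plain C roughly as below.  This
is a sketch under assumptions.  The names shld64_model, shrd64_model and
shd_model_selfcheck are hypothetical, and the code assumes a compiler that
provides unsigned __int128 (e.g. GCC or Clang on a 64-bit target).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the new x86_64_shld pattern: the count is masked
   with 63 first, and op1 is zero-extended to 128 bits before the right
   shift, so a masked count of 0 shifts by 64 in the wide type and
   contributes 0 instead of invoking undefined behavior.  */
uint64_t
shld64_model (uint64_t op0, uint64_t op1, unsigned count)
{
  unsigned n = count & 63;
  unsigned __int128 wide = (unsigned __int128) op1;
  return (op0 << n) | (uint64_t) (wide >> (64 - n));
}

/* Hypothetical model of the new x86_64_shrd pattern: same masking, but
   op1 is shifted left in the wide type and only the low 64 bits are
   kept, which is again 0 when the masked count is 0.  */
uint64_t
shrd64_model (uint64_t op0, uint64_t op1, unsigned count)
{
  unsigned n = count & 63;
  unsigned __int128 wide = (unsigned __int128) op1;
  return (op0 >> n) | (uint64_t) (wide << (64 - n));
}

/* Small self-check of the two cases discussed in the mail: a count of 0
   and a count that only becomes 0 after masking.  */
void
shd_model_selfcheck (void)
{
  assert (shld64_model (5, ~(uint64_t) 0, 0) == 5);
  assert (shld64_model (5, ~(uint64_t) 0, 64) == 5);
  assert (shrd64_model (5, ~(uint64_t) 0, 0) == 5);
  assert (shld64_model (1, (uint64_t) 1 << 63, 1) == 3);
}
```

Calling shd_model_selfcheck () from a main function exercises both the
count == 0 case and the count == 64 case, the latter being 0 only after
the & 63 masking.  The 32-bit patterns follow the same shape with a mask
of 31 and a 64-bit wide type.
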
Jakub --- gcc/config/i386/i386.md.jj 2021-11-26 10:10:05.912797010 +0100 +++ gcc/config/i386/i386.md 2021-11-26 16:09:42.242483871 +0100 @@ -11301,9 +11301,12 @@ (define_insn_and_split "*ashl3_doub "&& 1" [(parallel [(set (match_dup 6) - (ior:DWIH (ashift:DWIH (match_dup 6) (match_dup 2)) - (lshiftrt:DWIH (match_dup 5) - (minus:QI (match_dup 8) (match_dup 2))))) + (ior:DWIH (ashift:DWIH (match_dup 6) + (and:QI (match_dup 2) (match_dup 8))) + (subreg:DWIH + (lshiftrt: (zero_extend: (match_dup 5)) + (minus:QI (match_dup 9) + (and:QI (match_dup 2) (match_dup 8)))) 0))) (clobber (reg:CC FLAGS_REG))]) (parallel [(set (match_dup 4) @@ -11312,7 +11315,8 @@ (define_insn_and_split "*ashl3_doub { split_double_mode (mode, &operands[0], 2, &operands[4], &operands[6]); - operands[8] = GEN_INT ( * BITS_PER_UNIT); + operands[8] = GEN_INT ( * BITS_PER_UNIT - 1); + operands[9] = GEN_INT ( * BITS_PER_UNIT); if ((INTVAL (operands[3]) & (( * BITS_PER_UNIT) - 1)) != (( * BITS_PER_UNIT) - 1)) @@ -11342,9 +11346,12 @@ (define_insn_and_split "*ashl3_doub "&& 1" [(parallel [(set (match_dup 6) - (ior:DWIH (ashift:DWIH (match_dup 6) (match_dup 2)) - (lshiftrt:DWIH (match_dup 5) - (minus:QI (match_dup 8) (match_dup 2))))) + (ior:DWIH (ashift:DWIH (match_dup 6) + (and:QI (match_dup 2) (match_dup 8))) + (subreg:DWIH + (lshiftrt: (zero_extend: (match_dup 5)) + (minus:QI (match_dup 9) + (and:QI (match_dup 2) (match_dup 8)))) 0))) (clobber (reg:CC FLAGS_REG))]) (parallel [(set (match_dup 4) @@ -11353,7 +11360,8 @@ (define_insn_and_split "*ashl3_doub { split_double_mode (mode, &operands[0], 2, &operands[4], &operands[6]); - operands[8] = GEN_INT ( * BITS_PER_UNIT); + operands[8] = GEN_INT ( * BITS_PER_UNIT - 1); + operands[9] = GEN_INT ( * BITS_PER_UNIT); if ((INTVAL (operands[3]) & (( * BITS_PER_UNIT) - 1)) != (( * BITS_PER_UNIT) - 1)) @@ -11404,9 +11412,14 @@ (define_peephole2 (define_insn "x86_64_shld" [(set (match_operand:DI 0 "nonimmediate_operand" "+r*m") (ior:DI (ashift:DI (match_dup 0) - 
(match_operand:QI 2 "nonmemory_operand" "Jc")) - (lshiftrt:DI (match_operand:DI 1 "register_operand" "r") - (minus:QI (const_int 64) (match_dup 2))))) + (and:QI (match_operand:QI 2 "nonmemory_operand" "Jc") + (const_int 63))) + (subreg:DI + (lshiftrt:TI + (zero_extend:TI + (match_operand:DI 1 "register_operand" "r")) + (minus:QI (const_int 64) + (and:QI (match_dup 2) (const_int 63)))) 0))) (clobber (reg:CC FLAGS_REG))] "TARGET_64BIT" "shld{q}\t{%s2%1, %0|%0, %1, %2}" @@ -11417,12 +11430,58 @@ (define_insn "x86_64_shld" (set_attr "amdfam10_decode" "vector") (set_attr "bdver1_decode" "vector")]) +(define_insn "*x86_64_shld_1" + [(set (match_operand:DI 0 "nonimmediate_operand" "+r*m") + (ior:DI (ashift:DI (match_dup 0) + (match_operand:QI 2 "const_0_to_63_operand" "J")) + (subreg:DI + (lshiftrt:TI + (zero_extend:TI + (match_operand:DI 1 "register_operand" "r")) + (match_operand:QI 3 "const_0_to_255_operand" "N")) 0))) + (clobber (reg:CC FLAGS_REG))] + "TARGET_64BIT + && INTVAL (operands[3]) == 64 - INTVAL (operands[2])" + "shld{q}\t{%2, %1, %0|%0, %1, %2}" + [(set_attr "type" "ishift") + (set_attr "prefix_0f" "1") + (set_attr "mode" "DI") + (set_attr "athlon_decode" "vector") + (set_attr "amdfam10_decode" "vector") + (set_attr "bdver1_decode" "vector")]) + +(define_insn_and_split "*x86_64_shld_2" + [(set (match_operand:DI 0 "nonimmediate_operand") + (ior:DI (ashift:DI (match_dup 0) + (match_operand:QI 2 "nonmemory_operand")) + (lshiftrt:DI (match_operand:DI 1 "register_operand") + (minus:QI (const_int 64) (match_dup 2))))) + (clobber (reg:CC FLAGS_REG))] + "TARGET_64BIT && ix86_pre_reload_split ()" + "#" + "&& 1" + [(parallel [(set (match_dup 0) + (ior:DI (ashift:DI (match_dup 0) + (and:QI (match_dup 2) (const_int 63))) + (subreg:DI + (lshiftrt:TI + (zero_extend:TI (match_dup 1)) + (minus:QI (const_int 64) + (and:QI (match_dup 2) + (const_int 63)))) 0))) + (clobber (reg:CC FLAGS_REG))])]) + (define_insn "x86_shld" [(set (match_operand:SI 0 "nonimmediate_operand" 
"+r*m") (ior:SI (ashift:SI (match_dup 0) - (match_operand:QI 2 "nonmemory_operand" "Ic")) - (lshiftrt:SI (match_operand:SI 1 "register_operand" "r") - (minus:QI (const_int 32) (match_dup 2))))) + (and:QI (match_operand:QI 2 "nonmemory_operand" "Ic") + (const_int 31))) + (subreg:SI + (lshiftrt:DI + (zero_extend:DI + (match_operand:SI 1 "register_operand" "r")) + (minus:QI (const_int 32) + (and:QI (match_dup 2) (const_int 31)))) 0))) (clobber (reg:CC FLAGS_REG))] "" "shld{l}\t{%s2%1, %0|%0, %1, %2}" @@ -11434,6 +11493,47 @@ (define_insn "x86_shld" (set_attr "amdfam10_decode" "vector") (set_attr "bdver1_decode" "vector")]) +(define_insn "*x86_shld_1" + [(set (match_operand:SI 0 "nonimmediate_operand" "+r*m") + (ior:SI (ashift:SI (match_dup 0) + (match_operand:QI 2 "const_0_to_31_operand" "I")) + (subreg:SI + (lshiftrt:DI + (zero_extend:DI + (match_operand:SI 1 "register_operand" "r")) + (match_operand:QI 3 "const_0_to_63_operand" "J")) 0))) + (clobber (reg:CC FLAGS_REG))] + "INTVAL (operands[3]) == 32 - INTVAL (operands[2])" + "shld{l}\t{%2, %1, %0|%0, %1, %2}" + [(set_attr "type" "ishift") + (set_attr "prefix_0f" "1") + (set_attr "mode" "SI") + (set_attr "pent_pair" "np") + (set_attr "athlon_decode" "vector") + (set_attr "amdfam10_decode" "vector") + (set_attr "bdver1_decode" "vector")]) + +(define_insn_and_split "*x86_shld_2" + [(set (match_operand:SI 0 "nonimmediate_operand") + (ior:SI (ashift:SI (match_dup 0) + (match_operand:QI 2 "nonmemory_operand")) + (lshiftrt:SI (match_operand:SI 1 "register_operand") + (minus:QI (const_int 32) (match_dup 2))))) + (clobber (reg:CC FLAGS_REG))] + "TARGET_64BIT && ix86_pre_reload_split ()" + "#" + "&& 1" + [(parallel [(set (match_dup 0) + (ior:SI (ashift:SI (match_dup 0) + (and:QI (match_dup 2) (const_int 31))) + (subreg:SI + (lshiftrt:DI + (zero_extend:DI (match_dup 1)) + (minus:QI (const_int 32) + (and:QI (match_dup 2) + (const_int 31)))) 0))) + (clobber (reg:CC FLAGS_REG))])]) + (define_expand "@x86_shift_adj_1" [(set 
(reg:CCZ FLAGS_REG) (compare:CCZ (and:QI (match_operand:QI 2 "register_operand") @@ -12080,9 +12180,12 @@ (define_insn_and_split "*3_do "&& 1" [(parallel [(set (match_dup 4) - (ior:DWIH (lshiftrt:DWIH (match_dup 4) (match_dup 2)) - (ashift:DWIH (match_dup 7) - (minus:QI (match_dup 8) (match_dup 2))))) + (ior:DWIH (lshiftrt:DWIH (match_dup 4) + (and:QI (match_dup 2) (match_dup 8))) + (subreg:DWIH + (ashift: (zero_extend: (match_dup 7)) + (minus:QI (match_dup 9) + (and:QI (match_dup 2) (match_dup 8)))) 0))) (clobber (reg:CC FLAGS_REG))]) (parallel [(set (match_dup 6) @@ -12091,7 +12194,8 @@ (define_insn_and_split "*3_do { split_double_mode (mode, &operands[0], 2, &operands[4], &operands[6]); - operands[8] = GEN_INT ( * BITS_PER_UNIT); + operands[8] = GEN_INT ( * BITS_PER_UNIT - 1); + operands[9] = GEN_INT ( * BITS_PER_UNIT); if ((INTVAL (operands[3]) & (( * BITS_PER_UNIT) - 1)) != (( * BITS_PER_UNIT) - 1)) @@ -12121,9 +12225,12 @@ (define_insn_and_split "*3_do "&& 1" [(parallel [(set (match_dup 4) - (ior:DWIH (lshiftrt:DWIH (match_dup 4) (match_dup 2)) - (ashift:DWIH (match_dup 7) - (minus:QI (match_dup 8) (match_dup 2))))) + (ior:DWIH (lshiftrt:DWIH (match_dup 4) + (and:QI (match_dup 2) (match_dup 8))) + (subreg:DWIH + (ashift: (zero_extend: (match_dup 7)) + (minus:QI (match_dup 9) + (and:QI (match_dup 2) (match_dup 8)))) 0))) (clobber (reg:CC FLAGS_REG))]) (parallel [(set (match_dup 6) @@ -12132,7 +12239,8 @@ (define_insn_and_split "*3_do { split_double_mode (mode, &operands[0], 2, &operands[4], &operands[6]); - operands[8] = GEN_INT ( * BITS_PER_UNIT); + operands[8] = GEN_INT ( * BITS_PER_UNIT - 1); + operands[9] = GEN_INT ( * BITS_PER_UNIT); if ((INTVAL (operands[3]) & (( * BITS_PER_UNIT) - 1)) != (( * BITS_PER_UNIT) - 1)) @@ -12177,9 +12285,14 @@ (define_peephole2 (define_insn "x86_64_shrd" [(set (match_operand:DI 0 "nonimmediate_operand" "+r*m") (ior:DI (lshiftrt:DI (match_dup 0) - (match_operand:QI 2 "nonmemory_operand" "Jc")) - (ashift:DI (match_operand:DI 1 
"register_operand" "r") - (minus:QI (const_int 64) (match_dup 2))))) + (and:QI (match_operand:QI 2 "nonmemory_operand" "Jc") + (const_int 63))) + (subreg:DI + (ashift:TI + (zero_extend:TI + (match_operand:DI 1 "register_operand" "r")) + (minus:QI (const_int 64) + (and:QI (match_dup 2) (const_int 63)))) 0))) (clobber (reg:CC FLAGS_REG))] "TARGET_64BIT" "shrd{q}\t{%s2%1, %0|%0, %1, %2}" @@ -12190,12 +12303,58 @@ (define_insn "x86_64_shrd" (set_attr "amdfam10_decode" "vector") (set_attr "bdver1_decode" "vector")]) +(define_insn "*x86_64_shrd_1" + [(set (match_operand:DI 0 "nonimmediate_operand" "+r*m") + (ior:DI (lshiftrt:DI (match_dup 0) + (match_operand:QI 2 "const_0_to_63_operand" "J")) + (subreg:DI + (ashift:TI + (zero_extend:TI + (match_operand:DI 1 "register_operand" "r")) + (match_operand:QI 3 "const_0_to_255_operand" "N")) 0))) + (clobber (reg:CC FLAGS_REG))] + "TARGET_64BIT + && INTVAL (operands[3]) == 64 - INTVAL (operands[2])" + "shrd{q}\t{%2, %1, %0|%0, %1, %2}" + [(set_attr "type" "ishift") + (set_attr "prefix_0f" "1") + (set_attr "mode" "DI") + (set_attr "athlon_decode" "vector") + (set_attr "amdfam10_decode" "vector") + (set_attr "bdver1_decode" "vector")]) + +(define_insn_and_split "*x86_64_shrd_2" + [(set (match_operand:DI 0 "nonimmediate_operand") + (ior:DI (lshiftrt:DI (match_dup 0) + (match_operand:QI 2 "nonmemory_operand")) + (ashift:DI (match_operand:DI 1 "register_operand") + (minus:QI (const_int 64) (match_dup 2))))) + (clobber (reg:CC FLAGS_REG))] + "TARGET_64BIT && ix86_pre_reload_split ()" + "#" + "&& 1" + [(parallel [(set (match_dup 0) + (ior:DI (lshiftrt:DI (match_dup 0) + (and:QI (match_dup 2) (const_int 63))) + (subreg:DI + (ashift:TI + (zero_extend:TI (match_dup 1)) + (minus:QI (const_int 64) + (and:QI (match_dup 2) + (const_int 63)))) 0))) + (clobber (reg:CC FLAGS_REG))])]) + (define_insn "x86_shrd" [(set (match_operand:SI 0 "nonimmediate_operand" "+r*m") (ior:SI (lshiftrt:SI (match_dup 0) - (match_operand:QI 2 "nonmemory_operand" 
"Ic")) - (ashift:SI (match_operand:SI 1 "register_operand" "r") - (minus:QI (const_int 32) (match_dup 2))))) + (and:QI (match_operand:QI 2 "nonmemory_operand" "Ic") + (const_int 31))) + (subreg:SI + (ashift:DI + (zero_extend:DI + (match_operand:SI 1 "register_operand" "r")) + (minus:QI (const_int 32) + (and:QI (match_dup 2) (const_int 31)))) 0))) (clobber (reg:CC FLAGS_REG))] "" "shrd{l}\t{%s2%1, %0|%0, %1, %2}" @@ -12207,6 +12366,47 @@ (define_insn "x86_shrd" (set_attr "amdfam10_decode" "vector") (set_attr "bdver1_decode" "vector")]) +(define_insn "*x86_shrd_1" + [(set (match_operand:SI 0 "nonimmediate_operand" "+r*m") + (ior:SI (lshiftrt:SI (match_dup 0) + (match_operand:QI 2 "const_0_to_31_operand" "I")) + (subreg:SI + (ashift:DI + (zero_extend:DI + (match_operand:SI 1 "register_operand" "r")) + (match_operand:QI 3 "const_0_to_63_operand" "J")) 0))) + (clobber (reg:CC FLAGS_REG))] + "INTVAL (operands[3]) == 32 - INTVAL (operands[2])" + "shrd{l}\t{%2, %1, %0|%0, %1, %2}" + [(set_attr "type" "ishift") + (set_attr "prefix_0f" "1") + (set_attr "mode" "SI") + (set_attr "pent_pair" "np") + (set_attr "athlon_decode" "vector") + (set_attr "amdfam10_decode" "vector") + (set_attr "bdver1_decode" "vector")]) + +(define_insn_and_split "*x86_shrd_2" + [(set (match_operand:SI 0 "nonimmediate_operand") + (ior:SI (lshiftrt:SI (match_dup 0) + (match_operand:QI 2 "nonmemory_operand")) + (ashift:SI (match_operand:SI 1 "register_operand") + (minus:QI (const_int 32) (match_dup 2))))) + (clobber (reg:CC FLAGS_REG))] + "TARGET_64BIT && ix86_pre_reload_split ()" + "#" + "&& 1" + [(parallel [(set (match_dup 0) + (ior:SI (lshiftrt:SI (match_dup 0) + (and:QI (match_dup 2) (const_int 31))) + (subreg:SI + (ashift:DI + (zero_extend:DI (match_dup 1)) + (minus:QI (const_int 32) + (and:QI (match_dup 2) + (const_int 31)))) 0))) + (clobber (reg:CC FLAGS_REG))])]) + ;; Base name for insn mnemonic. 
(define_mode_attr cvt_mnemonic [(SI "{cltd|cdq}") (DI "{cqto|cqo}")]) @@ -12784,18 +12984,27 @@ (define_insn_and_split "ix86_rotl3_ [(set (match_dup 3) (match_dup 4)) (parallel [(set (match_dup 4) - (ior:DWIH (ashift:DWIH (match_dup 4) (match_dup 2)) - (lshiftrt:DWIH (match_dup 5) - (minus:QI (match_dup 6) (match_dup 2))))) + (ior:DWIH (ashift:DWIH (match_dup 4) + (and:QI (match_dup 2) (match_dup 6))) + (subreg:DWIH + (lshiftrt: (zero_extend: (match_dup 5)) + (minus:QI (match_dup 7) + (and:QI (match_dup 2) + (match_dup 6)))) 0))) (clobber (reg:CC FLAGS_REG))]) (parallel [(set (match_dup 5) - (ior:DWIH (ashift:DWIH (match_dup 5) (match_dup 2)) - (lshiftrt:DWIH (match_dup 3) - (minus:QI (match_dup 6) (match_dup 2))))) + (ior:DWIH (ashift:DWIH (match_dup 5) + (and:QI (match_dup 2) (match_dup 6))) + (subreg:DWIH + (lshiftrt: (zero_extend: (match_dup 3)) + (minus:QI (match_dup 7) + (and:QI (match_dup 2) + (match_dup 6)))) 0))) (clobber (reg:CC FLAGS_REG))])] { - operands[6] = GEN_INT (GET_MODE_BITSIZE (mode)); + operands[6] = GEN_INT (GET_MODE_BITSIZE (mode) - 1); + operands[7] = GEN_INT (GET_MODE_BITSIZE (mode)); split_double_mode (mode, &operands[0], 1, &operands[4], &operands[5]); }) @@ -12812,18 +13021,27 @@ (define_insn_and_split "ix86_rotr3_ [(set (match_dup 3) (match_dup 4)) (parallel [(set (match_dup 4) - (ior:DWIH (lshiftrt:DWIH (match_dup 4) (match_dup 2)) - (ashift:DWIH (match_dup 5) - (minus:QI (match_dup 6) (match_dup 2))))) + (ior:DWIH (lshiftrt:DWIH (match_dup 4) + (and:QI (match_dup 2) (match_dup 6))) + (subreg:DWIH + (ashift: (zero_extend: (match_dup 5)) + (minus:QI (match_dup 7) + (and:QI (match_dup 2) + (match_dup 6)))) 0))) (clobber (reg:CC FLAGS_REG))]) (parallel [(set (match_dup 5) - (ior:DWIH (lshiftrt:DWIH (match_dup 5) (match_dup 2)) - (ashift:DWIH (match_dup 3) - (minus:QI (match_dup 6) (match_dup 2))))) + (ior:DWIH (lshiftrt:DWIH (match_dup 5) + (and:QI (match_dup 2) (match_dup 6))) + (subreg:DWIH + (ashift: (zero_extend: (match_dup 3)) + 
(minus:QI (match_dup 7) + (and:QI (match_dup 2) + (match_dup 6)))) 0))) (clobber (reg:CC FLAGS_REG))])] { - operands[6] = GEN_INT (GET_MODE_BITSIZE (mode)); + operands[6] = GEN_INT (GET_MODE_BITSIZE (mode) - 1); + operands[7] = GEN_INT (GET_MODE_BITSIZE (mode)); split_double_mode (mode, &operands[0], 1, &operands[4], &operands[5]); }) --- gcc/testsuite/gcc.dg/pr103431.c.jj 2021-11-26 16:33:24.412340246 +0100 +++ gcc/testsuite/gcc.dg/pr103431.c 2021-11-26 11:59:26.389714033 +0100 @@ -0,0 +1,21 @@ +/* PR middle-end/103431 */ +/* { dg-do run { target int128 } } */ +/* { dg-options "-O -fno-tree-bit-ccp -fno-tree-dominator-opts" } */ + +__attribute__((noipa)) +void foo (unsigned short a) +{ + __uint128_t b = 5; + int size = __SIZEOF_INT128__ * __CHAR_BIT__ - 1; + a /= 0xfffffffd; + __uint128_t c = (b << (a & size) | b >> (-(a & size) & size)); + if (c != 5) + __builtin_abort (); +} + +int +main () +{ + foo (0); + return 0; +}